[CORE] Refactor task error logging into dedicated utility class by pratham76 · Pull Request #11718 · apache/gluten

pratham76 · 2026-03-08T07:12:19Z

What changes are proposed in this pull request?

Extract task error logging from TaskResources into a new TaskErrorLogger utility, resolving a TODO here. This improves separation of concerns while keeping the filtering of speculative execution errors functionally identical.
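
For readers without the diff at hand, the shape of such a utility might look roughly like the sketch below. It is a minimal illustration, not the code merged in this PR: `KilledSketchException` is a hypothetical stand-in for Spark's task-killed exception, the `"another attempt succeeded"` reason string is an assumed marker for a speculative kill, and an in-memory list replaces a real logger so the example has no Spark dependency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Spark's task-killed exception; not the real class.
final class KilledSketchException extends RuntimeException {
    final String reason;

    KilledSketchException(String reason) {
        super(reason);
        this.reason = reason;
    }
}

// Hypothetical utility that owns the "should this failure be logged?" decision,
// so the resource-management code no longer has to.
final class TaskErrorLoggerSketch {
    // Assumed reason string for a task killed because a speculative sibling finished first.
    static final String SPECULATION_REASON = "another attempt succeeded";

    // Collected messages stand in for a real logger in this sketch.
    final List<String> logged = new ArrayList<>();

    // Returns false only for kills caused by speculative execution.
    static boolean shouldLog(Throwable error) {
        if (error instanceof KilledSketchException) {
            return !SPECULATION_REASON.equals(((KilledSketchException) error).reason);
        }
        return true;
    }

    // Records the failure unless it is an expected speculative kill.
    void logTaskFailure(long taskAttemptId, Throwable error) {
        if (shouldLog(error)) {
            logged.add("Task " + taskAttemptId + " failed by error: " + error);
        }
    }
}
```

In this sketch a task failure listener would simply delegate to `logTaskFailure`, which keeps the filtering rule in one place and testable without a running task.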

How was this patch tested?

Existing UTs

Was this patch authored or co-authored using generative AI tooling?

No

github-actions · 2026-03-08T07:12:49Z

Run Gluten Clickhouse CI on x86

github-actions · 2026-03-08T07:27:34Z

Run Gluten Clickhouse CI on x86

pratham76 · 2026-03-08T10:24:27Z

@zhztheplayer Could you please review?

zhztheplayer

Hi @pratham76, thanks for helping out.

The intention of the TODO may be that we should rely on Spark's scheduler layer to log the error. But we had to keep the TaskFailureListener here because Gluten may crash in a CompletionListener, in which case Spark's scheduler code never gets a chance to log the error.

So I think the discussion here is not about how to log the error. We may need to consider opening a PR in upstream Spark to make logging work even when a CompletionListener crashes.

I might not recall everything, so let me know if you have different findings here.

zhztheplayer

Apart from the above comment, the PR change itself LGTM.

github-actions · 2026-03-10T02:09:59Z

Run Gluten Clickhouse CI on x86

pratham76 · 2026-03-10T02:10:00Z

Hi @zhztheplayer, thanks for the clarification. I understand that the proper solution for the above-mentioned TODO would be to make Spark's error logging more resilient when a CompletionListener crashes.

In this PR, I have tried to refactor the logging logic into a separate utility class that handles both task failures and task recompute/retries, just to make the code a bit cleaner. I'm happy to explore the upstream Spark changes for this TODO (which I have updated here to give better context). For now, could we proceed with this refactoring as an incremental improvement?

github-actions · 2026-03-10T02:49:19Z

Run Gluten Clickhouse CI on x86

zhztheplayer · 2026-03-10T09:20:48Z

Sounds reasonable. Thanks for the patch.

github-actions bot added the CORE works for Gluten Core label Mar 8, 2026

Refactor task error logging into dedicated utility class

2764e11

pratham76 force-pushed the pm-refactor-logging branch from af1dfe8 to 2764e11 Compare March 8, 2026 07:27

zhztheplayer reviewed Mar 9, 2026

zhztheplayer approved these changes Mar 9, 2026

update TODO

9ef29f5

pratham76 force-pushed the pm-refactor-logging branch from 693fd91 to 9ef29f5 Compare March 10, 2026 02:48

zhztheplayer merged commit 205e133 into apache:main Mar 10, 2026
62 checks passed

zhztheplayer merged 2 commits into apache:main from pratham76:pm-refactor-logging
