Skip to content

[GLUTEN-11514][VL] Refactor plan execution by adding addIteratorSplits and noMoreSplits methods to the plan execution API#11527

Merged
zhztheplayer merged 6 commits intoapache:mainfrom
zhztheplayer:wip-addsplit-api
Feb 4, 2026
Merged

[GLUTEN-11514][VL] Refactor plan execution by adding addIteratorSplits and noMoreSplits methods to the plan execution API#11527
zhztheplayer merged 6 commits intoapache:mainfrom
zhztheplayer:wip-addsplit-api

Conversation

@zhztheplayer
Copy link
Member

@zhztheplayer zhztheplayer commented Jan 30, 2026

Split plan execution via NativePlanEvaluator / ColumnarBatchOutIterator into 3 APIs:

  1. NativePlanEvaluator#create
  2. ColumnarBatchOutIterator#addIteratorSplits
  3. ColumnarBatchOutIterator#noMoreSplits

This allows iterator-based splits to be added later than when the task is started. Useful when NativePlanEvaluator is used individually, out of the transformer code routine.

This is a prerequisite for #11514 and #11419.

Tested by existing tests and #11419

@github-actions
Copy link

Run Gluten Clickhouse CI on x86

1 similar comment
@github-actions
Copy link

Run Gluten Clickhouse CI on x86

@github-actions
Copy link

Run Gluten Clickhouse CI on x86

2 similar comments
@github-actions
Copy link

Run Gluten Clickhouse CI on x86

@github-actions
Copy link

Run Gluten Clickhouse CI on x86

namespace gluten {

class CudfVectorStream : public RowVectorStream {
class CudfVectorStreamBase {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jinchengchenghh

RowVectorStream's code is being inlined into CudfVectorStreamBase here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@github-actions
Copy link

github-actions bot commented Feb 3, 2026

Run Gluten Clickhouse CI on x86

1 similar comment
@github-actions
Copy link

github-actions bot commented Feb 3, 2026

Run Gluten Clickhouse CI on x86

@zhztheplayer zhztheplayer changed the title [GLUTEN-11514][VL] Add addIteratorSplits and noMoreSplits methods to the plan execution API [GLUTEN-11514][VL] Refactor plan execution by adding addIteratorSplits and noMoreSplits methods to the plan execution API Feb 3, 2026
@zhztheplayer zhztheplayer marked this pull request as ready for review February 3, 2026 12:00
Copy link
Member

@zhouyuan zhouyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I understand the basic logic now - just one question on the concurrent execution threading model

Cc: @rui-mo

}
}

void WholeStageResultIterator::noMoreSplits() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work if calling from different threads? e.g. thread A called addIteratorSplits(), but thread B called on noMoreSplits()

Copy link
Member Author

@zhztheplayer zhztheplayer Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not guaranteed thread-safe. In the Delta PR, the plan is still created and executed in one single thread so no requirement for thread safety on this API now.

void WholeStageResultIterator::tryAddSplitsToTask() {
if (noMoreSplits_) {
void WholeStageResultIterator::addIteratorSplits(const std::vector<std::shared_ptr<ResultIterator>>& inputIterators) {
// Create IteratorConnectorSplit for each iterator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also guard for allSplitsAdded = true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea.

@github-actions
Copy link

github-actions bot commented Feb 3, 2026

Run Gluten Clickhouse CI on x86

@zhztheplayer
Copy link
Member Author

@zhouyuan

Thanks! I understand the basic logic now - just one question on the concurrent execution threading model

The change is not supposed to support a concurrent execution model - it's still single-threaded execution. The goal was to add the barrier API, so we can reuse the Velox task in the same thread but for newer inputs.

Let me know any remaining issues. cc @rui-mo

Copy link
Member

@zhouyuan zhouyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@zhztheplayer zhztheplayer merged commit 420b704 into apache:main Feb 4, 2026
59 checks passed
marin-ma added a commit to marin-ma/gluten that referenced this pull request Feb 5, 2026
…atorSplits` and `noMoreSplits` methods to the plan execution API (apache#11527)"

This reverts commit 420b704.
zhztheplayer added a commit to zhztheplayer/gluten that referenced this pull request Feb 5, 2026
…oreSplits on cuDF value stream nodes

This fixes apache#11569 with a code style fix in passing.
marin-ma pushed a commit to marin-ma/gluten that referenced this pull request Feb 6, 2026
…oreSplits on cuDF value stream nodes

This fixes apache#11569 with a code style fix in passing.
marin-ma pushed a commit that referenced this pull request Feb 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants