-
Notifications
You must be signed in to change notification settings - Fork 0
18630: Remove FilterExec from CoalesceBatches optimization rule #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -117,10 +117,9 @@ physical_plan | |
| 03)----PlaceholderRowExec | ||
| 04)--CoalescePartitionsExec | ||
| 05)----ProjectionExec: expr=[id@0 + 1 as id] | ||
| 06)------CoalesceBatchesExec: target_batch_size=8192 | ||
| 07)--------FilterExec: id@0 < 10 | ||
| 08)----------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 09)------------WorkTableExec: name=nodes | ||
| 06)------FilterExec: id@0 < 10 | ||
| 07)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 08)----------WorkTableExec: name=nodes | ||
|
|
||
|
Comment on lines
+120
to
123
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Refresh batch-size commentary for recursive CTE plans. These explain outputs used to show You could, for example, rewrite it to something like: -# use explain to ensure that batch size is set to 2. This should produce multiple batches per iteration since the input
-# table 'balances' has 4 rows
+# With batch size set to 2 we still produce multiple batches per iteration; the plan no longer prints a Coalesce node, so we
+# rely on the configuration rather than the explain output to confirm the setting.Also applies to: 164-167
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct that the comments in the snapshot tests need to be updated to reflect that FilterExec is no more wrapped inside CoalesceBatchesExec |
||
| # setup | ||
| statement ok | ||
|
|
@@ -162,10 +161,9 @@ physical_plan | |
| 03)----DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/recursive_cte/balance.csv]]}, projection=[time, name, account_balance], file_type=csv, has_header=true | ||
| 04)----CoalescePartitionsExec | ||
| 05)------ProjectionExec: expr=[time@0 + 1 as time, name@1 as name, account_balance@2 + 10 as account_balance] | ||
| 06)--------CoalesceBatchesExec: target_batch_size=2 | ||
| 07)----------FilterExec: time@0 < 10 | ||
| 08)------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 09)--------------WorkTableExec: name=balances | ||
| 06)--------FilterExec: time@0 < 10 | ||
| 07)----------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 08)------------WorkTableExec: name=balances | ||
|
|
||
| # recursive CTE with static term derived from table works | ||
| # note that this is run with batch size set to 2. This should produce multiple batches per iteration since the input | ||
|
|
@@ -734,12 +732,11 @@ physical_plan | |
| 04)--ProjectionExec: expr=[2 as val] | ||
| 05)----CrossJoinExec | ||
| 06)------CoalescePartitionsExec | ||
| 07)--------CoalesceBatchesExec: target_batch_size=8182 | ||
| 08)----------FilterExec: val@0 < 2 | ||
| 09)------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 10)--------------WorkTableExec: name=recursive_cte | ||
| 11)------ProjectionExec: expr=[2 as val] | ||
| 12)--------PlaceholderRowExec | ||
| 07)--------FilterExec: val@0 < 2 | ||
| 08)----------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 09)------------WorkTableExec: name=recursive_cte | ||
| 10)------ProjectionExec: expr=[2 as val] | ||
| 11)--------PlaceholderRowExec | ||
|
|
||
| # Test issue: https://github.com/apache/datafusion/issues/9794 | ||
| # Non-recursive term and recursive term have different types | ||
|
|
@@ -964,10 +961,9 @@ physical_plan | |
| 03)----PlaceholderRowExec | ||
| 04)--CoalescePartitionsExec | ||
| 05)----ProjectionExec: expr=[n@0 + 1 as numbers.n + Int64(1)] | ||
| 06)------CoalesceBatchesExec: target_batch_size=8182 | ||
| 07)--------FilterExec: n@0 < 10 | ||
| 08)----------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 09)------------WorkTableExec: name=numbers | ||
| 06)------FilterExec: n@0 < 10 | ||
| 07)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 08)----------WorkTableExec: name=numbers | ||
|
|
||
| query TT | ||
| explain WITH RECURSIVE numbers AS ( | ||
|
|
@@ -990,10 +986,9 @@ physical_plan | |
| 03)----PlaceholderRowExec | ||
| 04)--CoalescePartitionsExec | ||
| 05)----ProjectionExec: expr=[n@0 + 1 as numbers.n + Int64(1)] | ||
| 06)------CoalesceBatchesExec: target_batch_size=8182 | ||
| 07)--------FilterExec: n@0 < 10 | ||
| 08)----------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 09)------------WorkTableExec: name=numbers | ||
| 06)------FilterExec: n@0 < 10 | ||
| 07)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 08)----------WorkTableExec: name=numbers | ||
|
|
||
| # Test for issue #16998: SortExec shares DynamicFilterPhysicalExpr across multiple executions | ||
| query II | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc comment above still mentions handling “small batches that are produced by highly selective filters,” but the implementation no longer wraps
FilterExec. Consider updating that comment to reflect the current behavior (HashJoin/Repartition and AsyncFuncExec) to keep docs accurate.🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:good-but-wont-fix; category:documentation; feedback:The Augment AI reviewer is correct that the comment mentions "filter" but the result could be filtered not only by FilterExec. HashJoinExec also filters the results by returning only those matching the join condition.