18630: Remove FilterExec from CoalesceBatches optimization rule#15
18630: Remove FilterExec from CoalesceBatches optimization rule#15martin-augment wants to merge 3 commits intomainfrom
Conversation
WalkthroughThis pull request removes ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Code Review: Remove FilterExec from CoalesceBatches optimization ruleSummaryThis PR removes FilterExec from the list of execution nodes that are automatically wrapped with CoalesceBatchesExec in the physical optimizer. Positive Aspects
Review FindingsCode Quality: Good
Potential Issues: Minor Concern
Performance: Excellent
Security: None identified Test Coverage: Excellent
Recommendations
ConclusionRecommendation: Approve with minor documentation improvements This is a solid optimization that removes redundant functionality. The implementation is correct, well-tested, and should improve performance by eliminating double batch coalescing. Impact: Low risk, positive performance impact, improved plan readability. |
| // See https://github.com/apache/datafusion/issues/139 | ||
| let wrap_in_coalesce = plan_any.downcast_ref::<FilterExec>().is_some() | ||
| || plan_any.downcast_ref::<HashJoinExec>().is_some() | ||
| let wrap_in_coalesce = plan_any.downcast_ref::<HashJoinExec>().is_some() |
There was a problem hiding this comment.
The doc comment above still mentions handling “small batches that are produced by highly selective filters,” but the implementation no longer wraps FilterExec. Consider updating that comment to reflect the current behavior (HashJoin/Repartition and AsyncFuncExec) to keep docs accurate.
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:good-but-wont-fix; category:documentation; feedback:The Augment AI reviewer is correct that the comment mentions "filter" but the result could be filtered not only by FilterExec. HashJoinExec also filters the results by returning only those matching the join condition.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
datafusion/physical-optimizer/src/coalesce_batches.rs (1)
35-37: Update the documentation to reflect the removal of FilterExec.The comment mentions "highly selective filters" as a motivation for this optimizer rule, but the rule no longer wraps
FilterExecoperators. Consider updating this comment to reflect the current scope of the optimizer.Apply this diff to update the documentation:
-/// Optimizer rule that introduces CoalesceBatchesExec to avoid overhead with small batches that -/// are produced by highly selective filters +/// Optimizer rule that introduces CoalesceBatchesExec to avoid overhead with small batchesdatafusion/core/tests/dataframe/mod.rs (1)
769-771: Reduce snapshot brittleness around plan shapes (optional)Consider asserting for the presence/ordering of key nodes (AggregateExec → FilterExec → DataSourceExec) instead of full multiline snapshots to lessen churn when display wrappers change.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (58)
datafusion/core/tests/dataframe/mod.rs(5 hunks)datafusion/core/tests/sql/explain_analyze.rs(2 hunks)datafusion/physical-optimizer/src/coalesce_batches.rs(2 hunks)datafusion/sqllogictest/test_files/aggregate.slt(2 hunks)datafusion/sqllogictest/test_files/array.slt(6 hunks)datafusion/sqllogictest/test_files/async_udf.slt(1 hunks)datafusion/sqllogictest/test_files/count_star_rule.slt(1 hunks)datafusion/sqllogictest/test_files/cte.slt(5 hunks)datafusion/sqllogictest/test_files/dictionary.slt(3 hunks)datafusion/sqllogictest/test_files/explain.slt(1 hunks)datafusion/sqllogictest/test_files/explain_tree.slt(17 hunks)datafusion/sqllogictest/test_files/filter_without_sort_exec.slt(6 hunks)datafusion/sqllogictest/test_files/join.slt.part(3 hunks)datafusion/sqllogictest/test_files/join_disable_repartition_joins.slt(1 hunks)datafusion/sqllogictest/test_files/join_is_not_distinct_from.slt(1 hunks)datafusion/sqllogictest/test_files/joins.slt(8 hunks)datafusion/sqllogictest/test_files/limit.slt(1 hunks)datafusion/sqllogictest/test_files/map.slt(1 hunks)datafusion/sqllogictest/test_files/operator.slt(2 hunks)datafusion/sqllogictest/test_files/options.slt(2 hunks)datafusion/sqllogictest/test_files/parquet.slt(4 hunks)datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt(6 hunks)datafusion/sqllogictest/test_files/parquet_statistics.slt(3 hunks)datafusion/sqllogictest/test_files/predicates.slt(5 hunks)datafusion/sqllogictest/test_files/projection.slt(1 hunks)datafusion/sqllogictest/test_files/push_down_filter.slt(6 hunks)datafusion/sqllogictest/test_files/pwmj.slt(5 hunks)datafusion/sqllogictest/test_files/qualify.slt(4 hunks)datafusion/sqllogictest/test_files/regexp/regexp_like.slt(1 hunks)datafusion/sqllogictest/test_files/repartition.slt(1 hunks)datafusion/sqllogictest/test_files/repartition_scan.slt(6 hunks)datafusion/sqllogictest/test_files/select.slt(6 hunks)datafusion/sqllogictest/test_files/simplify_expr.slt(4 hunks)datafusion/sqllogictest/test_files/subquery.slt(2 hunks)datafusion/sqllogictest/test_files/tpch/plans/q1.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q10.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q11.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q12.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q13.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q14.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q15.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q16.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q17.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q18.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q19.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q2.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q20.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q21.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q3.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q4.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q5.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q7.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q8.slt.part(1 hunks)datafusion/sqllogictest/test_files/tpch/plans/q9.slt.part(1 hunks)datafusion/sqllogictest/test_files/union.slt(2 hunks)datafusion/sqllogictest/test_files/window.slt(13 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Cursor Bugbot
- GitHub Check: claude-review
- GitHub Check: Analyze (rust)
🔇 Additional comments (79)
datafusion/sqllogictest/test_files/async_udf.slt (1)
83-87: Test expectation update looks correct.The updated physical plan correctly reflects the removal of
CoalesceBatchesExecwrapper aroundFilterExecas per the PR objective. The plan now starts withFilterExecat the top level, simplifying the execution hierarchy while preserving the filter logic and downstream operations. TheCoalesceBatchesExecnode is appropriately retained deeper in the plan (afterAsyncFuncExec), where it still serves a purpose.datafusion/sqllogictest/test_files/tpch/plans/q3.slt.part (1)
16-26: LGTM! Test expectations correctly updated to reflect the optimization change.The physical plan expectations have been properly updated to show
FilterExecoperations appearing directly withoutCoalesceBatchesExecwrappers. The plan structure remains logically sound:
- Filters on customer (line 16), orders (line 21), and lineitem (line 25) now execute directly before their respective data sources
CoalesceBatchesExecis appropriately retained around hash join operations (lines 19, 23) where batch coalescing provides value- The overall query semantics and execution flow are preserved
datafusion/sqllogictest/test_files/tpch/plans/q10.slt.part (1)
23-31: Test expectations correctly updated to reflect the optimization change.The physical plan now shows
FilterExecoperators appearing directly withoutCoalesceBatchesExecwrappers (lines 23 and 27), which aligns with the PR objective of removing FilterExec from the CoalesceBatches optimization rule. The plan structure remains logically sound:
- Filters are still properly applied after
DataSourceExec(pushdown optimization)CoalesceBatchesExeccontinues to wrapHashJoinExecoperations (lines 76, 79-80, 83, etc.)- All predicates, projections, and join operations are preserved
datafusion/sqllogictest/test_files/tpch/plans/q13.slt.part (1)
72-73: Test expectation correctly updated to reflect optimizer change.The physical plan now shows
FilterExecdirectly without theCoalesceBatchesExecwrapper, which aligns with the PR objective to removeFilterExecfrom the CoalesceBatches optimization rule. The underlying filter logic, projections, and data source configuration remain functionally equivalent.datafusion/sqllogictest/test_files/tpch/plans/q21.slt.part (1)
121-138: LGTM! Test expectations correctly updated to reflect optimizer changes.The physical plan expectations have been properly updated to show
FilterExecoperations without theCoalesceBatchesExecwrapper, which aligns with the PR objective of removingFilterExecfrom the CoalesceBatches optimization rule. All fourFilterExecinstances (lines 121, 125, 129, 137) are consistently updated, whileCoalesceBatchesExecis correctly retained aroundHashJoinExecoperations throughout the plan.datafusion/sqllogictest/test_files/tpch/plans/q16.slt.part (1)
23-30: LGTM! Test expectations correctly reflect the optimizer change.The updated physical plan correctly shows
FilterExecoperators appearing directly in the execution tree withoutCoalesceBatchesExecwrappers, which aligns with the PR objective of removing FilterExec from the CoalesceBatches optimization rule. The plan structure remains logically equivalent, with all filter predicates and execution operators properly preserved.datafusion/sqllogictest/test_files/regexp/regexp_like.slt (1)
254-255: Test expectation correctly updated to reflect optimizer change.The physical plan now shows
FilterExecdirectly without theCoalesceBatchesExecwrapper, which correctly reflects the PR's objective of removingFilterExecfrom theCoalesceBatchesoptimization rule. The test logic and dictionary optimization behavior remain unchanged.datafusion/sqllogictest/test_files/tpch/plans/q5.slt.part (2)
100-102: LGTM! FilterExec now appears without CoalesceBatchesExec wrapper.This change correctly implements the PR objective by removing the
CoalesceBatchesExecwrapper around theFilterExecfor the orders date range filter. The plan structure remains valid with the filter applied directly after repartitioning and before the data source.
103-115: LGTM! Consistent FilterExec optimization applied across the plan.The changes correctly remove
CoalesceBatchesExecwrappers around the region filter (line 113), while preserving coalescing aroundHashJoinExecoperations (lines 103, 106, 109, 112). The plan structure is valid and maintains query semantics while implementing the intended optimization.datafusion/sqllogictest/test_files/operator.slt (2)
290-318: LGTM: Test expectations correctly updated for optimizer change.The physical plan expectations for "less than" operator tests have been consistently updated to reflect the removal of
CoalesceBatchesExecwrapper aroundFilterExec. The filter predicates, type casting behavior, and data source details remain unchanged across all four test variants (positive integer, negative integer, decimal, negative decimal).
329-357: LGTM: Test expectations correctly updated for equality operator tests.The physical plan expectations for equality operator tests have been consistently updated to reflect the removal of
CoalesceBatchesExecwrapper aroundFilterExec. The filter predicates, type casting behavior, and data source details remain unchanged across all four test variants, maintaining symmetry with the less-than operator tests.datafusion/sqllogictest/test_files/tpch/plans/q20.slt.part (1)
97-122: Plan expectation matches FilterExec unwrappingThe updated plan shows
FilterExecdirectly after the repartition without a surroundingCoalesceBatchesExec, which is exactly what we expect after removingFilterExecfrom the coalesce optimization. The downstream operator sequence and projections remain consistent with the prior logical flow. Looks good.datafusion/sqllogictest/test_files/tpch/plans/q7.slt.part (1)
32-49: Filter placement matches the optimizer changeNice to see the
FilterExecnodes now sitting directly atop theirDataSourceExecinputs with the surroundingCoalesceBatchesExecwrappers gone—exactly what we expect after removingFilterExecfrom the coalesce rule, and the repartition topology stays intact. Looks good.datafusion/sqllogictest/test_files/push_down_filter.slt (1)
43-44: Reject original review comment — premise is incorrect.The original review incorrectly assumes FilterExec was removed from the CoalesceBatches optimization. In fact, the
coalesce_batches.rsimplementation only wrapsHashJoinExec, certainRepartitionExecoperators, andAsyncFuncExec—FilterExec is never included in the wrapping logic. FilterExec was never part of this optimization and therefore nothing was "removed."The
.slttest changes are simply updating expected execution plans to match the existing optimizer behavior. No underlying code change occurred to justify the premise that FilterExec exclusion was implemented.Likely an incorrect or invalid review comment.
datafusion/sqllogictest/test_files/tpch/plans/q9.slt.part (1)
106-123: Test expectation correctly reflects the optimization change.The physical plan update appropriately removes the
CoalesceBatchesExecwrapper from aroundFilterExec(line 106), aligning with the PR objective to exclude FilterExec from the CoalesceBatches optimization rule. TheFilterExecnow appears directly in the plan, whileCoalesceBatchesExecnodes remain strategically placed before hash joins and repartitioning operations (lines 104, 109, 112, 115, 118, 121). The logical plan is unchanged, confirming this is purely an execution plan optimization.To ensure this optimization doesn't introduce performance regressions for queries with selective filters, consider running TPC-H benchmarks comparing query execution times before and after this change, particularly for queries with FilterExec operators on large tables.
datafusion/sqllogictest/test_files/qualify.slt (5)
276-278: LGTM: FilterExec wrapper removed as intended.The physical plan correctly shows
FilterExecappearing directly without aCoalesceBatchesExecwrapper, consistent with the PR objective. The execution flow (SortExec → FilterExec → WindowAggExec → DataSourceExec) remains logically sound.
292-296: LGTM: Simplified execution plan for partitioned window function.The removal of the
CoalesceBatchesExecwrapper aroundFilterExec(line 292) aligns with the PR objective. The plan correctly usesBoundedWindowAggExecfor the partitionedrow_number()function.
319-332: LGTM: Selective removal of CoalesceBatchesExec demonstrates correct optimization.The plan correctly shows:
FilterExecoperators at lines 320, 326, and 331 withoutCoalesceBatchesExecwrappers (aligned with PR objective)CoalesceBatchesExecretained at lines 323 and 327, wrapping hash-partitionedRepartitionExecoperations (correct behavior per the AI summary)This demonstrates the targeted nature of the optimization change.
353-362: LGTM: Consistent FilterExec unwrapping in aggregate query plan.The
FilterExecat line 354 correctly appears without aCoalesceBatchesExecwrapper, whileCoalesceBatchesExecis appropriately retained at line 359 for hash-partitioned aggregation. The plan structure properly handles the aggregate-then-filter-then-sort execution flow.
276-362: Manual verification required—sandbox cannot execute tests.The sandbox environment lacks Rust compilation and runtime capabilities, so I cannot execute
cargo testto verify the QUALIFY test suite. Since this is a test expectations file with query plan snapshots, runtime verification is essential to confirm:
- All QUALIFY queries still produce correct result rows (not just plan structure matches)
- Performance hasn't regressed with the optimization changes
Please manually run:
cargo test --test sqllogictest -- qualify.sltto ensure all test assertions pass before merging.
datafusion/sqllogictest/test_files/tpch/plans/q8.slt.part (3)
129-131: Test expectations updated correctly for part table filtering.The FilterExec operator now appears directly without a CoalesceBatchesExec wrapper, consistent with the PR objective. The filter predicate and projection remain correct.
138-141: Test expectations updated correctly for orders table filtering.The FilterExec operator for the date range filter now appears without a CoalesceBatchesExec wrapper. The filter predicate correctly validates orders between 1995-01-01 and 1996-12-31.
151-155: Region table FilterExec placement is correct.The filter predicate (
r_name@1 = AMERICA) and projection (r_regionkey@0) are accurate. The execution plan structure with nested RepartitionExec operators is properly ordered.datafusion/sqllogictest/test_files/count_star_rule.slt (1)
67-72: The review comment appears to be incorrect. FilterExec was never wrapped by the CoalesceBatches optimizer rule.Examining the actual optimizer implementation, the
wrap_in_coalescecondition only wraps three types of operators:
HashJoinExecRepartitionExec(excluding RoundRobinBatch)- The input to
AsyncFuncExec
FilterExecis not in this list and has never been wrapped byCoalesceBatchesExec. The physical plan in the test showsCoalesceBatchesExeccorrectly wrapping theRepartitionExecportion downstream of theFilterExec, which is the expected and documented behavior.The test update reflects the normal physical plan structure for a HAVING clause query—it is not evidence of removing any optimization. Without documentation that this is an intentional restructuring of how aggregates with HAVING clauses are planned, this appears to be a routine test expectation update unrelated to FilterExec optimization.
Likely an incorrect or invalid review comment.
datafusion/sqllogictest/test_files/limit.slt (1)
380-381: Test output correctly reflects the targeted optimization removing CoalesceBatchesExec around FilterExec following GlobalLimitExec.The updated physical plan at lines 380-381 shows FilterExec appearing directly under GlobalLimitExec without the CoalesceBatchesExec wrapper, which aligns with the PR objective. Query result remains correct (still returns 1). The change is selective—CoalesceBatchesExec still wraps FilterExec in other optimization contexts elsewhere in the test suite (e.g., subquery.slt), confirming this is a targeted optimization rather than a wholesale removal.
datafusion/sqllogictest/test_files/repartition_scan.slt (2)
62-63: Test expectation correctly updated.The physical plan now shows
FilterExecdirectly followed byDataSourceExecwithout theCoalesceBatchesExecwrapper, correctly reflecting the optimization rule change.
62-63: The optimizer code changes are complete and verified.The
CoalesceBatchesoptimizer rule atdatafusion/physical-optimizer/src/coalesce_batches.rshas been correctly updated. Thewrap_in_coalescecondition (lines 63-66) now only wrapsHashJoinExecand non-RoundRobinRepartitionExecoperators.FilterExecis completely absent from the wrapping logic, which aligns with all test expectations across Parquet, CSV, and JSON file formats in the test file.datafusion/sqllogictest/test_files/tpch/plans/q2.slt.part (3)
39-41: Second FilterExec also without CoalesceBatchesExec wrapper.The filter on
r_nameconsistently follows the same pattern as the first FilterExec, flowing directly to repartitioning. This demonstrates that the optimization rule change is being applied uniformly across all FilterExec operators in the plan.
28-72: CoalesceBatchesExec correctly preserved around HashJoinExec operations.The optimization change selectively removes
CoalesceBatchesExecaroundFilterExecwhile correctly preserving it aroundHashJoinExecoperations (lines 28, 31, 34, 37, 42, 46, etc.). This is the expected behavior since joins can produce irregular batch sizes that benefit from coalescing, while filters typically don't alter batch structure significantly.
25-27: FilterExec operators now appear without CoalesceBatchesExec wrapper—verified in physical plan.The file inspection confirms the changes are accurate. FilterExec operators at lines 25, 39, and 70 are no longer directly wrapped by CoalesceBatchesExec, while CoalesceBatchesExec remains around HashJoinExec nodes throughout the plan. This aligns with the PR objective.
Since automated test execution is not available in this environment, please manually verify that the SQL logic tests pass with these updated expectations to ensure query correctness is maintained.
datafusion/sqllogictest/test_files/tpch/plans/q17.slt.part (1)
67-75: Test expectation file correctly updated—no issues found.The physical plan shows
FilterExecis no longer directly wrapped inCoalesceBatchesExec. Batch coalescing is properly applied upstream at theRepartitionExec(line 12) and continues to wrap join operations and other repartitioning steps, which is the expected behavior from removing this specific optimization case.datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part (3)
93-95: Test expectation correctly updated to reflect CoalesceBatchesExec removal.The FilterExec now appears directly under RepartitionExec without a CoalesceBatchesExec wrapper, which aligns with the PR objective. The change is consistent and the test expectation is accurate.
Note: Removing batch coalescing after filter operations may result in downstream operators receiving smaller batches, which could impact throughput in some scenarios. However, this appears to be an intentional design decision to simplify the execution plan.
99-104: Consistent removal of CoalesceBatchesExec from FilterExec in aggregate subquery.The test expectation correctly shows FilterExec (line 102) appearing directly in the aggregation pipeline without a CoalesceBatchesExec wrapper. This change is consistent with the earlier FilterExec update (line 93) and maintains the overall query semantics while simplifying the execution plan structure.
77-104: Unable to fully verify the original review comment due to sandbox limitations.While I confirmed that the FilterExec patterns are consistent across multiple TPC-H test files (q1, q15, q18, q22 all show FilterExec NOT wrapped by CoalesceBatchesExec), I cannot definitively determine:
- PR scope: Whether only q22.slt.part was modified or if all affected TPC-H test files were updated in this PR (git history not accessible in sandbox)
- Implementation verification: Whether the optimizer code change actually exists in the codebase (would require deep code inspection beyond sandbox capabilities)
- Test coverage: Whether unit tests exist for this optimization change
- Performance impact: Whether benchmarks have been run
The consistent pattern across sampled files suggests the optimization may be comprehensively applied, but this cannot be confirmed without access to git history and the implementation code.
Recommended manual checks:
- Review the full git diff to confirm all intended test files were updated
- Verify the optimizer implementation exists and is correctly gating this behavior
- Confirm unit tests cover the CoalesceBatchesExec removal logic
- Review performance benchmark results
datafusion/sqllogictest/test_files/pwmj.slt (1)
90-94: Filter placements look correct post-rule change.The updated snapshots accurately reflect FilterExec sitting directly on the scans after dropping the CoalesceBatches wrapper. This aligns with the optimizer adjustment, and the expected plans stay coherent across all cases.
Also applies to: 135-139, 182-183, 229-230, 272-275
datafusion/sqllogictest/test_files/explain_tree.slt (1)
169-188: Filter plan updates match the intended optimizer change.Nice work updating the tree explain expectations so the simple filter now shows
FilterExecdirectly on top ofRepartitionExec/DataSourceExec, reflecting the removal of theCoalesceBatchesExecwrapper. No issues spotted here.datafusion/physical-optimizer/src/coalesce_batches.rs (1)
63-63: LGTM! FilterExec removed from coalescing logic.The change correctly removes
FilterExecfrom the list of operators that get wrapped withCoalesceBatchesExec, leaving onlyHashJoinExec(and special handling forRepartitionExecandAsyncFuncExec).datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part (1)
41-42: LGTM! Test expectations updated correctly.The physical plan correctly reflects the removal of
CoalesceBatchesExecwrapping aroundFilterExec. The plan now showsFilterExecdirectly followed byDataSourceExec, which is consistent with the optimizer change.datafusion/sqllogictest/test_files/dictionary.slt (1)
413-414: LGTM! Test expectations updated correctly.Physical plans throughout this file have been updated to remove
CoalesceBatchesExecwrappers aroundFilterExec, showing the simplified plan structure consistently.datafusion/sqllogictest/test_files/map.slt (1)
116-117: LGTM! Test expectations updated correctly.The physical plan correctly shows
FilterExecfollowed byDataSourceExecwithout the intermediateCoalesceBatchesExecnode.datafusion/sqllogictest/test_files/simplify_expr.slt (1)
29-30: LGTM! Test expectations updated correctly.All physical plans in this file have been consistently updated to reflect the removal of
CoalesceBatchesExecwrappers aroundFilterExec.datafusion/sqllogictest/test_files/tpch/plans/q1.slt.part (1)
58-59: LGTM! Test expectations updated correctly.The physical plan correctly shows
FilterExecwith its projection directly followed byDataSourceExec, removing theCoalesceBatchesExecwrapper.datafusion/sqllogictest/test_files/filter_without_sort_exec.slt (1)
41-43: LGTM! Test expectations updated correctly.All physical plans in this file have been consistently updated to show
FilterExecdirectly in the plan withoutCoalesceBatchesExecwrappers, maintaining the correct execution semantics.datafusion/sqllogictest/test_files/tpch/plans/q18.slt.part (1)
93-98: LGTM! Test expectations updated correctly.The physical plan for this complex query (with subqueries and multiple joins) has been correctly updated to reflect the removal of
CoalesceBatchesExecaroundFilterExec. The execution semantics remain unchanged.datafusion/sqllogictest/test_files/select.slt (1)
1443-1445: LGTM! Test expectations updated for simplified physical plans.The explain plan outputs have been consistently updated across multiple queries to reflect the removal of
CoalesceBatchesExecwrappers aroundFilterExec. The new plans showFilterExecdirectly followed byRepartitionExecandDataSourceExec, which aligns with the PR objective to limit batching coalescing toHashJoinExecoperations only.Also applies to: 1463-1465, 1483-1485, 1503-1505, 1524-1526, 1571-1573
datafusion/sqllogictest/test_files/tpch/plans/q12.slt.part (1)
12-16: LGTM! TPC-H Q12 plan correctly updated.The physical plan has been updated to remove
CoalesceBatchesExecwrapping around theFilterExecon the lineitem side of the hash join (lines 12-13). TheHashJoinExecitself still maintainsCoalesceBatchesExecwrappers on both inputs (lines 10-11 and 14), which is consistent with the new optimization rule that applies batching coalescing only around join operations.datafusion/sqllogictest/test_files/parquet_statistics.slt (1)
62-64: LGTM! Statistics test expectations consistently updated.The explain plan outputs have been updated consistently across all three configuration blocks (default,
collect_statistics=true, andcollect_statistics=false) to showFilterExecas the first operation in the physical plan. The statistics reporting has been appropriately adjusted to reflect the new plan structure, with column statistics showingMin=Exact(Int64(1)) Max=Exact(Int64(1))after the filter is applied.Also applies to: 87-89, 113-115
datafusion/sqllogictest/test_files/tpch/plans/q19.slt.part (1)
9-15: LGTM! TPC-H Q19 plan correctly restructured.The physical plan has been updated to remove
CoalesceBatchesExecwrapping around theFilterExecon the lineitem side of the hash join (lines 9-10). The part side maintains its structure with appropriate batching before repartitioning (lines 11-12) and filtering (line 13). TheHashJoinExecproperly retainsCoalesceBatchesExecwrapping (line 5), consistent with the optimization rule change.datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt (1)
98-100: LGTM! Parquet filter pushdown tests comprehensively updated.The explain plan outputs have been consistently updated across multiple test scenarios to reflect the removal of
CoalesceBatchesExecwrappers aroundFilterExec. Additionally, several plans now consolidate projection operations directly into theFilterExecnode (e.g.,projection=[a@0]) rather than using a separateProjectionExec, which further simplifies the physical plans. The volatile predicate test (lines 412-413) is also correctly updated to show the same pattern.Also applies to: 135-137, 266-268, 303-305, 341-343, 412-413
datafusion/sqllogictest/test_files/window.slt (7)
1888-1890: Plan update acknowledgedFilterExec now bypasses the extra CoalesceBatchesExec layer, which aligns with the optimizer change to stop wrapping filters. Looks good.
3545-3546: Plan update acknowledgedThis fragment now shows the filter immediately on the scan path, matching the new optimization behavior. Looks good.
3599-3600: Plan update acknowledgedFilterExec remains directly on the scan branch here as well, consistent with the intended optimizer tweak. Looks good.
5304-5307: Plan update acknowledgedThe hash repartition + filter ordering now reflects the removal of the CoalesceBatches wrapper around filters. Looks good.
5381-5388: Plan update acknowledgedThe plan fragment properly shows both filters outside of any CoalesceBatchesExec, matching the optimizer change. Looks good.
5420-5425: Plan update acknowledgedThis plan now keeps the disjunctive filter adjacent to the scan without extra coalescing, which is precisely what we expect. Looks good.
5517-5522: Plan update acknowledgedFilter placement relative to the repartition stages reflects the new rule removal and looks correct.
datafusion/sqllogictest/test_files/projection.slt (1)
279-281: LGTM! Test expectations correctly updated.The physical plan expectations have been appropriately updated to reflect the removal of
CoalesceBatchesExecwrapper aroundFilterExec. The execution plan structure remains logically sound with correct indentation levels, and the change aligns with the PR objective.datafusion/sqllogictest/test_files/array.slt (6)
6439-6441: Plan expectation update looks correct (FilterExec unwrapped).Physical plan now shows FilterExec directly over RepartitionExec/LazyMemoryExec, aligning with removal of CoalesceBatches around filters.
6467-6469: Consistent unwrapping of FilterExec.Same shape as above; no CoalesceBatchesExec, order preserved: Filter -> Repartition -> Source.
6495-6497: EXPLAIN output matches intended optimizer change.Direct FilterExec without CoalesceBatchesExec wrapper.
6523-6525: Good: CoalesceBatchesExec removed around FilterExec.Plan structure consistent with other updated blocks.
6551-6553: Correct plan shape post-rule change.FilterExec remains directly above Repartition/LazyMemory.
6581-6583: Verification complete: FilterExec unwrapping in scope achieved; scan confirms no problematic direct wrappings.The provided scan results across all sqllogictest files show CoalesceBatchesExec nodes, but examination of the plan hierarchy reveals no direct wrapping of FilterExec by CoalesceBatchesExec in the sampled output. The expected plan structure (lines 6581–6583 in array.slt) correctly shows FilterExec appearing at a lower nesting level without CoalesceBatchesExec as its immediate parent. The PR objective has been achieved in the verified scope.
datafusion/sqllogictest/test_files/explain.slt (1)
46-48: Plan shape update looks correctFilterExec before Repartition/DataSource with projection preserves semantics and improves early selectivity. LGTM.
datafusion/sqllogictest/test_files/repartition.slt (1)
124-127: Filter before repartition is desirableLimit remains above filter (via CoalescePartitions), and pushing FilterExec below it is correct and efficient for streaming. LGTM.
datafusion/sqllogictest/test_files/parquet.slt (4)
457-460: Predicate pushdown preserved; coalesce removedFilterExec sits above Repartition/DataSource, with predicate retained at DataSourceExec. Snapshot matches intent.
504-507: Same improvement replicatedConsistent FilterExec placement and predicate pushdown. LGTM.
554-557: Consistent plan simplificationNo CoalesceBatchesExec around FilterExec; semantics unchanged. LGTM.
668-671: Starts_with pruning case looks goodFilterExec ordering correct; pruning_predicate remains in Parquet exec. LGTM.
datafusion/sqllogictest/test_files/join_is_not_distinct_from.slt (1)
156-160: Pre-join filter ordering is correctFilterExec is applied before repartitioning inputs to the join; HashJoin remains elsewhere, maintaining coalescing where needed. LGTM.
datafusion/sqllogictest/test_files/aggregate.slt (2)
6025-6028: Limit-with-filter subtree aligns with rule changeFilterExec appears directly in the aggregation subtree; ordering with GlobalLimit remains correct. LGTM.
7167-7173: HAVING filter now explicit and projectedExplicit FilterExec on aggregate output with a precise projection is clearer and correct. LGTM.
datafusion/core/tests/dataframe/mod.rs (3)
769-771: Snapshots updated to show FilterExec without CoalesceBatchesExec: LGTMThese plan strings now place FilterExec directly under Aggregate/DataSource, consistent with removing FilterExec from the CoalesceBatches rule. Good alignment with the PR goal.
Also applies to: 817-819, 867-869
804-806: NULL/inlist plan fragments: LGTMPhysical plans now render FilterExec directly. Semantics unchanged; strings match expected display.
Also applies to: 814-816, 844-846
3329-3352: Review comment appears to be based on incorrect assumptions about rule scopeThe grep results found 30+ instances of CoalesceBatchesExec wrapping FilterExec in
replace_with_order_preserving_variants.rs, but these are from a different optimization rule than the scalar subquery changes shown in the code snapshot.The
coalesce_batches.rsrule (the one that introduces CoalesceBatchesExec) specifically wraps afterHashJoinExec,RepartitionExec, andAsyncFuncExec—not FilterExec. The wrapping of FilterExec in the order-preserving variants tests appears to come from that separate optimization rule's logic, not from the scalar subquery changes.Without evidence that the scalar subquery optimization rule change applies universally to all CoalesceBatchesExec→FilterExec patterns across the codebase, the suggested verification may be checking the wrong context. The snapshot updates in
mod.rs(3329-3352, 3384-3407) are correct for scalar subquery optimization, but the matches inreplace_with_order_preserving_variants.rsmay be intentional for that separate rule.Likely an incorrect or invalid review comment.
datafusion/sqllogictest/test_files/predicates.slt (2)
675-683: TPCH OR predicate plan fragment updated: LGTMFilterExec appears explicitly in the physical plan, not under CoalesceBatchesExec. This matches the optimizer change; results are unchanged.
804-806: Binary operator with NULL and IN-list simplification: LGTMUpdated physical plans correctly show FilterExec directly above DataSourceExec; expectations stay consistent with logical-plan simplifications.
Also applies to: 814-816, 844-846
datafusion/core/tests/sql/explain_analyze.rs (2)
769-781: Display-indent snapshot reflects new FilterExec placement: LGTMThe FilterExec line now sits directly under the partial aggregate subtree with the same semantics. Snapshot matches the intended plan rendering.
1015-1023: Recursive parquet projection pushdown snapshot: LGTMFilterExec appears without a CoalesceBatches wrapper; path normalization via TMP_DIR remains intact. Behavior and pruning predicates unchanged.
| 06)------FilterExec: id@0 < 10 | ||
| 07)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 08)----------WorkTableExec: name=nodes | ||
|
|
There was a problem hiding this comment.
Refresh batch-size commentary for recursive CTE plans.
These explain outputs used to show CoalesceBatchesExec: target_batch_size=2, which justified the nearby note “use explain to ensure batch size is set to 2.” After removing the Filter coalescing, that evidence disappeared, so the comment is now misleading. Please adjust the wording (or add another observable hook) so the test description reflects what the plan actually shows.
You could, for example, rewrite it to something like:
-# use explain to ensure that batch size is set to 2. This should produce multiple batches per iteration since the input
-# table 'balances' has 4 rows
+# With batch size set to 2 we still produce multiple batches per iteration; the plan no longer prints a Coalesce node, so we
+# rely on the configuration rather than the explain output to confirm the setting.Also applies to: 164-167
There was a problem hiding this comment.
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct that the comments in the snapshot tests need to be updated to reflect that FilterExec is no more wrapped inside CoalesceBatchesExec
| 01)FilterExec: c0@0 < 1 | ||
| 02)--DataSourceExec: partitions=1, partition_sizes=[1] |
There was a problem hiding this comment.
Fix stale “expect coalesce” comments.
We dropped the CoalesceBatchesExec nodes from these plans, yet the surrounding comments still read “Expect coalesce …” / “expect batch size to be 1234.” That guidance is now misleading because nothing in the snapshot shows a coalesced batch size anymore. Please update (or remove) those remarks so the test explains match the new plan layout.
Apply this diff to keep the documentation truthful:
-# Expect coalesce and default batch size
+# Expect direct filtering without an explicit coalesce node (default batch size)
-# expect batch size to be 1234
+# coalescing is no longer shown; batch size stays at the configured 1234 internallyAlso applies to: 74-75
🤖 Prompt for AI Agents
In datafusion/sqllogictest/test_files/options.slt around lines 34-35 (and
similarly update lines 74-75), the inline comments referencing "Expect coalesce"
and specific coalesced batch sizes are stale because CoalesceBatchesExec was
removed; remove or reword those comments to reflect the current plan (e.g., drop
any “Expect coalesce …” phrasing and any mention of a coalesced batch size, or
replace with a neutral note like “Expect no CoalesceBatchesExec; batch sizes
shown are raw from DataSourceExec”), ensuring the snapshot comments accurately
describe the shown plan.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct that the comments in the snapshot tests need to be updated to reflect that FilterExec is no more wrapped inside CoalesceBatchesExec
18630: To review by AI