perf: single-pass plan traversal in Predicate::new#113
perf: single-pass plan traversal in Predicate::new#113xudong963 merged 3 commits intodatafusion-contrib:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes SpjNormalForm::new by consolidating multiple plan traversals into a single pass, improving performance for wide tables.
Changes:
- Refactored
Predicate::newto use a single traversal that collects schema, columns, filters, and referenced tables in one pass - Added
referenced_tablesfield toPredicatestruct to store tables collected during traversal - Modified
SpjNormalForm::newto reusereferenced_tablesfromPredicateinstead of performing a separate traversal
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
cc @xudong963 |
src/rewrite/normal_form.rs
Outdated
| })?; | ||
|
|
||
| // Initialize data structures with known capacity | ||
| let n = columns_info.len(); |
There was a problem hiding this comment.
Good point, will address this!
| impl Predicate { | ||
| /// Create a new Predicate by analyzing the given logical plan. | ||
| /// Uses single-pass traversal to collect schema, columns, filters, and referenced tables. | ||
| fn new(plan: &LogicalPlan) -> Result<Self> { |
There was a problem hiding this comment.
How about adding some unit tests for the new() method to ensure we're collecting the expected items in Predicate as previously.
There was a problem hiding this comment.
Good point, will add the unit test!
|
Thank you @xudong963 for review, addressed all comments now. |
src/rewrite/normal_form.rs
Outdated
| // Note: Join is not supported yet, so we test with a single table | ||
| // This test verifies the single-pass traversal works correctly | ||
| let plan = ctx | ||
| .sql("SELECT a, b FROM t1 WHERE a >= 0 AND b <= 100") |
There was a problem hiding this comment.
So I think the test is duplicated with the above one.
There was a problem hiding this comment.
IIRC, the mv algo supports join
There was a problem hiding this comment.
@xudong963 We are still not supporting join now, see:
There was a problem hiding this comment.
Removed the duplicated test now.
Summary
Optimize
SpjNormalForm::newby consolidating multiple plan traversals into a single pass.Problem
The original implementation of
Predicate::newtraversed the logical plan 4 times:SpjNormalForm::new) Collect referenced tablesThis redundant traversal caused significant overhead, especially for wide tables.
Solution
plan.apply()call inPredicate::newreferenced_tablesfield toPredicatestructreferenced_tablesfromPredicateinSpjNormalForm::newBenchmark Results
Up to 35% performance improvement for wide tables (80+ columns).
Changes
src/rewrite/normal_form.rs: RefactoredPredicate::newto use single-pass traversalTesting
Existing tests pass without modification. This is a pure internal refactor with no API changes.