Skip to content

Optimize rewrite performance#115

Merged
xudong963 merged 4 commits intodatafusion-contrib:mainfrom
zhuqi-lucas:optimize_rewrite_performance
Feb 3, 2026
Merged

Optimize rewrite performance#115
xudong963 merged 4 commits intodatafusion-contrib:mainfrom
zhuqi-lucas:optimize_rewrite_performance

Conversation

@zhuqi-lucas
Copy link
Contributor

Optimize rewrite_from performance for wide tables

Summary

This PR optimizes SpjNormalForm::rewrite_from by reducing redundant allocations and avoiding expensive tree traversals for simple column expressions. Benchmarks show 44-75% improvement depending on column count.

Problem

Flamegraph analysis revealed two bottlenecks in rewrite_from:

  1. Repeated columns() calls: DFSchema::columns() creates a new Vec on each call. The original code called it inside the loop, causing O(n²) allocations for n columns.

  2. Unnecessary tree transforms: Every output expression went through normalize_expr + rewrite, even simple Column expressions that only need an O(1) HashMap lookup.

Solution

1. Cache columns() result

// Before: O(n²) allocations
for (i, output_expr) in self.output_exprs.iter().enumerate() {
    let column = &self.output_schema.columns()[i];  // New Vec every iteration!
}

// After: O(n) allocations  
let output_columns = self.output_schema.columns();  // Cache once
for (i, output_expr) in self.output_exprs.iter().enumerate() {
    let column = &output_columns[i];  // Direct index
}

2. Fast path for Column expressions

// Before: All expressions go through expensive transform
let new_output_expr = other.predicate.normalize_expr(output_expr.clone())
    .rewrite(&mut other)?.data;

// After: Simple columns use O(1) lookup
if let Expr::Column(col) = output_expr {
    let normalized_col = other.predicate.normalize_column(col);  // O(1)
    other.find_output_column(&normalized_col)  // O(n) scan, no recursion
} else {
    // Complex expressions still use full transform
}

Benchmark Results

view_matcher_spj/rewrite_from/cols=10   [-22.4%]  3.72µs
view_matcher_spj/rewrite_from/cols=20   [-29.9%]  6.24µs
view_matcher_spj/rewrite_from/cols=40   [-44.0%]  11.27µs
view_matcher_spj/rewrite_from/cols=80   [-55.9%]  24.37µs
view_matcher_spj/rewrite_from/cols=160  [-66.7%]  56.90µs
view_matcher_spj/rewrite_from/cols=320  [-75.1%]  142.37µs

The improvement scales with column count because we're reducing O(n²) → O(n) complexity.

Changes

  • rewrite_from: Cache output_schema.columns() outside loop
  • rewrite_from: Add fast path for Expr::Column variants
  • find_output_column: New helper method for column lookup
  • normalize_column: New O(1) column normalization method
  • normalize_expr: Add fast path for simple Column expressions
  • Added two unit tests for the optimizations

Testing

  • All existing tests pass
  • Added test_normalize_column_fast_path - verifies equivalence class normalization
  • Added test_rewrite_from_with_many_columns - verifies wide table handling

Copilot AI review requested due to automatic review settings January 29, 2026 02:59
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes SpjNormalForm::rewrite_from and related normalization paths to significantly reduce allocation and traversal overhead, especially for wide schemas with many projected columns.

Changes:

  • Cache output_schema.columns() once in rewrite_from and reuse it inside the output expression loop to avoid repeated Vec allocations.
  • Introduce fast paths for column handling: Predicate::normalize_column, a column-specialized branch in Predicate::normalize_expr, and SpjNormalForm::find_output_column plus a Column-only branch in rewrite_from.
  • Extend the test suite in normal_form.rs with additional async tests targeting predicate normalization and wide-table rewrite behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +838 to +841
// Fast path: if it's a simple Column, avoid full transform traversal
if let Expr::Column(ref c) = e {
return Expr::Column(self.normalize_column(c));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a simple column, even if it goes through the transform, it'll be fast imo, I'm curious why this reduces cost

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xudong963 for revivew,

Good question! The overhead isn't from the transform logic itself, but from the transform() machinery setup - it creates closures, iterators, and Transformed wrapper objects even for leaf nodes.
For 41 columns × 5-7 MVs × every query, these small costs add up. The fast path is just a HashMap lookup + clone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added the comments to PR in latest commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And i believe the improvement is coming more from another improvement which cached the Repeated columns() calls.

@zhuqi-lucas zhuqi-lucas requested a review from xudong963 February 3, 2026 06:21
@xudong963
Copy link
Member

Approved now, sorry for the late

@xudong963 xudong963 merged commit 429e52a into datafusion-contrib:main Feb 3, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants