
feat(parquet): skip RowFilter evaluation for fully matched row groups#9694

Open
xudong963 wants to merge 1 commit into apache:main from xudong963:skip-filter-fully-matched-row-groups

Conversation

@xudong963 (Member) commented Apr 13, 2026

Which issue does this PR close?

Rationale for this change

When DataFusion evaluates a Parquet scan with filter pushdown, it uses row group statistics to determine which row groups to scan. In many real-world queries, the predicate matches all rows in some (or all) row groups — for example, a time-range filter where entire row groups fall within the range, or a WHERE status != 'DELETED' filter on data that contains no deleted rows.

Today, even when row group statistics prove that every row satisfies the predicate, the RowFilter is still evaluated row-by-row during decoding. This means the filter columns are decoded and the predicate expression is evaluated for every row — work that produces no useful filtering and can be expensive, especially when filter columns are large (e.g., strings) or the predicate is complex.
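The kind of statistics check involved can be sketched in a few lines. This is a simplified, hypothetical model — `RowGroupStats` and `fully_matches_range` are illustrative stand-ins, not the parquet crate's actual statistics types — showing how min/max and null counts can prove that a range predicate holds for every row in a row group:

```rust
// Hedged sketch: how row-group min/max statistics can prove a predicate
// matches every row. `RowGroupStats` is a hypothetical stand-in for the
// statistics a reader exposes; the real types live in the parquet crate.
#[derive(Debug)]
struct RowGroupStats {
    min: i64,        // e.g. a timestamp column's minimum value in this row group
    max: i64,        // ...and its maximum
    null_count: u64, // number of nulls in the column for this row group
}

/// Returns true when statistics prove that `lo <= value <= hi` holds for
/// every row, so per-row evaluation of the range predicate is redundant.
fn fully_matches_range(stats: &RowGroupStats, lo: i64, hi: i64) -> bool {
    // Nulls never satisfy a range predicate, so any null disproves "all rows match".
    stats.null_count == 0 && stats.min >= lo && stats.max <= hi
}

fn main() {
    let rg = RowGroupStats { min: 100, max: 200, null_count: 0 };
    // The entire row group lies inside [0, 1000]: the filter cannot drop anything.
    assert!(fully_matches_range(&rg, 0, 1000));
    // [150, 1000] only partially overlaps, so rows must still be checked.
    assert!(!fully_matches_range(&rg, 150, 1000));
    println!("ok");
}
```

When this check returns true, row-by-row evaluation of the predicate is pure overhead — which is exactly the work this PR skips.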

This PR adds a mechanism to skip RowFilter evaluation entirely for row groups that are known to be "fully matched" based on statistics. The caller (e.g., DataFusion) determines which row groups are fully matched during row group pruning and passes that information to the reader builder. During decoding, fully matched row groups skip straight to data materialization, bypassing filter column decoding and predicate evaluation.

What changes are included in this PR?

  1. New builder method with_fully_matched_row_groups(Vec<usize>) on ArrowReaderBuilder — allows callers to specify which row groups have all rows matching the filter predicate.

  2. Skip filter in RowGroupReaderBuilder::try_transition() — when a row group is in the fully-matched set, the Start state transitions directly to StartData, bypassing the Filters / WaitingOnFilterData states entirely. The filter is preserved (put back into self.filter) for subsequent non-fully-matched row groups.

  3. Plumbed through all decoder paths — the field is propagated through ParquetPushDecoderBuilder, ParquetRecordBatchStreamBuilder (async), and ignored in the sync reader (which processes one row group at a time).
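The state transition in (2) can be modeled as follows. This is a deliberately simplified sketch — `State` and `ReaderModel` are stand-ins, not the parquet crate's `RowGroupDecoderState` or `RowGroupReaderBuilder` — illustrating how a fully matched row group jumps from `Start` straight to `StartData` while the filter is preserved for later row groups:

```rust
use std::collections::HashSet;

// Hedged model of the transition described in (2); not the crate's real types.
#[derive(Debug, PartialEq)]
enum State {
    Start { row_group: usize },
    Filters { row_group: usize },
    StartData { row_group: usize },
}

struct ReaderModel {
    fully_matched: HashSet<usize>,
    filter: Option<String>, // stand-in for the RowFilter
}

impl ReaderModel {
    fn try_transition(&mut self, state: State) -> State {
        match state {
            State::Start { row_group } if self.fully_matched.contains(&row_group) => {
                // Skip filter decoding entirely; the filter stays in `self.filter`
                // so subsequent non-fully-matched row groups can still use it.
                State::StartData { row_group }
            }
            State::Start { row_group } if self.filter.is_some() => {
                State::Filters { row_group }
            }
            other => other,
        }
    }
}

fn main() {
    let mut reader = ReaderModel {
        fully_matched: HashSet::from([1]),
        filter: Some("status != 'DELETED'".to_string()),
    };
    // Row group 1 is fully matched: filter evaluation is bypassed...
    assert_eq!(
        reader.try_transition(State::Start { row_group: 1 }),
        State::StartData { row_group: 1 }
    );
    // ...and the filter is still available for row group 2.
    assert!(reader.filter.is_some());
    assert_eq!(
        reader.try_transition(State::Start { row_group: 2 }),
        State::Filters { row_group: 2 }
    );
    println!("ok");
}
```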

Design choices:

  • The fully-matched set is stored as a HashSet<usize> on RowGroupReaderBuilder for O(1) lookup, rather than on RowGroupDecoderState, so the state enum size is unchanged (preserving the existing 200-byte size test).
  • The API uses Option<Vec<usize>> at the builder level and converts to HashSet internally.
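The builder-level shape of these two design choices can be sketched like this — a minimal stand-in, not the actual `ArrowReaderBuilder`, showing the `Option<Vec<usize>>` public surface and the one-time conversion to a `HashSet` for O(1) lookup:

```rust
use std::collections::HashSet;

// Hedged sketch of the builder API shape described above; names mirror the
// PR but this is a simplified stand-in, not the parquet crate itself.
#[derive(Default)]
struct BuilderSketch {
    fully_matched_row_groups: Option<Vec<usize>>,
}

impl BuilderSketch {
    fn with_fully_matched_row_groups(mut self, row_groups: Vec<usize>) -> Self {
        self.fully_matched_row_groups = Some(row_groups);
        self
    }

    // Done once at build time, so per-row-group membership checks are O(1)
    // and the state enum itself carries no extra data (keeping its size fixed).
    fn build_lookup(&self) -> HashSet<usize> {
        self.fully_matched_row_groups
            .clone()
            .unwrap_or_default()
            .into_iter()
            .collect()
    }
}

fn main() {
    let builder = BuilderSketch::default().with_fully_matched_row_groups(vec![0, 2]);
    let lookup = builder.build_lookup();
    assert!(lookup.contains(&0) && lookup.contains(&2) && !lookup.contains(&1));
    // Default is None: no row group is treated as fully matched.
    assert!(BuilderSketch::default().build_lookup().is_empty());
    println!("ok");
}
```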

Are these changes tested?

The optimization is exercised by an end-to-end benchmark in DataFusion that uses ParquetPushDecoder directly (the same code path used by DataFusion's async Parquet opener). The benchmark verifies correctness by asserting the expected row count.

Unit tests can be added if reviewers prefer — happy to add tests that verify:

  • Fully matched row groups skip filter evaluation and still return all rows
  • Non-fully-matched row groups in the same scan still have filters applied
  • The API is a no-op when no fully matched row groups are specified

(I'll open a draft PR on the DataFusion side tomorrow.)

Are there any user-facing changes?

Yes — a new public method ArrowReaderBuilder::with_fully_matched_row_groups() is added. This is a purely additive, non-breaking change. Existing code is unaffected since the default is None (no row groups are marked as fully matched).

…ully matched row groups

When row group statistics prove that all rows in a row group satisfy
the filter predicate, the RowFilter evaluation can be skipped entirely
for those row groups. This avoids the cost of decoding filter columns
and evaluating the predicate expression.

Adds `with_fully_matched_row_groups(Vec<usize>)` to ArrowReaderBuilder
which flows through to RowGroupReaderBuilder. When processing a fully
matched row group, the Start state transitions directly to StartData,
bypassing all filter evaluation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions bot added the parquet (Changes to the parquet crate) label on Apr 13, 2026
@xudong963 force-pushed the skip-filter-fully-matched-row-groups branch from 258a28d to 76c0284 on April 13, 2026 13:05
xudong963 added a commit to xudong963/arrow-datafusion that referenced this pull request Apr 15, 2026
When row group statistics prove that ALL rows satisfy the filter
predicate, skip both RowFilter evaluation (late materialization)
and page index pruning for those row groups. This avoids wasted
work decoding filter columns and evaluating predicates that produce
no useful filtering.

Depends on apache/arrow-rs#9694 for the `with_fully_matched_row_groups()`
builder API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@xudong963 (Member, Author) commented:

I've opened a related PR in DataFusion, which uses my arrow-rs fork.

If we agree the two PRs are going in the right direction, we can review the arrow-rs PR first; after it's merged and lands in DataFusion, we can update the arrow-rs version in the DataFusion PR and merge it.

@alamb (Contributor) left a comment:

Thank you @xudong963 -- this is a neat idea, and I think not evaluating pushdown filters when DataFusion can prove they won't filter anything out is totally the right way to go.

However, I am not sure this is the right level for this filtering. It might keep the APIs simpler if the pushdown filter removal happens in DataFusion itself -- for example, DataFusion could make different ParquetPushDecoders for each row group, and the ones where the filters don't filter anything out could be disabled.

In the context of the "morsel" work, I think we are heading towards a scan in DataFusion where each row group (or collection of row groups) is a morsel, and then we can treat the morsels individually for IO, pruning, and even moving them between cores.

Any chance you want to help explore that option in DataFusion?

@xudong963 (Member, Author) replied:

I think it might keep the APIs simpler if the push down filter removal happens in DataFusion itself -- for example, DataFusion could make different ParquetPushDecoders for each row group, and the ones where the filters don't filter anything can be disabled

Ahah, this is cleaner; I'd like to give it a try.
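The alternative discussed here can be sketched as a planning step: instead of threading a fully-matched set through one reader, the caller builds one decoder per row group and simply omits the filter for groups that statistics prove are fully matched. `DecoderPlan` and the string predicate below are illustrative stand-ins, not DataFusion or parquet crate types:

```rust
// Hedged sketch of per-row-group decoder planning; simplified stand-ins only.
#[derive(Debug)]
struct DecoderPlan {
    row_group: usize,
    filter: Option<String>, // stand-in for a RowFilter / pushdown predicate
}

fn plan_per_row_group_decoders(
    row_groups: &[usize],
    fully_matched: &[usize],
    predicate: &str,
) -> Vec<DecoderPlan> {
    row_groups
        .iter()
        .map(|&rg| DecoderPlan {
            row_group: rg,
            // Attach the filter only where it can actually remove rows; fully
            // matched row groups get a plain decode with no predicate at all.
            filter: (!fully_matched.contains(&rg)).then(|| predicate.to_string()),
        })
        .collect()
}

fn main() {
    let plans = plan_per_row_group_decoders(&[0, 1, 2], &[1], "status != 'DELETED'");
    assert!(plans[0].filter.is_some());
    assert!(plans[1].filter.is_none()); // fully matched: no filter, no new reader API
    assert!(plans[2].filter.is_some());
    println!("ok");
}
```

The appeal of this shape is that the reader's public API stays unchanged: the decision lives entirely in the caller's planning code.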

xudong963 added a commit to xudong963/arrow-datafusion that referenced this pull request Apr 16, 2026
When row group statistics prove that ALL rows satisfy the filter
predicate, skip both RowFilter evaluation (late materialization)
and page index pruning for those row groups. This avoids wasted
work decoding filter columns and evaluating predicates that produce
no useful filtering.

Depends on apache/arrow-rs#9694 for the `with_fully_matched_row_groups()`
builder API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@xudong963 (Member, Author) commented:

apache/datafusion@d6c3879 made a refactor on the DataFusion side, which doesn't need the changes from the arrow-rs side.

@alamb (Contributor) commented Apr 16, 2026:

apache/datafusion@d6c3879 made a refactor on the DataFusion side, which doesn't need the changes from the arrow-rs side.

This is neat.

What do you think about the idea of an API to "split" decoders or something 🤔

@alamb alamb marked this pull request as draft April 16, 2026 11:29
@alamb alamb marked this pull request as ready for review April 16, 2026 11:29

Labels

parquet (Changes to the parquet crate), performance