feat: support prefiltering any columns in flat format #7972

Draft
evenyag wants to merge 18 commits into GreptimeTeam:main from evenyag:feat/prefilter-other-columns


evenyag commented Apr 15, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Previously, parquet prefiltering only supported primary key (PK) columns — tag predicates were evaluated by decoding the encoded primary key column to compute a refined row selection before reading the remaining columns.

This PR generalizes prefiltering to support all column types (tags, fields, and timestamps), not just primary keys. Key changes:

New filter planning layer (prefilter.rs):

  • Introduces build_reader_filter_plan() and build_bulk_filter_plan() that categorize predicates into three groups:
    • PK filters: tag predicates evaluated via encoded primary key decoding (existing path)
    • Simple prefilters: tag/field/timestamp predicates on columns directly readable from parquet
    • Physical prefilters: more complex expressions (IN, IS NULL, IS NOT NULL, BETWEEN) compiled to physical exprs for single-column evaluation
  • A PreFilterMode (All vs SkipFields) controls whether field column predicates participate in prefiltering
  • Prefiltering is enabled only when the estimated selectivity meets a configurable threshold
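
The three-way split above can be sketched roughly as follows. This is a minimal illustration, not the code in prefilter.rs: `ColumnKind`, `ExprShape`, `FilterGroup`, `PredicateInfo`, and `classify` are hypothetical names, and the rule that a tag predicate takes the PK path only when its column is not directly readable is an assumption. The selectivity threshold is left out.

```rust
// Illustrative only: hypothetical stand-ins, not the mito2 definitions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnKind {
    Tag,
    Field,
    Timestamp,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PreFilterMode {
    All,
    SkipFields,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExprShape {
    /// A plain comparison such as `col = literal`.
    Comparison,
    /// IN, IS NULL, IS NOT NULL, BETWEEN.
    Complex,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterGroup {
    Pk,
    Simple,
    Physical,
    /// Not prefiltered; evaluated after the remaining columns are read.
    PostRead,
}

#[derive(Debug, Clone, Copy)]
pub struct PredicateInfo {
    pub kind: ColumnKind,
    pub shape: ExprShape,
    /// Assumption: true when the column is stored as its own parquet column.
    pub directly_readable: bool,
}

/// Assign one predicate to a filter group under the given mode.
pub fn classify(pred: PredicateInfo, mode: PreFilterMode) -> FilterGroup {
    // Field predicates are kept out of prefiltering in SkipFields mode.
    if pred.kind == ColumnKind::Field && mode == PreFilterMode::SkipFields {
        return FilterGroup::PostRead;
    }
    match (pred.directly_readable, pred.shape, pred.kind) {
        (true, ExprShape::Comparison, _) => FilterGroup::Simple,
        (true, ExprShape::Complex, _) => FilterGroup::Physical,
        // Tags that exist only inside the encoded primary key use the PK path.
        (false, ExprShape::Comparison, ColumnKind::Tag) => FilterGroup::Pk,
        _ => FilterGroup::PostRead,
    }
}
```

Under these assumptions, a field comparison lands in `Simple` with `PreFilterMode::All` and in `PostRead` with `PreFilterMode::SkipFields`.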

Unified prefilter execution (prefilter.rs):

  • PrefilterContext now holds a unified projection covering PK + simple + physical filter columns
  • execute_prefilter() reads the projected columns once, then applies all three filter types to produce a combined BooleanBuffer row mask
  • Filters already applied during prefiltering are removed from the post-read filter list to avoid redundant evaluation
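
As a rough sketch of the combination step, assuming each filter yields one keep/drop flag per row: the masks are ANDed together, and filters that already ran are dropped from the post-read list. Plain `Vec<bool>` stands in for the `BooleanBuffer` row mask, and `combine_masks` and `remaining_filters` are hypothetical helpers, not functions from this PR.

```rust
/// AND several per-filter row masks into one combined mask.
/// Returns `None` when no masks are given. All masks must cover the same rows.
pub fn combine_masks(masks: &[Vec<bool>]) -> Option<Vec<bool>> {
    let (first, rest) = masks.split_first()?;
    let mut combined = first.clone();
    for mask in rest {
        assert_eq!(mask.len(), combined.len(), "masks must cover the same rows");
        for (acc, keep) in combined.iter_mut().zip(mask) {
            // A row survives only if every filter keeps it.
            *acc = *acc && *keep;
        }
    }
    Some(combined)
}

/// Drop filters that were already applied during prefiltering, so the
/// post-read stage does not evaluate them a second time.
pub fn remaining_filters(all: &[&str], applied: &[&str]) -> Vec<String> {
    all.iter()
        .filter(|f| !applied.contains(*f))
        .map(|f| f.to_string())
        .collect()
}
```

For example, combining a PK mask, a simple-filter mask, and a physical-filter mask over four rows keeps only the rows that all three masks keep.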

Refactored reader/filter path (reader.rs, file_range.rs):

  • SimpleFilterContext no longer tracks usable_primary_key_filter; filter routing is handled by the plan builder
  • New PhysicalFilterContext wraps a PhysicalExpr for single-column prefilter evaluation
  • RowGroupBuildContext is simplified — prefilter-related fields (filters, skip_fields) are removed
  • precise_filter_flat() no longer needs to skip prefiltered PK filters since they're excluded from the remaining filter list

Other changes:

  • Extracts Predicate::to_physical_expr() as a public method for building a single physical expr
  • Adds tests covering generalized prefilter behavior and verifying physical filters are not re-evaluated post-filter

Performance
Roughly 4.5x faster on the TSBS cpu-max-all-8 query: mean latency drops from 181.21ms to 39.60ms over 100 queries with 1 worker.

Before
Run complete after 100 queries with 1 workers (Overall query rate 5.51 queries/sec):
Influx max of all CPU metrics, random    8 hosts, random 8h0m0s by 1h:
min:   128.09ms, med:   182.73ms, mean:   181.21ms, max:  239.89ms, stddev:    29.43ms, sum:  18.1sec, count: 100
all queries                                                          :
min:   128.09ms, med:   182.73ms, mean:   181.21ms, max:  239.89ms, stddev:    29.43ms, sum:  18.1sec, count: 100
wall clock time: 18.143011sec
After
Run complete after 100 queries with 1 workers (Overall query rate 25.21 queries/sec):
Influx max of all CPU metrics, random    8 hosts, random 8h0m0s by 1h:
min:    26.60ms, med:    39.41ms, mean:    39.60ms, max:   55.67ms, stddev:     5.73ms, sum:   4.0sec, count: 100
all queries                                                          :
min:    26.60ms, med:    39.41ms, mean:    39.60ms, max:   55.67ms, stddev:     5.73ms, sum:   4.0sec, count: 100
wall clock time: 3.973266sec
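
The speedup can be checked from the figures above with a small helper. `speedup` is a hypothetical function added here only for the arithmetic.

```rust
/// Ratio of the old value to the new one; values above 1.0 mean the new run is faster.
pub fn speedup(before: f64, after: f64) -> f64 {
    before / after
}
```

With the mean latencies, `speedup(181.21, 39.60)` is about 4.58, and with the wall clock times, `speedup(18.143011, 3.973266)` is about 4.57.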

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

github-actions bot added the size/L and docs-not-required (This change does not impact docs.) labels Apr 15, 2026
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the prefiltering mechanism in the mito2 engine to support primary key, simple, and physical filters more uniformly. It introduces BulkFilterPlan and ReaderFilterPlan to classify predicates and optimizes the prefilter phase by allowing direct evaluation of projected columns. A critical logic error was found in the filter classification logic where non-projected filters were being added to the simple prefilter list, potentially causing runtime panics when the expected columns are missing from the batch.
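
The kind of guard this review calls for can be sketched as below, assuming filters are identified by column name and the prefilter projection is a set of column names. `split_by_projection` is a hypothetical helper, not the actual fix applied in prefilter.rs.

```rust
use std::collections::HashSet;

/// Split filter columns into those present in the prefilter projection and the rest.
/// Only the first group is safe to evaluate against the projected batch; the second
/// must stay in the post-read filter list, because its columns are not in the batch.
pub fn split_by_projection<'a>(
    filter_columns: &[&'a str],
    projected: &HashSet<&str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    filter_columns
        .iter()
        .copied()
        .partition(|col| projected.contains(*col))
}
```

Checking membership before classification means a filter on a non-projected column is never looked up in the prefilter batch.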

Comment thread src/mito2/src/sst/parquet/prefilter.rs Outdated
@evenyag evenyag force-pushed the feat/prefilter-other-columns branch from 7402774 to 1a75e44 Compare April 16, 2026 09:00
evenyag added 18 commits April 16, 2026 17:01
Signed-off-by: evenyag <realevenyag@gmail.com>
@evenyag evenyag force-pushed the feat/prefilter-other-columns branch from 1a75e44 to ca8c500 Compare April 16, 2026 09:01
@evenyag evenyag changed the title feat: support prefiltering other columns feat: support prefiltering any columns in flat format Apr 16, 2026