feat: support prefiltering any columns in flat format #7972
Draft
evenyag wants to merge 18 commits into GreptimeTeam:main
Code Review
This pull request refactors the prefiltering mechanism in the mito2 engine to support primary key, simple, and physical filters more uniformly. It introduces BulkFilterPlan and ReaderFilterPlan to classify predicates and optimizes the prefilter phase by allowing direct evaluation of projected columns. A critical logic error was found in the filter classification logic where non-projected filters were being added to the simple prefilter list, potentially causing runtime panics when the expected columns are missing from the batch.
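The classification issue raised in the review can be pictured with a minimal sketch. It assumes a filter may only be evaluated during prefiltering when its column is part of the prefilter projection. `ColumnFilter`, `FilterPlan`, and `classify` are hypothetical names for illustration, not the mito2 types or the PR's actual fix.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a single-column predicate (not a mito2 type).
#[derive(Debug, Clone, PartialEq)]
struct ColumnFilter {
    column: String,
}

/// Hypothetical plan: filters that are safe to evaluate during prefiltering,
/// and filters deferred until the full batch has been read.
#[derive(Debug, Default, PartialEq)]
struct FilterPlan {
    prefilter: Vec<ColumnFilter>,
    remaining: Vec<ColumnFilter>,
}

/// Routes a filter to the prefilter list only when its column is projected.
/// A non-projected filter stays in `remaining`, so the prefilter phase never
/// looks up a column that is absent from the batch.
fn classify(filters: Vec<ColumnFilter>, projected: &HashSet<String>) -> FilterPlan {
    let mut plan = FilterPlan::default();
    for filter in filters {
        if projected.contains(&filter.column) {
            plan.prefilter.push(filter);
        } else {
            plan.remaining.push(filter);
        }
    }
    plan
}
```

Under this assumption, a filter on a column outside the projection is still applied later by the precise filter; it is just never handed to the prefilter phase.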
7402774 to 1a75e44
Signed-off-by: evenyag <realevenyag@gmail.com>
1a75e44 to ca8c500
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
Previously, parquet prefiltering only supported primary key (PK) columns — tag predicates were evaluated by decoding the encoded primary key column to compute a refined row selection before reading the remaining columns.
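To make "refined row selection" concrete, here is a minimal sketch that turns a per-row boolean mask into select/skip runs, so a reader can skip whole stretches of rows when decoding the remaining columns. `Run` and `mask_to_runs` are hypothetical names; the actual reader uses its own selection types.

```rust
/// A run of consecutive rows that are either all selected or all skipped.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Run {
    selected: bool,
    len: usize,
}

/// Compresses a per-row mask into alternating select/skip runs.
fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &selected in mask {
        // Extend the current run when the row has the same state.
        if let Some(run) = runs.last_mut() {
            if run.selected == selected {
                run.len += 1;
                continue;
            }
        }
        // Otherwise start a new run of length one.
        runs.push(Run { selected, len: 1 });
    }
    runs
}
```

The fewer and longer the skip runs, the more data the reader can avoid decoding for the columns that were not part of the prefilter.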
This PR generalizes prefiltering to support all column types (tags, fields, and timestamps), not just primary keys. Key changes:
New filter planning layer (prefilter.rs):
- build_reader_filter_plan() and build_bulk_filter_plan() categorize predicates into three groups: primary key filters, simple filters, and physical filters (IN, IS NULL, IS NOT NULL, BETWEEN) compiled to physical exprs for single-column evaluation
- PreFilterMode (All vs SkipFields) controls whether field column predicates participate in prefiltering

Unified prefilter execution (prefilter.rs):
- PrefilterContext now holds a unified projection covering PK + simple + physical filter columns
- execute_prefilter() reads the projected columns once, then applies all three filter types to produce a combined BooleanBuffer row mask

Refactored reader/filter path (reader.rs, file_range.rs):
- SimpleFilterContext no longer tracks usable_primary_key_filter; filter routing is handled by the plan builder
- PhysicalFilterContext wraps a PhysicalExpr for single-column prefilter evaluation
- RowGroupBuildContext is simplified: prefilter-related fields (filters, skip_fields) are removed
- precise_filter_flat() no longer needs to skip prefiltered PK filters since they're excluded from the remaining filter list

Other changes:
- Predicate::to_physical_expr() as a public method for building a single physical expr

Performance
The TSBS cpu-max-all-8 query runs 4.5x faster.
[Before/after benchmark screenshots not preserved]
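As a rough illustration of the key changes in this description, the sketch below evaluates several single-column filters over columns that were read once and ANDs the results into one row mask, with a mode switch that leaves field predicates out. It is a toy model under stated assumptions: columns are plain `Vec<i64>`, the mask is a `Vec<bool>` rather than a `BooleanBuffer`, and `ColumnKind`, `Filter`, `participates`, and `execute_prefilter_sketch` are hypothetical names. The `PreFilterMode` enum here only mirrors the name used in the PR.

```rust
/// Hypothetical column kinds, mirroring "tags, fields, and timestamps".
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnKind {
    Tag,
    Field,
    Timestamp,
}

/// Toy mirror of the mode described in the PR (not the real definition).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PreFilterMode {
    All,
    SkipFields,
}

/// A toy single-column predicate over i64 values.
struct Filter {
    kind: ColumnKind,
    column: usize,
    pred: fn(i64) -> bool,
}

/// Field predicates are left out of prefiltering in `SkipFields` mode.
fn participates(mode: PreFilterMode, kind: ColumnKind) -> bool {
    !(mode == PreFilterMode::SkipFields && kind == ColumnKind::Field)
}

/// Applies every participating filter to the already-read columns and
/// combines the per-filter results with a logical AND.
fn execute_prefilter_sketch(
    columns: &[Vec<i64>],
    filters: &[Filter],
    mode: PreFilterMode,
    num_rows: usize,
) -> Vec<bool> {
    let mut mask = vec![true; num_rows];
    for filter in filters.iter().filter(|f| participates(mode, f.kind)) {
        let column = &columns[filter.column];
        for (keep, value) in mask.iter_mut().zip(column) {
            *keep = *keep && (filter.pred)(*value);
        }
    }
    mask
}
```

A filter skipped by the mode is not lost in this model; it would simply have to be applied after the remaining columns are read.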
PR Checklist
Please convert it to a draft if some of the following conditions are not met.