Skip to content

feat: support nested projection#7959

Draft
fengys1996 wants to merge 13 commits intoGreptimeTeam:mainfrom
fengys1996:feat/read-cols-v2
Draft

feat: support nested projection#7959
fengys1996 wants to merge 13 commits intoGreptimeTeam:mainfrom
fengys1996:feat/read-cols-v2

Conversation

@fengys1996
Copy link
Copy Markdown
Contributor

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@github-actions github-actions bot added size/XL docs-not-required This change does not impact docs. labels Apr 13, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the projection mechanism by introducing ProjectionInput and ReadColumns to support nested field access in complex columns. It updates ScanRequest and various storage engines to utilize these new structures, enabling more granular data retrieval. Feedback focuses on improving the robustness and performance of a temporary schema-casting fix in flat_projection.rs, specifically by replacing a risky unwrap() with proper error handling and reducing excessive logging in hot paths.

Comment thread src/mito2/src/read/flat_projection.rs Outdated
Comment thread src/mito2/src/read/flat_projection.rs Outdated
Comment thread src/mito2/src/read/flat_projection.rs Outdated
@fengys1996 fengys1996 force-pushed the feat/read-cols-v2 branch 2 times, most recently from ce9d3f9 to 12212db Compare April 15, 2026 04:16
@github-actions github-actions bot added size/L and removed size/XL labels Apr 15, 2026
@fengys1996 fengys1996 changed the title feat: add logical read columns for nested projection feat: support nested projection Apr 15, 2026
@fengys1996
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ReadColumns and ReadColumn structures to support reading specific nested fields from storage, replacing the previous reliance on simple column ID vectors. It updates the scan and SST read paths to propagate these nested paths, allowing for more efficient data retrieval. Additionally, it includes logic in the FlatProjectionMapper to handle struct widening via casting when the storage returns a narrower schema than the output expects. Feedback indicates that the struct widening logic in flat_projection.rs is incorrectly placed in the block for internal columns rather than user-defined columns and notes potential type mismatches and limitations in the current casting implementation.

Comment thread src/mito2/src/read/flat_projection.rs
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new “logical read columns” abstraction to support nested (struct-field) projection pushdown through the mito2 read pipeline, allowing Parquet leaf-level selection while preserving existing scan/projection behavior.

Changes:

  • Add ReadColumns (logical root columns + nested paths) and derive read columns from both user projection input and predicates.
  • Plumb ReadColumns through scanning, projection mapping, Parquet read format/projection-mask construction, and cache keying.
  • Extend Parquet projection handling to support nested paths and adjust flat projection conversion to handle narrowed struct columns.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/mito2/src/sst/parquet/reader.rs Switch Parquet reader builder projection input to ReadColumns and build projection mask from ReadFormat’s parquet read columns.
src/mito2/src/sst/parquet/read_columns.rs Add constructors/merge helpers for nested parquet read columns; add unit tests for merge behavior.
src/mito2/src/sst/parquet/prefilter.rs Update tests to build ReadFormat using ReadColumns.
src/mito2/src/sst/parquet/format.rs Refactor projection computation to produce ParquetReadColumns (including nested paths) and expose iterators/accessors accordingly.
src/mito2/src/sst/parquet/flat_format.rs Plumb ReadColumns through flat read format construction and projection access.
src/mito2/src/sst/parquet/file_range.rs Update last-row reader wiring to use parquet_read_columns() instead of raw indices; adjust tests.
src/mito2/src/read/scan_util.rs Update test helpers for the new ProjectionMapper::new signature.
src/mito2/src/read/scan_region.rs Build ReadColumns from projection + predicate and carry it through ScanInput/SST reads; remove legacy “build read column ids” logic and its tests.
src/mito2/src/read/read_columns.rs New module implementing ReadColumns, merging/normalizing nested paths, and helpers to derive read columns from projection/predicate (with tests).
src/mito2/src/read/range_cache.rs Change scan fingerprinting to store ReadColumns rather than Vec<ColumnId> and adjust sizing/type tracking.
src/mito2/src/read/projection.rs Update mappers to carry ReadColumns and accept IntoIterator projections; adapt tests accordingly.
src/mito2/src/read/last_row.rs Selector result caching now keys schema compatibility off ParquetReadColumns instead of raw projection indices.
src/mito2/src/read/flat_projection.rs Store ReadColumns in the mapper; add struct-casting path to reconcile narrowed struct reads with unpruned output schema expectations.
src/mito2/src/read/compat.rs Update compat logic to use mapper read columns API.
src/mito2/src/read/batch_adapter.rs Adjust tests for ProjectionMapper::new signature changes.
src/mito2/src/read.rs Export the new read_columns module.
src/mito2/src/memtable/bulk/part_reader.rs Update projection mask construction and record batch projection to use projection index iterator.
src/mito2/src/memtable/bulk/context.rs Adapt bulk iter context projection plumbing to ReadColumns/new ReadFormat::new signature.
src/mito2/src/cache.rs Store selector cache schema as ParquetReadColumns; update constructors/tests.
src/common/recordbatch/src/error.rs Add CastColumn error variant used by the new struct casting path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/mito2/src/read/range_cache.rs
Comment thread src/mito2/src/read/flat_projection.rs Outdated
Comment thread src/mito2/src/read/read_columns.rs Outdated
@fengys1996
Copy link
Copy Markdown
Contributor Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d07ab9efa1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +838 to +840
if !column_id_to_projected_index.contains_key(&col_id) {
let projected_index = column_id_to_projected_index.len();
column_id_to_projected_index.insert(col_id, projected_index);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard projected index map against empty nested leaf reads

When a nested path is requested but an SST schema lacks that nested leaf (for example, querying a newly added struct field against older files), this code still allocates a column_id_to_projected_index entry before parquet leaf projection is resolved. In that case the parquet reader can return zero leaves for the root, but downstream code still indexes into the projected batch as if the column exists, which misaligns columns and surfaces cast/type errors instead of compat behavior (null/default for missing nested fields). Please only assign projected indices for roots that actually survive leaf selection, or fallback to whole-root projection when no nested leaf matches.

Useful? React with 👍 / 👎.

@github-actions github-actions bot added size/XXL and removed size/XL labels Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-not-required This change does not impact docs. size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants