forked from duckdb/duckdb
[WIP] Variant shredded storage #147
Open

Tishj wants to merge 1,163 commits into parquet_variant_intermediate_conversion_merged from variant_shredded_storage
Conversation
Tishj commented on Nov 5, 2025
```cpp
namespace duckdb {

//! Struct column data represents a struct
```
Struct -> Variant
… clarity and consistency, fixed a bug when the input is a constant vector
…ing index selection logic
…ts and bind logic.
…nt vector handling logic
…d test case with additional keys (so the vector_size tests will cover this pitfall)
…rror instead of returning NULL, and revise tests accordingly.
- Added `SupportsAliasReference` and `TryResolveAliasReference` in `ExpressionBinder`.
- Implemented `TryResolveAliasReference` in `SelectBinder`.
- Created tests for alias_ref functionality, including both valid cases and error scenarios.
* Initialize WindowExpression Booleans
…nts_shredding_auto
### Summary of Changes

- Added support for `alias.name` in SELECT projection:
  - Introduced `SupportsAliasReference` and `TryResolveAliasReference` in `ExpressionBinder`.
  - Implemented `TryResolveAliasReference` specifically in `SelectBinder`.
  - Included robust tests for both valid aliases and edge cases.

Related issue: duckdb#13991 (comment)

![image](https://github.com/user-attachments/assets/31b673a3-0835-4618-add3-e4196abc1e2f)

Works with:
- `SELECT`
- `GROUP BY` (works also with inside-expression aliases, `GROUP BY alias.x + 1`, which was impossible in legacy aliasing)
- `WHERE` / `HAVING` / `QUALIFY`
- `ORDER BY`
…uckdb#19878) This PR fixes duckdb#19875

During flushing of the segment we use `AlignValue` before writing the metadata, because it has to be 8-byte aligned. The existing bug is that we were not accounting for this extra space when determining whether we have enough room to store a container. In rare cases this resulted in a segment size that exceeded the size of a block, thankfully causing an InternalException.

The other bug is in the RoaringBoolean `ScanPartial` / `FetchRow` implementations. They were reading at an offset that was out of bounds of the dummy vector that was created, and the scanned data was written to the result Vector at the wrong offset.

Thankfully none of these were bugs in the compression logic, so no borked files were created as a result.
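A minimal, self-contained sketch of the space check described above, assuming an 8-byte alignment requirement for the metadata. The helper and its parameters are stand-ins, not the actual Roaring compression code:

```cpp
#include <cstdint>

// Rounds n up to the next multiple of 8, mirroring an AlignValue-style helper.
static uint64_t AlignTo8(uint64_t n) {
	return (n + 7) / 8 * 8;
}

// Decide whether another container still fits in the segment. The fix amounts to
// accounting for the alignment padding here, when *deciding*, rather than only
// applying the alignment later while flushing and writing the metadata.
static bool ContainerFits(uint64_t data_bytes, uint64_t container_bytes, uint64_t metadata_bytes,
                          uint64_t segment_size) {
	// the metadata must start at an 8-byte aligned offset after the container data
	uint64_t metadata_start = AlignTo8(data_bytes + container_bytes);
	return metadata_start + metadata_bytes <= segment_size;
}
```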
Part of epic: duckdblabs/duckdb-internal#6380

Hi there,

This PR introduces a script `generate_peg_transformer.py` that looks at all the grammar files in `autocomplete/statement` and tries to find the corresponding transformer rule. It outputs, per `*.gram` file, which rules are still missing a transformer rule. In most cases, every grammar rule corresponds to a transformer rule for the PEG parser. Rules can be excluded if we don't require a transformer rule. For `alter.gram`, for instance, the output will be as follows to indicate that these transformer rules are still missing:

```
--- File: alter.gram ---
[ MISSING ] AlterSchemaStmt
[ MISSING ] SetData
```

Additionally, when using the `-g` argument, the script prints a template for every missing transformer rule (in alphabetical order). It will output three parts:

1. The method declaration that should be pasted into `peg_transformer.hpp`
2. The method implementation
3. The transformer rule registration

The standard type that is used is `unique_ptr<SQLStatement>`. This is most likely not correct, but at the moment it is not possible to know the type in this template script. We could add a file where we can specify the return type for every rule if this is important enough. For now I added a `// TODO` to remind users to check the return type. I decided against letting this script modify the `transform_*.cpp` files directly to avoid making the script overly complex. Perhaps in a future update this could be looked into.

For example, for `WithinGroupClause`:

```
--- Generation for rule: WithinGroupClause ---
1. Add DECLARATION to: ../extension/autocomplete/include/transformer/peg_transformer.hpp

// TODO: Verify this return type is correct
static unique_ptr<SQLStatement> TransformWithinGroupClause(PEGTransformer &transformer, optional_ptr<ParseResult> parse_result);

3. Add REGISTRATION to: ../extension/autocomplete/transformer/peg_transformer_factory.cpp
Inside the appropriate Register...() function:
REGISTER_TRANSFORM(TransformWithinGroupClause);

4. Add IMPLEMENTATION to: ../extension/autocomplete/transformer/transform_expression.cpp

// TODO: Verify this return type is correct
unique_ptr<SQLStatement> PEGTransformerFactory::TransformWithinGroupClause(PEGTransformer &transformer, optional_ptr<ParseResult> parse_result) {
	throw NotImplementedException("TransformWithinGroupClause has not yet been implemented");
}

--- End of WithinGroupClause ---
```

In a future PR, when the transformer is close to completion, I would like to use this file in a workflow to check for missing transformer rules. Finally, it outputs orphan rules: transformer rules that don't have a corresponding grammar rule, so we can avoid keeping transformer rules around that don't do anything.

There are currently two arguments:
- `-g` or `--generate`:
- `-s` or `--skip-found`: Skips the rules that have the status [ FOUND ] or [ ENUM ] to keep the output a bit cleaner.
…st config (duckdb#19860)

Also replace the `load` paths when we're running the `storage_compatibility` test config. As can be seen by the updated list of skipped tests, we now also properly test the Roaring boolean tests, and have to skip them because they are (correctly) not forwards-compatible.

I've had to add a new list of skipped tests because of various problems with doing this path replacement for `load`, such as:
- `load` can be used multiple times in a test, with different paths
- The paths used in `load` can be referenced by the test (detach, attach, export, import, etc.)

Parts of these could be fixed by turning the loaded path into a variable, so we could reference `${ACTIVE_DATABASE_PATH}` for example.

### Misc changes

Fixed an uncaught error if the provided version isn't valid (the download fails).
Print the errors to stderr rather than stdout, for easier filtering of the script output.
Added prints for various reasons that a test was already being skipped without being marked as `[SKIPPED]`, for better debugging of the script.
Decode the boolean primitive type in VectorElementDecoder to avoid an infinite loop over the Decodable version of the decode function, which caused a crash.
… TO parquet (duckdb#19336)

This PR is a follow-up to duckdb#19219, which added the ability to provide a `SHREDDING` copy option to manually set the shredded type of a VARIANT column. That implemented shredding and made it easy to control, but it requires manual configuration by the user.

With this PR, VARIANT columns will always be shredded based on the analysis of the first rowgroup. (TODO: this should be queryable with `parquet_metadata` / `parquet_schema`)

This required some changes to the Parquet writing path:
- `SchemaElement` items are no longer created during bind.
- `ParquetColumnSchema` `schema_index` is now an `optional_idx`, as this index refers to the `SchemaElements` vector, which is populated later.
- `ColumnWriter` no longer takes a reference to a `ParquetColumnSchema` object; instead it now owns the schema.
- `FillParquetSchema` is removed, consolidated into `CreateWriterRecursive`.
- [1] `ParquetWriteTransformData` is introduced; this struct holds the reusable state needed to perform a pre-processing step to transform the input columns into the shape required by the ColumnWriter (relevant for VARIANT).
- `FinalizeSchema` has been added to create the `SchemaElements` and populate the `schema_index` in the schema objects.
- `HasTransform`, `TransformedType` and `TransformExpression` are added to perform the pre-processing step, using the `ParquetWriteTransformData`.
- `AnalyzeSchemaInit`, `AnalyzeSchema` and `AnalyzeSchemaFinalize` are added to auto-detect the shredding for VARIANT columns and apply this change to the schema, to be used before `FinalizeSchema` is called.

[1] This is not as clean as I want it to be; the `ParquetWriteTransformData` can't only live in the `ParquetWriteLocalState`, there also needs to be a copy in the `ParquetWriteGlobalState` as well as in the `ParquetWritePrepareBatch` method.

### Limitations

`DECIMAL` types are never automatically shredded. This is because the DECIMAL type is somewhat special: not only does the underlying type have to be the same, the width and scale also have to match, so I've opted not to collect data for it.
This PR introduces basic row group pruning using the query's `LIMIT` and `OFFSET` for pagination-style queries like `SELECT * FROM t ORDER BY a LIMIT M OFFSET N`. Currently, these queries are processed with a min-heap of size `M + N` up to a threshold of 0.7% of the table cardinality. For queries beyond that threshold, the table is fully sorted and then limited. Especially for larger tables, where we require out-of-core sorting, this results in a hefty performance cliff.

Once again, we use the TSBS `cpu` table with 100M rows and a query `SELECT * FROM cpu ORDER BY time LIMIT 100 OFFSET N`. To illustrate the performance cliff, we set `N` once to 720000 and once to 730000 and compare the nightly build to this PR.

| Offset | Nightly | This PR | Improvement |
| ------ | ------- | ------- | ----------- |
| 720K   | 0.09s   | 0.05s   | **1.8x**    |
| 730K   | 11s     | 0.05s   | **220x**    |
| 50M    | 11s     | 0.05s   | **220x**    |

The PR uses a similar algorithm to duckdb#19655 to exclude row groups. Note that this is only possible if there are no predicates, and only for row groups that do not contain null values in the sort column. Moreover, this method only provides a benefit if the order column is already sorted/partitioned to some degree, i.e., pruning is not possible if the sort values are uniformly distributed.

Finally, this PR moves the pruning optimizations into their own optimizer rule and adds the ability to fetch row group statistics from partition stats in the optimization phase.
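A hedged, self-contained sketch of how min/max-based row-group pruning for `ORDER BY col LIMIT m OFFSET n` can work under the stated conditions (ascending order, no predicates, no NULLs in the sort column). The types and function are illustrative, not the actual DuckDB optimizer code:

```cpp
#include <cstdint>
#include <vector>

// Min/max statistics for a single row group on the ORDER BY column.
struct RowGroupStats {
	int64_t min_value;
	int64_t max_value;
	uint64_t count;
};

// A row group can be skipped if at least (limit + offset) rows in other row
// groups are guaranteed to sort strictly before all of its rows, i.e. their
// max is below this row group's min.
std::vector<bool> PruneRowGroups(const std::vector<RowGroupStats> &groups, uint64_t limit, uint64_t offset) {
	const uint64_t needed = limit + offset;
	std::vector<bool> keep(groups.size(), true);
	for (size_t i = 0; i < groups.size(); i++) {
		uint64_t rows_before = 0;
		for (size_t j = 0; j < groups.size(); j++) {
			if (j != i && groups[j].max_value < groups[i].min_value) {
				rows_before += groups[j].count;
			}
		}
		if (rows_before >= needed) {
			keep[i] = false; // every needed row comes from row groups that sort before this one
		}
	}
	return keep;
}
```

This is also why uniformly distributed sort values defeat the pruning: when every row group's [min, max] range overlaps every other, `rows_before` never reaches `needed`.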
* Extract SortStrategy, NaturalSort and FullSort from HashedSort
* Fix HashedSort ContextClient usage.
* Create SortStrategy::Factory method.
* Clean up inherited instance variables.
* Test AsOf usage.
* Remove non-partitioned code from HashedSort.
* Fix NaturalSort logic errors
* Fix FullSort unused variable.

fixes: duckdblabs/duckdb-internal#6568
Added `JoinHashTable::ScanKeyColumn`, a shared helper that fills a pointer vector and gathers a build column into a Vector, so full hash-table scans live in one place. Updated `PerfectHashJoinExecutor::FullScanHashTable` and `JoinFilterPushdownInfo::PushInFilter` to call this helper instead of duplicating the scan logic, and kept their early-exit semantics. All tests have been run (`make allunit`).
…tract` expressions (duckdb#19829) In the `GetChildColumnBinding` method we currently iterate through child expressions and overwrite the returned `ExpressionBinding` for every child that sets `found_expression` to true. This results in overwriting a found `BoundColumnRefExpression` with a `BoundConstantExpression`, in the case of `struct_extract(my_struct, 'my_field')`
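A heavily simplified, hedged sketch of the "first found binding wins" shape of such a fix, using stand-in types. This is not the actual `GetChildColumnBinding` code or its real fix; it only illustrates not letting a later child overwrite a binding that was already found:

```cpp
#include <memory>
#include <vector>

// Stand-in expression tree: a node is either a column reference or an operator
// with children, e.g. struct_extract(my_struct, 'my_field').
struct Expr {
	bool is_column_ref = false;
	int column_index = -1;
	std::vector<std::unique_ptr<Expr>> children;
};

struct ExprBinding {
	bool found_expression = false;
	int column_index = -1;
};

ExprBinding GetChildColumnBindingSketch(const Expr &expr) {
	if (expr.is_column_ref) {
		return ExprBinding {true, expr.column_index};
	}
	ExprBinding result;
	for (auto &child : expr.children) {
		auto child_binding = GetChildColumnBindingSketch(*child);
		if (child_binding.found_expression) {
			// Return the first binding that was found (the column ref `my_struct`)
			// instead of continuing the loop, where a later child (in the real bug,
			// the constant 'my_field') could overwrite it.
			return child_binding;
		}
	}
	return result;
}
```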
This PR fixes duckdblabs/duckdb-internal#6577 and fixes duckdblabs/duckdb-internal#6578. There's a new assertion in `ColumnScanState::Initialize` that asserts the state is only initialized once, which was broken by AlterType. We fix that by performing the initialization once, not once for every row group.
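A minimal sketch of the pattern behind that fix, using stand-in types rather than the actual DuckDB scan code: the scan state is initialized exactly once up front and then reused across row groups, instead of being re-initialized per row group (which now trips the new assertion):

```cpp
#include <cassert>
#include <vector>

struct ScanStateSketch {
	bool initialized = false;
	void Initialize() {
		assert(!initialized); // mirrors the new "only initialized once" assertion
		initialized = true;
	}
};

struct RowGroupSketch {
	void Scan(ScanStateSketch &state) {
		assert(state.initialized);
		// ... read this row group's data using the shared state ...
	}
};

void AlterTypeScanSketch(std::vector<RowGroupSketch> &row_groups) {
	ScanStateSketch state;
	state.Initialize(); // once, up front
	for (auto &row_group : row_groups) {
		row_group.Scan(state); // previously the state was (incorrectly) initialized here, per row group
	}
}
```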
* Initialize WindowExpression Booleans
This PR fixes nested sleep_ms calls and adds comprehensive tests.

## Changes
- Fix nested sleep_ms calls
- Add tests for nested sleep functionality

## Testing
- Added test coverage for nested sleep scenarios
Changes summary:
- Moved `VariantValue` from `extension/parquet/...` to core
- Moved `ShreddingState` to core, renamed as `VariantShreddingState`, made it an abstract class
- Moved `VariantShredding` to core, made it an abstract class with pure virtual method `WriteVariantValues`
- Moved the `VariantVisitor` struct created for `ConvertVariantToValue` to its own header, renamed the struct to `ValueConverter`
- Moved the `VariantVisitor` struct created for `variant_normalize` to its own header, renamed the struct to `VariantNormalizer`
- Added `VariantStats` for `BaseStatistics`
- Added `LogicalType logical_type;` to `PersistentColumnData` [1]
- Added `VariantColumnData`

[1] This is needed to inform the de/serialization of `shredded` for `VariantColumnData`.

The internal structure of the `VariantColumnData` pairs an `unshredded` field with a shredded part (see the sketch below). The shape of the shredded type mimics the VariantShredding of Parquet's VARIANT type, only we replace the repeated binary encoding with an index (`untyped_value_index`) that refers to a value in the `unshredded` field at the root.

At `Checkpoint` we determine what the shredded type of the Variant should be, with an additional scan. Then we scan the existing data, transforming it to the shredded representation, and write new `ColumnData` for it, which we then `Checkpoint` (this is why `unshredded` is of type STRUCT).
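As a rough illustration of that layout, here is a hedged sketch in DuckDB `LogicalType` terms. Only `unshredded`, `shredded`, and `untyped_value_index` are names taken from the summary above; the remaining field names, types, and the function itself are illustrative and not the PR's actual code:

```cpp
#include "duckdb/common/types.hpp"

using duckdb::LogicalType;

// Illustrative sketch of the Variant storage layout described above.
LogicalType SketchVariantStorageType(const LogicalType &shredded_type) {
	// The shredded part mirrors Parquet's VariantShredding, but the repeated
	// binary "value" encoding is replaced by an index into "unshredded".
	auto shredded = LogicalType::STRUCT({
	    {"untyped_value_index", LogicalType::UINTEGER},
	    {"typed_value", shredded_type},
	});
	// The unshredded part holds the values that were not shredded; it is a
	// STRUCT so that it can be written and checkpointed as regular ColumnData.
	auto unshredded = LogicalType::STRUCT({
	    {"value", LogicalType::BLOB},
	});
	return LogicalType::STRUCT({
	    {"unshredded", unshredded},
	    {"shredded", shredded},
	});
}
```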