
[Variant] Fix variant_get to return List<T> instead of List<Struct>#9631

Merged
alamb merged 4 commits into apache:main from liamzwbao:issue-9615-fix-variant-get-list on Apr 6, 2026

@liamzwbao (Contributor) commented Mar 31, 2026

Which issue does this PR close?

Rationale for this change

variant_get(..., List<T>) was returning List<Struct<value, typed_value>> instead of List<T>. This happened because VariantToListArrowRowBuilder unconditionally used make_variant_to_shredded_variant_arrow_row_builder for its element builder, which produces the shredded representation (a struct with value and typed_value fields). This is correct for shred_variant, but variant_get should produce strongly typed arrays directly.
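
To make the difference concrete, here is a minimal std-only Rust sketch of the two element shapes for a target of List<Int64>. The Variant enum, the ShreddedElement struct, and both functions are hypothetical simplifications, not the parquet-variant-compute API. In particular, a real shredded `value` column holds variant-encoded binary, and the typed path assumes safe-cast semantics, where a non-matching element becomes null.

```rust
// Hypothetical, simplified stand-in for a decoded Variant value.
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    Null,
    Int64(i64),
    Str(String),
}

// One element of the shredded layout: Struct<value, typed_value>.
// Values matching the target type go to `typed_value`; anything else is kept
// in `value` (here as the Variant itself, in reality as variant-encoded bytes).
#[derive(Debug, PartialEq)]
struct ShreddedElement {
    value: Option<Variant>,
    typed_value: Option<i64>,
}

// Shape that shred_variant is meant to produce for List<Int64>: one struct per element.
fn shredded_elements(list: &[Variant]) -> Vec<ShreddedElement> {
    list.iter()
        .map(|v| match v {
            Variant::Int64(i) => ShreddedElement { value: None, typed_value: Some(*i) },
            other => ShreddedElement { value: Some(other.clone()), typed_value: None },
        })
        .collect()
}

// Shape that variant_get(..., List<Int64>) should produce: the elements themselves.
// Assumes safe casting: an element that is not an Int64 becomes null.
fn typed_elements(list: &[Variant]) -> Vec<Option<i64>> {
    list.iter()
        .map(|v| match v {
            Variant::Int64(i) => Some(*i),
            _ => None,
        })
        .collect()
}
```

In these terms, the bug was that variant_get produced elements shaped like ShreddedElement where plain Option<i64> values, as returned by typed_elements, were expected.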

What changes are included in this PR?

  • Introduced a ListElementBuilder enum with Typed and Shredded variants to abstract over the two element output modes.
    • Typed wraps VariantToArrowRowBuilder and produces the target type directly (e.g., Int64Array).
    • Shredded wraps VariantToShreddedVariantRowBuilder and produces the Struct<value, typed_value> used by shredding.
  • Added a shredded: bool parameter to ArrayVariantToArrowRowBuilder::try_new and VariantToListArrowRowBuilder::try_new to select the appropriate mode.
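
The mode switch can be sketched roughly as follows. This is an assumption-laden, std-only model and not the code merged in this PR: TypedInt64Builder and ShreddedInt64Builder are hypothetical stand-ins for VariantToArrowRowBuilder and VariantToShreddedVariantRowBuilder, fixed to Int64 elements, and ListElementBuilder::new(shredded) stands in for the `shredded` flag passed to try_new.

```rust
// Hypothetical, simplified stand-in for a decoded Variant value.
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    Null,
    Int64(i64),
}

// Hypothetical typed builder: collects Int64 values directly (Int64Array-like).
#[derive(Default)]
struct TypedInt64Builder {
    values: Vec<Option<i64>>,
}

// Hypothetical shredded builder: collects the `value` and `typed_value` columns.
#[derive(Default)]
struct ShreddedInt64Builder {
    value: Vec<Option<Variant>>,
    typed_value: Vec<Option<i64>>,
}

// The two element output modes, selected once when the list builder is created.
enum ListElementBuilder {
    Typed(TypedInt64Builder),
    Shredded(ShreddedInt64Builder),
}

impl ListElementBuilder {
    // Mirrors the `shredded: bool` parameter: true for shred_variant, false for variant_get.
    fn new(shredded: bool) -> Self {
        if shredded {
            ListElementBuilder::Shredded(ShreddedInt64Builder::default())
        } else {
            ListElementBuilder::Typed(TypedInt64Builder::default())
        }
    }

    // Appends one list element in whichever mode was selected.
    fn append(&mut self, v: &Variant) {
        match self {
            ListElementBuilder::Typed(b) => b.values.push(match v {
                Variant::Int64(i) => Some(*i),
                _ => None, // assumes safe-cast semantics: a mismatch becomes null
            }),
            ListElementBuilder::Shredded(b) => match v {
                Variant::Int64(i) => {
                    b.value.push(None);
                    b.typed_value.push(Some(*i));
                }
                other => {
                    b.value.push(Some(other.clone()));
                    b.typed_value.push(None);
                }
            },
        }
    }

    // Number of elements appended so far, in either mode.
    fn len(&self) -> usize {
        match self {
            ListElementBuilder::Typed(b) => b.values.len(),
            ListElementBuilder::Shredded(b) => b.typed_value.len(),
        }
    }
}
```

Selecting the mode once at construction lets the surrounding list builder be reused by both shred_variant and variant_get, with only the element builder differing.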

Are these changes tested?

Yes

Are there any user-facing changes?

Yes. variant_get(..., List<T>) now returns List<T> as expected, instead of List<Struct<value, typed_value>>.

github-actions bot added the parquet-variant (parquet-variant* crates) label Mar 31, 2026
@liamzwbao marked this pull request as ready for review March 31, 2026 00:24
@liamzwbao (Contributor, Author)

Tagging @sdf-jkl @scovich @klion26 for a review. Thanks!

@klion26 (Member) left a comment

LGTM, thanks for this contribution.

@scovich (Contributor) left a comment

LGTM. One question and one nit.

Comment on lines +901 to +903
Typed(Box<VariantToArrowRowBuilder<'a>>),
/// Produces a shredded struct with `value` and `typed_value` fields.
Shredded(Box<VariantToShreddedVariantRowBuilder<'a>>),
Contributor

Any particular reason these need to be boxed? They're fully strongly typed, no?

Contributor Author

They need to be boxed because of a recursive type cycle:

ListElementBuilder::Typed → VariantToArrowRowBuilder → Array(ArrayVariantToArrowRowBuilder) → VariantToListArrowRowBuilder → ListElementBuilder

It would be infinitely sized without Box.
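
A minimal std-only sketch of that cycle follows. RowBuilder, ListBuilder, and a ListElementBuilder with only a Typed variant are hypothetical simplifications, not the crate's real definitions. Changing Typed(Box<RowBuilder>) to Typed(RowBuilder) makes rustc reject the definition with error E0072 (recursive type has infinite size), because RowBuilder would then contain itself by value.

```rust
// Hypothetical, simplified mirror of the cycle
// RowBuilder -> Array(ListBuilder) -> ListElementBuilder -> RowBuilder.
enum RowBuilder {
    // Leaf builder for a primitive target type.
    Int64(Vec<Option<i64>>),
    // Builder for a list target type.
    Array(ListBuilder),
}

struct ListBuilder {
    // Offset and null buffers omitted for brevity.
    element: ListElementBuilder,
}

enum ListElementBuilder {
    // The Box gives this variant a fixed, pointer-sized payload, which breaks
    // the by-value cycle back to RowBuilder.
    Typed(Box<RowBuilder>),
}

// Builds a row builder for an Int64 target wrapped in `depth` list levels.
fn nested_list_builder(depth: usize) -> RowBuilder {
    let mut builder = RowBuilder::Int64(Vec::new());
    for _ in 0..depth {
        builder = RowBuilder::Array(ListBuilder {
            element: ListElementBuilder::Typed(Box::new(builder)),
        });
    }
    builder
}

// Counts the list levels above the innermost Int64 builder.
fn list_depth(builder: &RowBuilder) -> usize {
    match builder {
        RowBuilder::Int64(_) => 0,
        RowBuilder::Array(list) => match &list.element {
            ListElementBuilder::Typed(inner) => 1 + list_depth(inner),
        },
    }
}
```

Boxing a single edge of the cycle is enough to give every type in it a finite size; the sketch places the Box on the ListElementBuilder edge, matching where the review discussion puts it.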

/// * `shredded` - If true, element builders produce shredded structs with `value`/`typed_value`
/// fields (for [`crate::shred_variant()`]). If false, element builders produce strongly typed
/// arrays directly (for [`crate::variant_get()`]).
pub(crate) fn try_new(
Contributor

It looks like this shredded param allows quite a bit of code reuse. Should we look at other shred-vs-get pairs to see if similar consolidation is possible? Struct, in particular, is likely to be complex?

Contributor Author

Based on the current implementation, the struct builders aren't as straightforward to consolidate as the list builders were. I created issue #9633 to track the refactor.

# Conflicts:
#	parquet-variant-compute/src/variant_to_arrow.rs

@scovich (Contributor) left a comment

LGTM, thanks for the fix

@alamb merged commit 0936b38 into apache:main on Apr 6, 2026
17 checks passed
@liamzwbao deleted the issue-9615-fix-variant-get-list branch April 6, 2026 22:12

Labels

parquet-variant parquet-variant* crates

Development

Successfully merging this pull request may close these issues.

[Variant] variant_get(..., List<_>) non-Struct types support

4 participants