[Variant] Fix variant_get to return List<T> instead of List<Struct>#9631
[Variant] Fix variant_get to return List<T> instead of List<Struct>#9631alamb merged 4 commits intoapache:mainfrom
variant_get to return List<T> instead of List<Struct>#9631Conversation
…ct>` for list types
klion26
left a comment
There was a problem hiding this comment.
LGTM, thanks for this contribution.
scovich
left a comment
There was a problem hiding this comment.
LGTM. Once question and one nit.
| Typed(Box<VariantToArrowRowBuilder<'a>>), | ||
| /// Produces a shredded struct with `value` and `typed_value` fields. | ||
| Shredded(Box<VariantToShreddedVariantRowBuilder<'a>>), |
There was a problem hiding this comment.
Any particular reason these need to be boxed? They're fully strongly typed, no?
There was a problem hiding this comment.
They need to be boxed because of a recursive type cycle:
ListElementBuilder::Typed → VariantToArrowRowBuilder → Array(ArrayVariantToArrowRowBuilder) → VariantToListArrowRowBuilder → ListElementBuilder
It would be infinitely sized without Box.
| /// * `shredded` - If true, element builders produce shredded structs with `value`/`typed_value` | ||
| /// fields (for [`crate::shred_variant()`]). If false, element builders produce strongly typed | ||
| /// arrays directly (for [`crate::variant_get()`]). | ||
| pub(crate) fn try_new( |
There was a problem hiding this comment.
It looks like this shredded param allows quite a bit of code reuse. Should we look at other shred-vs-get pairs to see if similar consolidation is possible? Struct, in particular, is likely to be complex?
There was a problem hiding this comment.
From the current impl, the struct builders aren't as straightforward to consolidate as the list builders were. Created an issue #9633 to keep track of the refactor
# Conflicts: # parquet-variant-compute/src/variant_to_arrow.rs
scovich
left a comment
There was a problem hiding this comment.
LGTM, thanks for the fix
Which issue does this PR close?
variant_get(..., List<_>)non-Struct types support #9615.Rationale for this change
variant_get(..., List<T>)was returningList<Struct<value, typed_value>>instead ofList<T>. This happened becauseVariantToListArrowRowBuilderunconditionally usedmake_variant_to_shredded_variant_arrow_row_builderfor its element builder, which produces the shredded representation (a struct withvalueandtyped_valuefields). This is correct forshred_variant, butvariant_getshould produce strongly typed arrays directly.What changes are included in this PR?
ListElementBuilderenum withTypedandShreddedvariants to abstract over the two element output modes.TypedwrapsVariantToArrowRowBuilderand produces the target type directly (e.g.,Int64Array).ShreddedwrapsVariantToShreddedVariantRowBuilderand produces theStruct<value, typed_value>used by shredding.shredded: boolparameter toArrayVariantToArrowRowBuilder::try_newandVariantToListArrowRowBuilder::try_newto select the appropriate mode.Are these changes tested?
Yes
Are there any user-facing changes?
Yes, the
variant_get(..., List<T>)should have correct behavior now