
fix(parquet): fix CDC panic on nested ListArrays with null entries #2

Closed
kszucs wants to merge 80 commits into main from cdc-nested-null

Conversation


@kszucs kszucs commented Apr 1, 2026

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

SYaoJun and others added 30 commits February 23, 2026 11:47
# Which issue does this PR close?

# Rationale for this change

Use .cloned().map(Some) instead of .map(|b| Some(b.clone()))
for better readability and idiomatic Rust style.
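
The two forms are equivalent; a minimal sketch illustrating the idiom on an `Option<&T>` (the `wrap_cloned` helper is hypothetical, for illustration only):

```rust
// Hypothetical helper showing the idiom: .cloned() turns Option<&T>
// into Option<T>, then .map(Some) wraps the value one level deeper.
fn wrap_cloned(opt: Option<&String>) -> Option<Option<String>> {
    opt.cloned().map(Some)
}

fn main() {
    let b = String::from("bytes");
    // Equivalent to the older opt.map(|b| Some(b.clone())) form.
    assert_eq!(wrap_cloned(Some(&b)), Some(Some(b.clone())));
    assert_eq!(wrap_cloned(None), None);
}
```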

# What changes are included in this PR?

# Are these changes tested?

# Are there any user-facing changes?
…apache#9411)

Implements a helper to replace the pattern of creating a `BooleanBuffer`
from an unsliced validity bitmap and filtering by null count. Previously
this was done with `BooleanBuffer::new(...)` plus
`Some(NullBuffer::new(...)).filter(|n| n.null_count() > 0)`; now it is a
single call to `NullBuffer::try_from_unsliced(buffer, len)`, which
returns `Some(NullBuffer)` when there are nulls and `None` when all
values are valid.

- Added `try_from_unsliced` in `arrow-buffer/src/buffer/null.rs` with
tests for nulls, all valid, all null, empty
- Refactor `FixedSizeBinaryArray::try_from_iter_with_size` and
`try_from_sparse_iter_with_size` to use it
- Refactor `take_nulls` in `arrow-select` to use it
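
The described semantics can be sketched without the arrow-buffer types; this toy helper mirrors the behavior (return `Some` only when the validity bitmap actually contains nulls), not the real `NullBuffer::try_from_unsliced` implementation:

```rust
/// Toy stand-in for a validity bitmap: true = valid, false = null.
/// Returns Some(buffer) only when at least one entry is null,
/// and None when all values are valid (mirroring the helper's contract).
fn try_from_unsliced(bits: Vec<bool>, len: usize) -> Option<Vec<bool>> {
    let bits = bits[..len].to_vec();
    let null_count = bits.iter().filter(|b| !**b).count();
    // All-valid data is represented as None, so downstream code can
    // skip null handling entirely.
    (null_count > 0).then_some(bits)
}

fn main() {
    assert!(try_from_unsliced(vec![true, false, true], 3).is_some());
    assert!(try_from_unsliced(vec![true, true], 2).is_none());
    assert!(try_from_unsliced(vec![], 0).is_none());
}
```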


Closes apache#9385
# Which issue does this PR close?

- Closes apache#9455.

# Rationale for this change
See the linked issue.

# What changes are included in this PR?

Added list type support to `VariantArray` data type checking.

# Are these changes tested?

# Are there any user-facing changes?
…taType (apache#9473)

This enables setting a timezone or precision & scale on parameterized
DataType values.

Note: I think the panic is unfortunate, and a `try_with_data_type()` would
be sensible.

# Which issue does this PR close?

- Closes apache#8042.

# Are these changes tested?

Yes

# Are there any user-facing changes?

- Adds `PrimitiveRunBuilder::with_data_type`.
# Which issue does this PR close?

- Closes apache#9336

# Rationale for this change

When an Avro reader schema has a union type that needs to be resolved
against the type in the writer schema, resolution information other than
primitive type promotions is not properly handled when creating the
decoder.
For example, when the reader schema has a nullable record field that has
an added nested field on top of the fields defined in the writer schema,
the record type resolution needs to be applied, using a projection with
the default field value.

# What changes are included in this PR?

Extend the union resolution information in the decoder with variant
data for enum remapping and record projection. The `Projector` data
structure with `Skipper` decoders forms part of this information,
which necessitated some refactoring.

# Are these changes tested?

TODO:

- [x] Debug failing tests including a busy-loop failure mode.
- [ ] Add more unit tests exercising the complex resolutions.

# Are there any user-facing changes?

No.
- part of apache#8466

Update release schedule based on historical reality
…ts (apache#9472)

# Rationale for this change

The motivation for this PR is to improve the testing
infrastructure as a precursor to the following PR:

- apache#9221 

@Jefffrey seemed to be in favor of using `insta` for more tests:
apache#9221 (comment)

# What changes are included in this PR?

This PR does not change any logic; it is a straightforward translation
of the current tests. More test cases, especially around escape
sequences, can be added in follow-up PRs.

# Are these changes tested?

Yes, though to review we still need to manually confirm that no test
cases changed accidentally.

# Are there any user-facing changes?

No.
…ime (apache#9491)

Mark ArrowTimestampType::make_value as deprecated and migrate internal
callers to the newer from_naive_datetime API.

# Which issue does this PR close?

- Closes apache#9490.

# Rationale for this change

Follow-up from PR apache#9345.

# What changes are included in this PR?

Mark ArrowTimestampType::make_value as deprecated and migrate internal
callers to the newer from_naive_datetime API.

# Are these changes tested?

YES.

# Are there any user-facing changes?

Migration Path: Users should replace:

```rust
// Old
TimestampSecondType::make_value(naive)
```
With:

```rust
// New
TimestampSecondType::from_naive_datetime(naive, None)
```
…pache#9498)

# Which issue does this PR close?

- Closes apache#9492.

# What changes are included in this PR?

See title.

# Are these changes tested?

YES

# Are there any user-facing changes?

NO
# Which issue does this PR close?

- Closes apache#9478.

# What changes are included in this PR?

- Fix the typo
- Enhance the bracket access for the variant path

# Are these changes tested?

- Add some tests to cover the logic

# Are there any user-facing changes?

No
… parquet unshredding (apache#9437)

# Which issue does this PR close?

- Part of apache#9340.

# Rationale for this change

JSON writers for list-like types (List/ListView/FixedSizeList) are pretty
similar apart from the element range representation. We already had a
good way to abstract this kind of encoder in parquet variant
unshredding. Given this, it makes sense to move the `ListLikeArray`
trait to arrow-array so it can be shared between json and parquet.

# What changes are included in this PR?

Move `ListLikeArray` trait from parquet-variant-compute to arrow-array

# Are these changes tested?

Covered by existing tests

# Are there any user-facing changes?

New pub trait in arrow-array
# Which issue does this PR close?

- Closes apache#9425.

# Rationale for this change

I noticed that this method is available on PrimitiveTypeBuilder but
missing on GenericByteBuilder, which makes sense since the gain is
smaller, but benchmarking shows a solid 10% improvement, mostly because
of the more efficient allocation of the null mask.

```
┌───────────────────┬────────────────┬───────────────────┬─────────┐
│     Benchmark     │ append_value_n │ append_value loop │ Speedup │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100/len=5       │ 371 ns         │ 408 ns            │ 10%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100/len=30      │ 456 ns         │ 507 ns            │ 10%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100/len=1024    │ 1.81 µs        │ 1.95 µs           │ 8%      │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=1000/len=5      │ 2.39 µs        │ 2.87 µs           │ 17%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=1000/len=30     │ 3.41 µs        │ 3.89 µs           │ 12%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=1000/len=1024   │ 12.3 µs        │ 14.4 µs           │ 15%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=10000/len=5     │ 23.8 µs        │ 29.3 µs           │ 19%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=10000/len=30    │ 33.7 µs        │ 39.0 µs           │ 14%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=10000/len=1024  │ 115.9 µs       │ 135.0 µs          │ 14%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100000/len=5    │ 227.5 µs       │ 278.6 µs          │ 18%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100000/len=30   │ 328.1 µs       │ 377.9 µs          │ 13%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100000/len=1024 │ 1.16 ms        │ 1.34 ms           │ 14%     │
└───────────────────┴────────────────┴───────────────────┴─────────┘
```

I think this is still worthwhile to add. Let me know what the
community thinks!
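
The gain comes from amortizing per-call overhead: a bulk append can reserve the value bytes and the null mask once instead of growing them value by value. A toy builder sketch showing the shape of such an API (this is not the GenericByteBuilder internals; names are illustrative):

```rust
/// Toy byte builder: concatenated value bytes plus a bool-per-row
/// validity mask (a stand-in for a packed null bitmap).
struct ToyByteBuilder {
    data: Vec<u8>,
    validity: Vec<bool>,
}

impl ToyByteBuilder {
    fn new() -> Self {
        Self { data: Vec::new(), validity: Vec::new() }
    }

    /// Append the same value `n` times, reserving both buffers up front
    /// so the loop never reallocates.
    fn append_value_n(&mut self, value: &[u8], n: usize) {
        self.data.reserve(value.len() * n);
        self.validity.reserve(n);
        for _ in 0..n {
            self.data.extend_from_slice(value);
        }
        self.validity.extend(std::iter::repeat(true).take(n));
    }
}

fn main() {
    let mut b = ToyByteBuilder::new();
    b.append_value_n(b"abc", 3);
    assert_eq!(b.data, b"abcabcabc");
    assert_eq!(b.validity.len(), 3);
}
```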

# What changes are included in this PR?

A new public API.

# Are these changes tested?

Yes!

# Are there any user-facing changes?

A new public API.
# Which issue does this PR close?

Closes apache#9431

# Rationale for this change

It would be nice to add `append_nulls` to `MapBuilder`, similar to
`append_nulls` on `GenericListBuilder`. Appending the nulls at once
instead of in a loop has some nice performance implications:

```
Benchmark results (1,000,000 nulls):

┌─────────────────────────┬─────────┐
│         Method          │  Time   │
├─────────────────────────┼─────────┤
│ append(false) in a loop │ 2.36 ms │
├─────────────────────────┼─────────┤
│ append_nulls(N)         │ 50 µs   │
└─────────────────────────┴─────────┘
```

# What changes are included in this PR?

A new public API.

# Are these changes tested?

With some fresh unit tests.

# Are there any user-facing changes?

A nice and convenient new public API.
# Which issue does this PR close?

- Closes apache#9429

I'm doing some performance optimization, and noticed that we have a loop
adding one value to the null mask at a time. Instead, I'd suggest adding
`append_non_nulls` to do this at once.

```
append_non_nulls(n) vs append(true) in a loop (with bitmap allocated)

┌───────────┬───────────────────┬─────────────────────┬─────────┐
│     n     │ append(true) loop │ append_non_nulls(n) │ speedup │
├───────────┼───────────────────┼─────────────────────┼─────────┤
│ 100       │ 251 ns            │ 73 ns               │ ~3x     │
├───────────┼───────────────────┼─────────────────────┼─────────┤
│ 1,000     │ 2.0 µs            │ 94 ns               │ ~21x    │
├───────────┼───────────────────┼─────────────────────┼─────────┤
│ 10,000    │ 19.3 µs           │ 119 ns              │ ~162x   │
├───────────┼───────────────────┼─────────────────────┼─────────┤
│ 100,000   │ 191 µs            │ 348 ns              │ ~549x   │
├───────────┼───────────────────┼─────────────────────┼─────────┤
│ 1,000,000 │ 1.90 ms           │ 3.5 µs              │ ~543x   │
└───────────┴───────────────────┴─────────────────────┴─────────┘
```
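
Speedups of this magnitude are consistent with filling the validity bitmap whole bytes at a time instead of bit by bit. A rough sketch of the byte-wise idea (this is an illustration, not the actual arrow-buffer builder code):

```rust
/// Set `n` validity bits to 1 starting at bit `offset` in a packed
/// little-endian bitmap, writing whole 0xFF bytes for the aligned
/// middle instead of looping per bit.
fn append_non_nulls(bitmap: &mut Vec<u8>, offset: usize, n: usize) {
    let end = offset + n;
    let needed = (end + 7) / 8;
    if bitmap.len() < needed {
        bitmap.resize(needed, 0);
    }
    let first_full = (offset + 7) / 8; // first byte we can fill whole
    let last_full = end / 8;           // one past the last whole byte
    // Leading unaligned bits, up to the first whole byte.
    for i in offset..end.min(first_full * 8) {
        bitmap[i / 8] |= 1 << (i % 8);
    }
    // Aligned middle: one store covers 8 bits at a time.
    if first_full < last_full {
        for byte in &mut bitmap[first_full..last_full] {
            *byte = 0xFF;
        }
    }
    // Trailing unaligned bits.
    for i in (last_full * 8).max(offset)..end {
        bitmap[i / 8] |= 1 << (i % 8);
    }
}

fn main() {
    let mut bm = Vec::new();
    append_non_nulls(&mut bm, 3, 10); // set bits 3..13
    let ones: u32 = bm.iter().map(|b| b.count_ones()).sum();
    assert_eq!(ones, 10);
    assert_eq!(bm[0], 0b1111_1000);
    assert_eq!(bm[1], 0b0001_1111);
}
```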


# Rationale for this change

It adds a new public API in favor of performance improvements.

# What changes are included in this PR?

A new public API

# Are these changes tested?

Yes, with new unit-tests.

# Are there any user-facing changes?

Just a new convenient API.
Updates the requirements on
[strum_macros](https://github.com/Peternator7/strum) to permit the
latest version.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/Peternator7/strum/blob/master/CHANGELOG.md">strum_macros's
changelog</a>.</em></p>
<blockquote>
<h2>0.28.0</h2>
<ul>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/461">#461</a>:
Allow any kind of passthrough attributes on
<code>EnumDiscriminants</code>.</p>
<ul>
<li>Previously only list-style attributes (e.g.
<code>#[strum_discriminants(derive(...))]</code>) were supported. Now
path-only
(e.g. <code>#[strum_discriminants(non_exhaustive)]</code>) and
name/value (e.g. <code>#[strum_discriminants(doc =
&quot;foo&quot;)]</code>)
attributes are also supported.</li>
</ul>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/462">#462</a>:
Add missing <code>#[automatically_derived]</code> to generated impls not
covered by <a
href="https://redirect.github.com/Peternator7/strum/pull/444">#444</a>.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/466">#466</a>:
Bump MSRV to 1.71, required to keep up with updated <code>syn</code> and
<code>windows-sys</code> dependencies. This is a breaking change if
you're on an old version of rust.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/469">#469</a>:
Use absolute paths in generated proc macro code to avoid
potential name conflicts.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/465">#465</a>:
Upgrade <code>phf</code> dependency to v0.13.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/473">#473</a>:
Fix <code>cargo fmt</code> / <code>clippy</code> issues and add GitHub
Actions CI.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/477">#477</a>:
<code>strum::ParseError</code> now implements
<code>core::fmt::Display</code> instead of
<code>std::fmt::Display</code> to make it <code>#[no_std]</code>
compatible. Note the <code>Error</code> trait wasn't available in core
until <code>1.81</code>
so <code>strum::ParseError</code> still only implements that in std.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/476">#476</a>:
<strong>Breaking Change</strong> - <code>EnumString</code> now
implements <code>From&lt;&amp;str&gt;</code>
(infallible) instead of <code>TryFrom&lt;&amp;str&gt;</code> when the
enum has a <code>#[strum(default)]</code> variant. This more accurately
reflects that parsing cannot fail in that case. If you need the old
<code>TryFrom</code> behavior, you can opt back in using
<code>parse_error_ty</code> and <code>parse_error_fn</code>:</p>
<pre lang="rust"><code>#[derive(EnumString)]
#[strum(parse_error_ty = strum::ParseError, parse_error_fn =
make_error)]
pub enum Color {
    Red,
    #[strum(default)]
    Other(String),
}
fn make_error(x: &amp;str) -&gt; strum::ParseError {
strum::ParseError::VariantNotFound
}
</code></pre>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/431">#431</a>:
Fix bug where <code>EnumString</code> ignored the
<code>parse_err_ty</code>
attribute when the enum had a <code>#[strum(default)]</code>
variant.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/474">#474</a>:
EnumDiscriminants will now copy <code>default</code> over from the
original enum to the Discriminant enum.</p>
<pre lang="rust"><code>#[derive(Debug, Default, EnumDiscriminants)]
#[strum_discriminants(derive(Default))] // &lt;- Remove this in 0.28.
enum MyEnum {
    #[default] // &lt;- Will be the #[default] on the MyEnumDiscriminant
    #[strum_discriminants(default)] // &lt;- Remove this in 0.28
    Variant0,
    Variant1 { a: NonDefault },
}
</code></pre>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/Peternator7/strum/commit/7376771128834d28bb9beba5c39846cba62e71ec"><code>7376771</code></a>
Peternator7/0.28 (<a
href="https://redirect.github.com/Peternator7/strum/issues/475">#475</a>)</li>
<li><a
href="https://github.com/Peternator7/strum/commit/26e63cd964a2e364331a5dd977d589bb9f649d8c"><code>26e63cd</code></a>
Display exists in core (<a
href="https://redirect.github.com/Peternator7/strum/issues/477">#477</a>)</li>
<li><a
href="https://github.com/Peternator7/strum/commit/9334c728eedaa8a992d1388a8f4564bbccad1934"><code>9334c72</code></a>
Make TryFrom and FromStr infallible if there's a default (<a
href="https://redirect.github.com/Peternator7/strum/issues/476">#476</a>)</li>
<li><a
href="https://github.com/Peternator7/strum/commit/0ccbbf823c16e827afc263182cd55e99e3b2a52e"><code>0ccbbf8</code></a>
Honor parse_err_ty attribute when the enum has a default variant (<a
href="https://redirect.github.com/Peternator7/strum/issues/431">#431</a>)</li>
<li><a
href="https://github.com/Peternator7/strum/commit/2c9e5a9259189ce8397f2f4967060240c6bafd74"><code>2c9e5a9</code></a>
Automatically add Default implementation to EnumDiscriminant if it
exists on ...</li>
<li><a
href="https://github.com/Peternator7/strum/commit/e241243e48359b8b811b8eaccdcfa1ae87138e0d"><code>e241243</code></a>
Fix existing cargo fmt + clippy issues and add GH actions (<a
href="https://redirect.github.com/Peternator7/strum/issues/473">#473</a>)</li>
<li><a
href="https://github.com/Peternator7/strum/commit/639b67fefd20eaead1c5d2ea794e9afe70a00312"><code>639b67f</code></a>
feat: allow any kind of passthrough attributes on
<code>EnumDiscriminants</code> (<a
href="https://redirect.github.com/Peternator7/strum/issues/461">#461</a>)</li>
<li><a
href="https://github.com/Peternator7/strum/commit/0ea1e2d0fd1460e7492ea32e6b460394d9199ff8"><code>0ea1e2d</code></a>
docs: Fix typo (<a
href="https://redirect.github.com/Peternator7/strum/issues/463">#463</a>)</li>
<li><a
href="https://github.com/Peternator7/strum/commit/36c051b91086b37d531c63ccf5a49266832a846d"><code>36c051b</code></a>
Upgrade <code>phf</code> to v0.13 (<a
href="https://redirect.github.com/Peternator7/strum/issues/465">#465</a>)</li>
<li><a
href="https://github.com/Peternator7/strum/commit/9328b38617dc6f4a3bc5fdac03883d3fc766cf34"><code>9328b38</code></a>
Use absolute paths in proc macro (<a
href="https://redirect.github.com/Peternator7/strum/issues/469">#469</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/Peternator7/strum/compare/v0.27.0...v0.28.0">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

# Rationale for this change

The inner loop in `Projector::project_record` gives the optimizer
somewhat complicated dynamic data to branch through.
The sparse arrays in `Projector` are redundantly coded: `None` in the
index positions of `writer_to_reader` must match `Some` in
`skip_decoders` and vice versa.

# What changes are included in this PR?

Refactor record projection state with a single array of directive-like
enums corresponding to each writer schema field.

# Are these changes tested?

Added a benchmark for record projection (the benchmark code is partially
shared with apache#9397).
Somewhat counterintuitively for me, it does not show improvement on a
more complex case with a mix of projected fields, but does improve the
simpler one-field projection cases.
 
Passes the existing tests.
1. Reduce some quantifiers from `*` to `?` when 2+ occurrences would
generate invalid Rust code, e.g. `$(if $pred:expr)*`.
2. Clean up 4-armed recursive macros:
   * put the base case first
   * explain the fixups
   * fix all at once, going directly to the base case, instead of
     possibly multiple hoops

The initial motivation was getting rust-analyzer to stop choking on such
macro usage where the left-hand side was a tuple and the
right-hand side an expr.
# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Part of apache#9476.

# Rationale for this change

Add benchmarks to show benefit of the optimizations in apache#9477

# What changes are included in this PR?

Adds some benches for DELTA_BINARY_PACKED, DELTA_BYTE_ARRAY, and
DELTA_LENGTH_BYTE_ARRAY. The generated data is meant to show the benefit
of special casing for miniblocks with a bitwidth of 0.

# Are these changes tested?

Just benches

# Are there any user-facing changes?

No
# Which issue does this PR close?

 - Closes apache#9506


# Rationale for this change

`Vec::reserve(n)` does not guarantee exact capacity: Rust's
`MIN_NON_ZERO_CAP` optimization means `reserve(2)` gives a capacity of 4
for most numeric types, causing `debug_assert_eq!(capacity, batch_size)`
to panic in debug mode when `batch_size < 4`.

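
The difference is observable with plain `Vec`. Note that the exact rounded-up capacity is a current-`std` implementation detail, not a documented guarantee, so the assertions below only check the documented lower bounds:

```rust
fn main() {
    let mut amortized: Vec<i64> = Vec::new();
    amortized.reserve(2);
    // reserve() may round up for amortized growth; on current std this
    // typically yields capacity 4 for most numeric types
    // (MIN_NON_ZERO_CAP), which is what tripped the debug assertion.
    assert!(amortized.capacity() >= 2);

    let mut exact: Vec<i64> = Vec::new();
    exact.reserve_exact(2);
    // reserve_exact() requests exactly the additional capacity; the
    // allocator may still return more, but in practice this is 2.
    assert!(exact.capacity() >= 2);

    println!(
        "reserve: {}, reserve_exact: {}",
        amortized.capacity(),
        exact.capacity()
    );
}
```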
# What changes are included in this PR?

Replace `reserve` with `reserve_exact` in `ensure_capacity` in both
`InProgressPrimitiveArray` and `InProgressByteViewArray`.
`reserve_exact` skips the amortized growth optimization and allocates
exactly the requested capacity, making the assertion correct.

# Are these changes tested?
No. This only fixes an incorrect debug assertion. 

# Are there any user-facing changes?
No
# Which issue does this PR close?

- Closes apache#9513

# Rationale for this change

`VariantArray::try_new` and `canonicalize_and_verify_data_type` both
accept `LargeUtf8` as a valid shredded variant type. However,
`unshred_variant` currently only handles `Utf8` for string `typed_value`
columns.

This means a `VariantArray` with a `LargeUtf8` `typed_value` column can
be constructed successfully, but calling `unshred_variant` on it fails.
# Which issue does this PR close?

- Closes apache#9512

# Rationale for this change

You can build a Variant with a StringView type shredded out, but calling
`unshred_variant` will fail with a "not yet implemented" error.
…page skip (apache#9374)

# Which issue does this PR close?

- Closes apache#9370 .

# Rationale for this change

The bug occurs when using RowSelection with nested types (like
List<String>) when:
1. A column has multiple pages in a row group
2. The selected rows span across page boundaries
3. The first page is entirely consumed during skip operations

The issue was in `arrow-rs/parquet/src/column/reader.rs:287-382`
(`skip_records` function).

**Root cause:** When `skip_records` completed successfully after
crossing page boundaries, the `has_partial` state in the
`RepetitionLevelDecoder` could incorrectly remain true.

This happened when:

- The skip operation exhausted a page where has_record_delimiter was
false
- The skip found the remaining records on the next page by counting a
delimiter at index 0
- When a subsequent read_records(1) was called, the stale
has_partial=true state caused count_records to incorrectly interpret the
first repetition level (0) at index 0 as ending a "phantom" partial
record, returning (1 record, 0 levels, 0 values) instead of properly
reading the actual record data.

For a more descriptive explanation, look here:
apache#9370 (comment)

# What changes are included in this PR?

Added code at the end of skip_records to reset the partial record state
when all requested records have been successfully skipped.

This ensures that after skip_records completes, we're at a clean record
boundary with no lingering partial record state, fixing the array length
mismatch in StructArrayReader.

# Are these changes tested?

Commit
apache@365bd9a
introduces a test showcasing this issue with v2 data pages only on a
unit-test level. PR apache#9399 could
be used to showcase the issue in an end-to-end way.

A previously wrong assumption was that it had to do with mixing v1 and
v2 data pages:

```
In b52e043 I added a test that I validated to fail whenever I remove my fix.

Bug Mechanism

The bug requires three ingredients:

1. Page 1 (DataPage v1): Contains a nested column (with rep levels). During
   skip_records, all levels on this page are consumed. count_records sees no
   following rep=0 delimiter, so it sets has_partial=true. Since
   has_record_delimiter is false (the default InMemoryPageReader returns
   false when more pages exist), flush_partial is not called.
2. Page 2 (DataPage v2): Has num_rows available in its metadata. When
   num_rows <= remaining_records, the entire page is skipped via
   skip_next_page() — this does not touch the rep level decoder at all, so
   has_partial remains stale true from page 1.
3. Page 3 (DataPage v1): When read_records loads this page, the stale
   has_partial=true causes the rep=0 at position 0 to be misinterpreted as
   completing a "phantom" partial record. This produces (1 record, 0 levels,
   0 values) instead of reading the actual record data.

Test Verification

- With fix (flush_partial at end of skip_records): read_records(1) correctly
  returns (1, 2, 2) with values [70, 80]
- Without fix: read_records(1) returns (1, 0, 0) — a phantom record with no
  data, which is what causes the "Not all children array length are the
  same!" error when different sibling columns in a struct produce different
  record counts
```

---------

Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…e#9532)

# Which issue does this PR close?


- Closes #NNN.

# Rationale for this change
I would like to use `ParquetMetaDataPushDecoder` in arrow-datafusion,
but the `with_file_decryption_properties` function is `pub(crate)`, so I
can't fully implement the encryption feature.

# What changes are included in this PR?
Make it pub

# Are these changes tested?

Not needed

# Are there any user-facing changes?

Now pub
…nion (apache#9524)

# Which issue does this PR close?

Field::try_merge correctly handles DataType::Null for primitive types
and when self is Null, but fails when self is a compound type (Struct,
List, LargeList, Union) and from is Null. This causes Schema::try_merge
to error when merging schemas where one has a Null field and another has
a
concrete compound type for the same field.

This is common in JSON inference where some files have null values for
fields that are structs/lists in other files.

- Closes apache#9523

# Rationale for this change


# What changes are included in this PR?

Add `DataType::Null` arms to the Struct, List, LargeList, and Union
branches in `Field::try_merge`, consistent with how primitive types
already handle it.

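
The shape of the fix can be sketched with a toy type enum; this illustrates the match-arm pattern (Null merges with a compound type in either direction), not the actual `Field::try_merge` code:

```rust
#[derive(Clone, Debug, PartialEq)]
enum ToyType {
    Null,
    Int,
    Struct(Vec<ToyType>),
    List(Box<ToyType>),
}

/// Merging with Null keeps the concrete type, in either direction.
fn try_merge(a: &ToyType, b: &ToyType) -> Result<ToyType, String> {
    match (a, b) {
        // Null merged with anything yields the other type...
        (ToyType::Null, other) | (other, ToyType::Null) => Ok(other.clone()),
        // ...equal concrete types merge trivially...
        (x, y) if x == y => Ok(x.clone()),
        // ...and mismatched concrete types are an error.
        (x, y) => Err(format!("cannot merge {:?} with {:?}", x, y)),
    }
}

fn main() {
    let s = ToyType::Struct(vec![ToyType::Int]);
    // Null + compound works in both directions, as in the fix.
    assert_eq!(try_merge(&ToyType::Null, &s), Ok(s.clone()));
    assert_eq!(try_merge(&s, &ToyType::Null), Ok(s.clone()));
    assert!(try_merge(&ToyType::Int, &s).is_err());
}
```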
# Are these changes tested?

- Added test `test_merge_compound_with_null` covering Struct, List,
  LargeList, and Union merging with Null in both directions.
- Existing tests continue to pass.

# Are there any user-facing changes?

No
# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Relates to apache#9497.

# Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Add benchmark for `ListArray` in `json_reader` to support the
performance evaluation of apache#9497

# What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- Benchmarks for decoding and serializing a JSON list to `ListArray`.
- Benchmarks for `ListArray` and `FixedSizeListArray` json writer

# Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Benchmarks only

# Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->

No
…apache#8918)

# Which issue does this PR close?

Part of apache#8137. Follow up of apache#7303. Replaces apache#8040.

# Rationale for this change

apache#7303 implements the fundamental symbols for tracking memory. This
patch exposes those APIs at a higher level on Array and ArrayData.

# What changes are included in this PR?

New `claim` API for NullBuffer, ArrayData, and Array. New `pool`
feature-flag to arrow, arrow-array, and arrow-data.

# Are these changes tested?

Added a doctest on the `Array::claim` method.

# Are there any user-facing changes?

Added API and a new feature-flag for arrow, arrow-array, and arrow-data.
…dicates (apache#9509)

# Which issue does this PR close?

Raised an issue at apache#9516 for
this one

Same issue as apache#9239 but
extended to another scenario

# Rationale for this change

When there are multiple predicates being evaluated, we need to reset the
row selection policy before overriding the strategy.

Scenario:
- Dense initial RowSelection (alternating select/skip) covers all pages
→ Auto resolves to Mask
- Predicate 1 evaluates on column A, narrows selection to skip middle
pages
- Predicate 2's column B is fetched sparsely with the narrowed selection
(missing middle pages)
- Without the fix, the override for predicate 2 returns early
(policy=Mask, not Auto), so Mask is used and tries to read missing pages
→ "Invalid offset" error

# What changes are included in this PR?

This is a one line change to reset the selection policy in the
`RowGroupDecoderState::WaitingOnFilterData` arm

# Are these changes tested?

Yes, a new test was added that currently fails on `main`, but as you can
see it's a doozy to set up.

# Are there any user-facing changes?

Nope
…e#9481)

# Which issue does this PR close?

- Closes apache#9451
- Closes apache#6256

# Rationale for this change

A reader might be annoyed (performance wart) if a parquet footer lacks
nullcount stats, but inferring nullcount=0 for missing stats makes the
stats untrustworthy and can lead to incorrect behavior.

# What changes are included in this PR?

If a parquet footer nullcount stat is missing, surface it as None,
reserving `Some(0)` for known-no-null cases.

# Are these changes tested?

Fixed one unit test that broke, added a missing unit test that covers
the other change site.

# Are there any user-facing changes?

The stats API doesn't change signature, but there is a behavior change.
The existing doc that called out the incorrect behavior has been removed
to reflect that the incorrect behavior no longer occurs.
# Rationale for this change

Exposes the Arrow schema produced by the async Avro file reader,
similarly to the `schema` method on the synchronous reader.

This allows an application to prepare casting or other schema
transformations with no need to fetch the first record batch to learn
the produced Arrow schema. Since the async reader only parses OCF
content for the moment, the schema does not change from batch to batch.

# What changes are included in this PR?

The `schema` method for `AsyncAvroFileReader` exposes the Arrow schema
of record batches that are produced by the reader.

# Are these changes tested?

Added tests verifying that the returned schema matches the expected one.

# Are there any user-facing changes?

Added a `schema` method to `AsyncAvroFileReader`.
etseidl and others added 24 commits March 19, 2026 13:39
# Which issue does this PR close?

- Closes apache#9543.

# Rationale for this change

See issue, but it is possible to construct arguments to
`arrow_buffer::bit_util::bit_mask::set_bits` that overflow the bounds
checking protecting unsafe code.

# What changes are included in this PR?
Use `checked_add` when doing the bounds checking and panic when an
overflow occurs.

# Are these changes tested?

Yes

# Are there any user-facing changes?

No
# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Closes apache#9529 .

# Rationale for this change

- In a follow-up PR, we can fix the `variant_get` TODO:

https://github.com/apache/arrow-rs/blob/3b6179658203dc1b1610b67c1777d5b8beb137fc/parquet-variant-compute/src/variant_get.rs#L89-L92
- When we know that a `Struct` `VariantArray` is not shredded, we can
reuse `shred_basic_variant`
<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

# What changes are included in this PR?

- Added `StructVariantToArrowRowBuilder` builder.
- Moved `make_variant_to_arrow_row_builder` logic to
`make_typed_variant_to_arrow_row_builder` so it can be reused by a
`Struct` array's inner fields.
- Changed a `variant_get` test to show that it now handles unshredded
`Struct` `VariantArray`
<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

# Are these changes tested?
- Yes, added `test_struct_row_builder_handles_unshredded_nested_structs`
- Everything else still works.
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

# Are there any user-facing changes?
No
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->

---------

Co-authored-by: Ryan Johnson <scovich@users.noreply.github.com>
…apache#9439)

# Which issue does this PR close?

- Closes apache#9438.

# Rationale for this change

Speed up conversion by only importing `pyarrow` once.

# What changes are included in this PR?

- Use `PyOnceLock::import` to import the types.
- Remove some unnecessary `.extract::<PyBackedStr>()?` calls (the `Display`
implementation already does something similar)

# Are these changes tested?

Covered by existing tests. It would be nice to add a benchmark, but that
might require either:
- adding a dependency on a Python benchmark runner, or
- writing some hacky code to import `pyarrow` from criterion tests (likely
by running `pip`/`uv` from the Rust benchmark code)

# Are there any user-facing changes?

No
…e#9578)

## Summary

- When a `RowSelection` selects every row in a row group, `fetch_ranges`
now treats it as no selection, producing a single whole-column-chunk I/O
request instead of N individual page requests
- This reduces the number of I/O requests for subsequent filter
predicates when an earlier predicate passes all rows

## Details

In `InMemoryRowGroup::fetch_ranges`, when both a `RowSelection` and an
`OffsetIndex` are present, the code enters a page-level fetch path that
uses `scan_ranges()` to produce individual page ranges. Even when the
selection covers all rows, this produces N separate ranges (one per
page).

The fix: before entering the page-level path, check if the selection's
`row_count()` equals the row group's total row count. If so, drop the
selection and take the simpler whole-column-chunk path.

This commonly happens when a multi-predicate `RowFilter` has an early
predicate that passes all rows in a row group (e.g., `CounterID = 62` on
a row group where all rows have `CounterID = 62`).
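The pre-check described above can be sketched as below; `effective_selection` is a hypothetical helper, while the `RowSelector` fields mirror the shape of the parquet crate's selector (a run length plus a skip flag):

```rust
// A selection that covers every row in the row group degenerates to
// "no selection", so the caller takes the whole-column-chunk fetch path
// instead of issuing one I/O range per page.
#[derive(Clone)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

fn effective_selection(
    selection: Option<Vec<RowSelector>>,
    total_rows: usize,
) -> Option<Vec<RowSelector>> {
    selection.filter(|sel| {
        // selected rows only; skipped runs don't count
        sel.iter()
            .filter(|s| !s.skip)
            .map(|s| s.row_count)
            .sum::<usize>()
            != total_rows
    })
}

fn main() {
    let all = vec![RowSelector { row_count: 100, skip: false }];
    assert!(effective_selection(Some(all), 100).is_none()); // whole-chunk path

    let partial = vec![
        RowSelector { row_count: 40, skip: false },
        RowSelector { row_count: 60, skip: true },
    ];
    assert!(effective_selection(Some(partial), 100).is_some()); // page-level path
}
```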

## Test plan

- [x] Existing tests pass (snapshot updated to reflect fewer I/O
requests)
- [x] `test_read_multiple_row_filter` verifies the coalesced fetch
pattern

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
)

# Which issue does this PR close?


- Closes apache#9378 

# Rationale for this change

The optimizations listed in the issue description:

- Align to 8 bytes
- Don't try to return a buffer with bit_offset 0 but round it to a
multiple of 64
- Use `chunks_exact` for the fallback path


# What changes are included in this PR?

When both inputs share the same sub-64-bit alignment (left_offset % 64
== right_offset % 64), the optimized path is used. This covers the
common cases (both offset 0, both sliced equally, etc.). The BitChunks
fallback is retained only when the two offsets have different sub-64-bit
alignment.

# Are these changes tested?

Yes, the tests are updated and included in the PR.

# Are there any user-facing changes?
  

Yes, this is a minor breaking change to `from_bitwise_binary_op`:

- The returned `BooleanBuffer` may now have a non-zero offset (previously
always 0)
- The returned `BooleanBuffer` may have padding bits set outside the
logical range in `values()`

---------

Signed-off-by: Kunal Singh Dadhwal <kunalsinghdadhwal@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
# Which issue does this PR close?

None

# Rationale for this change

We want to use the SBBF Bloom Filter, but need to construct/serialize it
manually. Currently there is no way to create a new `Sbbf` outside of
this crate. Alongside this, we want to store the `Sbbf` in a
`FixedSizeBinary` column for some fancy indexing.

# What changes are included in this PR?

Some methods become public

# Are these changes tested?

N/A

# Are there any user-facing changes?

Yes, we add a few more public methods to the `Sbbf` struct
…9450)

# Which issue does this PR close?

- Closes #NNN.

# Rationale for this change

Rust implementation of apache/arrow#45360

Traditional Parquet writing splits data pages at fixed sizes, so a
single inserted or deleted row causes all subsequent pages to shift,
resulting in nearly every byte being re-uploaded to content-addressable
storage (CAS) systems. CDC determines page boundaries via a rolling
gearhash over column values, so unchanged data produces identical pages
across different writes, enabling storage cost reductions and faster
upload times.

See more details in https://huggingface.co/blog/parquet-cdc
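The rolling-gearhash idea can be sketched as below. This is a toy illustration, not the chunker this PR adds: the gear table here is filled by an arbitrary xorshift generator, and the boundary mask stands in for the real min/max/normalization logic. Because each byte's contribution is shifted left once per step, bytes older than 64 positions fall out of the `u64` hash, so boundaries depend only on a local window of content.

```rust
// Toy deterministic gear table (the real implementation uses a fixed
// table of random u64 values).
fn gear_table() -> [u64; 256] {
    let mut t = [0u64; 256];
    let mut state = 0x9E37_79B9_7F4A_7C15u64;
    for e in t.iter_mut() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        *e = state;
    }
    t
}

/// A boundary is declared after byte `i` when the rolling hash has all
/// `mask` bits zero; positions depend only on the last ~64 bytes.
fn chunk_boundaries(data: &[u8], mask: u64) -> Vec<usize> {
    let table = gear_table();
    let mut hash = 0u64;
    let mut out = Vec::new();
    for (i, &b) in data.iter().enumerate() {
        hash = (hash << 1).wrapping_add(table[b as usize]);
        if hash & mask == 0 {
            out.push(i + 1);
        }
    }
    out
}

fn main() {
    // deterministic pseudo-random input
    let mut s = 0x1234_5678_9ABC_DEF0u64;
    let data: Vec<u8> = (0..10_000)
        .map(|_| {
            s ^= s << 13;
            s ^= s >> 7;
            s ^= s << 17;
            (s >> 24) as u8
        })
        .collect();
    let base = chunk_boundaries(&data, 0xFF);

    // Insert one byte at the front: every boundary past the 64-byte
    // rolling window reappears shifted by exactly one position, which is
    // what makes unchanged data dedupe across writes.
    let mut edited = vec![0x42u8];
    edited.extend_from_slice(&data);
    let moved = chunk_boundaries(&edited, 0xFF);
    assert!(base
        .iter()
        .filter(|&&b| b > 64)
        .all(|&b| moved.contains(&(b + 1))));
}
```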

The original C++ implementation
apache/arrow#45360

Evaluation tool https://github.com/huggingface/dataset-dedupe-estimator
where I already integrated this PR to verify that deduplication
effectiveness is on par with parquet-cpp (lower is better):

<img width="984" height="411" alt="image"
src="https://github.com/user-attachments/assets/e6e80931-ac76-4bdd-bf9c-ba7e06559411"
/>


# What changes are included in this PR?

- **Content-defined chunker**  at `parquet/src/column/chunker/`
- **Arrow writer integration** integrated in `ArrowColumnWriter`
- **Writer properties** via `CdcOptions` struct (`min_chunk_size`,
`max_chunk_size`, `norm_level`)
- **ColumnDescriptor**: added `repeated_ancestor_def_level` field to
support iterating over nested field values

# Are these changes tested?

Yes — unit tests are located in `cdc.rs` and ported from the C++
implementation.

# Are there any user-facing changes?

New **experimental** API, disabled by default — no behavior change for
existing code:

```rust
// Simple toggle (256 KiB min, 1 MiB max, norm_level 0)
let props = WriterProperties::builder()
    .set_content_defined_chunking(true)
    .build();

// Explicit CDC parameters
let props = WriterProperties::builder()
    .set_cdc_options(CdcOptions { min_chunk_size: 128 * 1024, max_chunk_size: 512 * 1024, norm_level: 1 })
    .build();
```

---------

Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
apache#9576)

# Which issue does this PR close?


- Closes apache#9526 

# Rationale for this change


`shred_variant` already supports Binary and LargeBinary types (apache#9525,
apache#9554), but `unshred_variant` does not handle these types. This means
shredded Binary/LargeBinary columns cannot be converted back to
unshredded `VariantArray`s.

# What changes are included in this PR?

Adds unshred_variant support for DataType::Binary and
DataType::LargeBinary in parquet-variant-compute/src/unshred_variant.rs:
  - New enum variants PrimitiveBinary and PrimitiveLargeBinary
  - Match arms in append_row and try_new_opt
  - AppendToVariantBuilder impls for BinaryArray and LargeBinaryArray



# Are these changes tested?

Yes

# Are there any user-facing changes?

No breaking changes

---------

Signed-off-by: Kunal Singh Dadhwal <kunalsinghdadhwal@gmail.com>
# Which issue does this PR close?

- part of apache#9108

# Rationale for this change

Prepare for next release

# What changes are included in this PR?

1. Update version to `58.1.0`
2. Add changelog. See rendered preview here:
https://github.com/alamb/arrow-rs/blob/alamb/prepare_58.1.0/CHANGELOG.md

# Are these changes tested?

By CI

# Are there any user-facing changes?

Yes
…apache#9590)

## Summary

- Reserve `output.views` capacity in
`ByteViewArrayDecoderDictionary::read` before the decode loop
- Reserve `output.offsets` capacity in
`ByteArrayDecoderDictionary::read` before the decode loop

This avoids per-chunk reallocation during `extend` calls inside the
dictionary decode loop.
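The pre-allocation pattern is the standard one: compute the total output size up front and reserve once, so the `extend` calls inside the loop never reallocate. A generic sketch (not the decoder's actual code):

```rust
fn main() {
    let chunks: Vec<Vec<u32>> = (0..100).map(|i| vec![i; 64]).collect();

    // Reserve the full output size before the decode-style loop.
    let total: usize = chunks.iter().map(|c| c.len()).sum();
    let mut out: Vec<u32> = Vec::new();
    out.reserve(total);

    let cap = out.capacity();
    for c in &chunks {
        out.extend_from_slice(c); // no reallocation inside the loop
    }
    assert_eq!(out.len(), 6400);
    assert_eq!(out.capacity(), cap); // capacity untouched by the loop
}
```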

Closes apache#9587

## Test plan

- [ ] Existing tests pass (no functional change, only pre-allocation)
- [ ] Benchmark dictionary-encoded StringView/BinaryView/String reads

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Rationale for this change

In some cases, it is desirable to print strings with surrounding
quotation marks. A typical example that we run into in
https://github.com/rerun-io/rerun is a `StructArray` that contains empty
strings:

Current formatting:

```text
{name: }
```

Added option in this PR:

```text
{name: ""}
```

# What changes are included in this PR?

This PR relies on `std::fmt::Debug` to do the actual formatting of
strings, which means that all escaping is handled out of the box.

# Are these changes tested?

This PR contains test for different types of inputs, including escape
sequences. Additionally, it also tests the `StructArray` example
outlined above.

# Are there any user-facing changes?

By default this option is false, making the feature opt-in.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close?

Closes apache#9580

## Rationale

The current VLQ decoder calls `get_aligned` for each byte, which
involves repeated offset calculations and bounds checks in the hot loop.

## What changes are included in this PR?

Align to the byte boundary once, then iterate directly over the buffer
slice, avoiding per-byte overhead from `get_aligned`.
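For reference, VLQ/ULEB128 decoding over a plain slice looks like the sketch below; the function name and shape are illustrative, not the crate's actual decoder:

```rust
/// Decode one unsigned VLQ value from `buf`, iterating directly over the
/// slice rather than re-computing offsets and bounds per byte.
/// Returns the value and the number of bytes consumed.
fn decode_vlq(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate() {
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1)); // continuation bit clear: done
        }
        if 7 * (i + 1) >= 64 {
            return None; // would overflow u64
        }
    }
    None // ran out of bytes mid-value
}

fn main() {
    // 624485 is the classic LEB128 example encoding
    assert_eq!(decode_vlq(&[0xE5, 0x8E, 0x26]), Some((624485, 3)));
    assert_eq!(decode_vlq(&[0x7F]), Some((127, 1)));
    assert_eq!(decode_vlq(&[0x80]), None); // truncated input
}
```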

## Are there any user-facing changes?

No.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Rationale for this change

The `object_store` crate release 0.13.2 breaks the build of parquet
because it feature-gates the `buffered` module. I have filed
apache/arrow-rs-object-store#677 about the
breakage; meanwhile this fix is made in expectation that 0.13.2 will not
be yanked and the feature gate will remain.

# What changes are included in this PR?

Bump the version requirement to 0.13.2, requesting the "tokio" feature.

# Are these changes tested?

The build should succeed in CI workflows.

# Are there any user-facing changes?

No

Co-authored-by: Mikhail Zabaluev <mikhail.zabaluev@gmail.com>
Updates the requirements on [sha2](https://github.com/RustCrypto/hashes)
to permit the latest version.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/RustCrypto/hashes/commit/ffe093984c004769747e998f77da8ff7c0e7a765"><code>ffe0939</code></a>
Release sha2 0.11.0 (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/806">#806</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/8991b65fe400c31c4cc189510f86ae642c470cd9"><code>8991b65</code></a>
Use the standard order of the <code>[package]</code> section fields (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/807">#807</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/3d2bc57db40fd6aeb25d6c6da98d67e2784c2985"><code>3d2bc57</code></a>
sha2: refactor backends (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/802">#802</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/faa55fb83697c8f3113636d88070e5f5edc8c335"><code>faa55fb</code></a>
sha3: bump <code>keccak</code> to v0.2 (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/803">#803</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/d3e6489e56f8486d4a93ceb7a8abf4924af1de7b"><code>d3e6489</code></a>
sha3 v0.11.0-rc.9 (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/801">#801</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/bbf6f51ff97f81ab15e6e5f6cf878bfbcb1f47c8"><code>bbf6f51</code></a>
sha2: tweak backend docs (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/800">#800</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/155dbbf2959dbec0ec75948a82590ddaede2d3bc"><code>155dbbf</code></a>
sha3: add default value for the <code>DS</code> generic parameter on
<code>TurboShake128/256</code>...</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/ed514f2b34526683b3b7c41670f1887982c3df64"><code>ed514f2</code></a>
Use published version of <code>keccak</code> v0.2 (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/799">#799</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/702bcd83735a49c928c0fc24506924f5c0aa22af"><code>702bcd8</code></a>
Migrate to closure-based <code>keccak</code> (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/796">#796</a>)</li>
<li><a
href="https://github.com/RustCrypto/hashes/commit/827c043f82d57666a0b146d156e91c39535c1305"><code>827c043</code></a>
sha3 v0.11.0-rc.8 (<a
href="https://redirect.github.com/RustCrypto/hashes/issues/794">#794</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/RustCrypto/hashes/compare/groestl-v0.10.0...sha2-v0.11.0">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Which issue does this PR close?


- Closes apache#9340.

# Rationale for this change


# What changes are included in this PR?


Support `ListView` codec in arrow-json. Using `ListLikeArray` trait to
simplify implementation.

# Are these changes tested?


Tests added

# Are there any user-facing changes?


New encoder/decoder
… verification (apache#9604)

# Which issue does this PR close?

- Closes apache#9603 

# Rationale for this change

The release and dev KEYS files could get out of sync.
We should use the release/ version:
- Users use the release/ version not dev/ version when they verify our
artifacts' signature
- https://dist.apache.org/ may reject our request when we request it many
times from CI

# What changes are included in this PR?

Use
`https://www.apache.org/dyn/closer.lua?action=download&filename=arrow/KEYS`
to download the KEYS file and the expected
`https://dist.apache.org/repos/dist/dev/arrow` for the RC artifacts.

# Are these changes tested?

Yes, I've verified 58.1.0 RC 1 both before and after the change.

# Are there any user-facing changes?

No
…uct)` (apache#9597)

# Which issue does this PR close?


- Closes apache#9596.

# Rationale for this change

Check issue

# What changes are included in this PR?

Reuse `shred_basic_variant` as a fast path for unshredded `Struct`
handling in `variant_get(..., Struct)`

# Are these changes tested?

Yes, added two unit tests to establish safe mode behavior.

# Are there any user-facing changes?

## Summary

- Fix `MutableArrayData::extend_nulls` which previously panicked
unconditionally for both sparse and dense Union arrays
- For sparse unions: append the first type_id and extend nulls in all
children
- For dense unions: append the first type_id, compute offsets into the
first child, and extend nulls in that child only

## Background

This bug was discovered via DataFusion. `CaseExpr` uses
`MutableArrayData` via `scatter()` to build result arrays. When a `CASE`
expression returns a Union type (e.g., from `json_get` which returns a
JSON union) and there are rows where no `WHEN` branch matches (implicit
`ELSE NULL`), `scatter` calls `extend_nulls` which panics with "cannot
call extend_nulls on UnionArray as cannot infer type".

Any query like:
```sql
SELECT CASE WHEN condition THEN returns_union(col, 'key') END FROM table
```
would panic if `condition` is false for any row.

## Root Cause

The `extend_nulls` implementation for Union arrays unconditionally
panicked because it claimed it "cannot infer type". However, the Union's
field definitions (child types and type IDs) are available in the
`MutableArrayData`'s data type — there's enough information to produce
valid null entries by picking the first declared type_id.
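The dense-union case can be sketched as below. The builder struct and names are hypothetical simplifications of `MutableArrayData`'s internal state: appending `n` nulls repeats the first declared type_id and points each new offset at a freshly appended null slot in that child.

```rust
// Simplified model of a dense UnionArray under construction: parallel
// type_ids and offsets buffers, plus the running length of child 0.
struct DenseUnionBuilder {
    type_ids: Vec<i8>,
    offsets: Vec<i32>,
    first_child_len: i32,
    first_type_id: i8,
}

impl DenseUnionBuilder {
    fn extend_nulls(&mut self, n: usize) {
        for _ in 0..n {
            self.type_ids.push(self.first_type_id);
            self.offsets.push(self.first_child_len);
            self.first_child_len += 1; // a null value appended to child 0
        }
    }
}

fn main() {
    let mut b = DenseUnionBuilder {
        type_ids: vec![],
        offsets: vec![],
        first_child_len: 0,
        first_type_id: 3, // first declared type_id in the union's fields
    };
    b.extend_nulls(2);
    assert_eq!(b.type_ids, vec![3, 3]);
    assert_eq!(b.offsets, vec![0, 1]);
}
```

For sparse unions the offsets buffer is absent, so only the type_ids grow while every child is extended with nulls in lockstep.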

## Test plan

- [x] Added test for sparse union `extend_nulls`
- [x] Added test for dense union `extend_nulls`
- [x] Existing `test_union_dense` continues to pass
- [x] All `array_transform` tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
# Which issue does this PR close?


- Relates to apache#9497.

# Rationale for this change


# What changes are included in this PR?


As part of the effort to move the JSON reader away from `ArrayData`
toward typed `ArrayRef` APIs, it's necessary to change the
`ArrayDecoder::decode` interface to return `ArrayRef` directly and
update all decoder implementations (list, struct, map, run-end encoded)
to construct typed arrays without intermediate `ArrayData` round-trips.
New benchmarks for map and run-end encoded decoding are added to verify
there is no performance regression.

# Are these changes tested?

Yes

# Are there any user-facing changes?

No
# Which issue does this PR close?

- closes apache#9593

# Rationale for this change

In a previous PR (apache#9593), I changed instances of `truncate(0)` to
`clear()`. However, this broke the test `test_truncate_with_pool` at
`arrow-buffer/src/buffer/mutable.rs:1357`, due to an inconsistency
between the implementations of `truncate` and `clear`. This PR fixes that
test.

# What changes are included in this PR?

This PR copies a section of code related to the `pool` feature present
in `truncate` but absent in `clear`, fixing the failing unit test.

# Are these changes tested?

Yes.

# Are there any user-facing changes?

No.
)

# Rationale for this change

CdcOptions only contains primitive fields (usize, usize, i32) so
deriving PartialEq and Eq is straightforward. This is needed by
downstream crates such as DataFusion that embed CdcOptions in their own
configuration structs and need to compare them.

# What changes are included in this PR?

Implemented PartialEq and Eq for CdcOptions.
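Since all three fields are primitives (as noted above), the change amounts to deriving the comparison traits. A standalone sketch with the fields named in the CDC writer PR (the real struct lives in the parquet crate):

```rust
// Deriving PartialEq/Eq lets downstream configuration structs that embed
// CdcOptions derive their own equality.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CdcOptions {
    pub min_chunk_size: usize,
    pub max_chunk_size: usize,
    pub norm_level: i32,
}

fn main() {
    let a = CdcOptions {
        min_chunk_size: 256 * 1024,
        max_chunk_size: 1024 * 1024,
        norm_level: 0,
    };
    assert_eq!(a, a.clone());
    assert_ne!(a, CdcOptions { norm_level: 1, ..a.clone() });
}
```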

# Are these changes tested?

Added an equality test.

# Are there any user-facing changes?

No.
# Which issue does this PR close?


- Closes apache#8400.

# Rationale for this change

Check issue

# What changes are included in this PR?

- Added `AppendNullMode` enum supporting all semantics.
- Replaced the bool logic with the new enum
- Fixed test outputs for `ListArray` cases


# Are these changes tested?
- Added unit tests

# Are there any user-facing changes?

# Rationale for this change

Makes the code simpler and more readable by relying on new PyO3 and Rust
features. No behavior should have changed, apart from an error message if
`__arrow_c_array__` does not return a tuple.

# What changes are included in this PR?

- Use `.call_method0(M)?` instead of `.getattr(M)?.call0()`
- Use `.extract()`, which allows more advanced features like directly
extracting tuple elements
- Remove temporary variables just before returning
- Use `&raw const` and `&raw mut` pointers instead of casting and `addr_of!`
…pache#9637)

The CDC chunker's `value_offset` diverged from actual leaf array positions
when null list entries had non-empty child offset ranges (valid per the
Arrow columnar format spec). This caused `slice_for_chunk` to produce
incorrect `non_null_indices`, leading to an out-of-bounds panic in
`write_mini_batch`.

Track non-null value counts (`nni`) separately from leaf slot counts in
the chunker, and use them in `slice_for_chunk` to correctly index into
`non_null_indices` regardless of gaps in the leaf array.
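The separated bookkeeping can be sketched as follows; the struct and function names are hypothetical, not the chunker's actual API. Leaf slot counts and non-null value counts advance independently, so chunk slicing indexes non_null_indices by the non-null count rather than the slot count, which diverges whenever null list entries leave gaps:

```rust
// Progress through a leaf column: every slot counts toward `slots`, but
// only fully-defined slots (def level == max) carry a leaf value.
struct LeafProgress {
    slots: usize,
    non_null: usize,
}

fn advance(p: &mut LeafProgress, def_levels: &[i16], max_def_level: i16) {
    for &d in def_levels {
        p.slots += 1;
        if d == max_def_level {
            p.non_null += 1;
        }
    }
}

fn main() {
    let mut p = LeafProgress { slots: 0, non_null: 0 };
    // def level 2 = value present; lower levels = null somewhere up the
    // list nesting, so no entry exists in non_null_indices for that slot
    advance(&mut p, &[2, 2, 0, 1, 2], 2);
    assert_eq!(p.slots, 5);
    assert_eq!(p.non_null, 3);
}
```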