Commit 44dae05

feat(parquet): skip level materialization for entirely-null columns
When every element in a list, struct, or fixed-size list array is null, short-circuit level building before the row loop and store a compact `(def_value, rep_value, count)` tuple on `ArrayLevels` instead of materializing `Vec<i16>` buffers. The same fast path applies at the leaf level in `write_levels()` when `logical_nulls` covers every row.

On the write side, `ArrowColumnWriter` detects the `uniform_levels` tuple and calls a dedicated `write_uniform_null_batch()` that encodes def/rep levels via `RleEncoder::put_n()` in O(1) amortized time, bypassing the normal mini-batch chunking and per-element iteration. A new `LevelEncoder::put_n_with_observer()` fuses encoding with histogram and counting updates in a single call. `write_uniform_null_batch()` chunks at the configured page row count limit to respect page boundaries.

Also defers `non_null_indices.reserve()` to branches that actually populate it, avoiding an unnecessary allocation for all-null arrays.

Benchmark results (vs previous commit):

    primitive_all_null/default          192 µs  (was 8.8 ms, −97.8%)
    primitive_all_null/parquet_2        193 µs  (was 8.8 ms, −97.8%)
    primitive_all_null/zstd_parquet_2   250 µs  (was 8.9 ms, −97.2%)

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
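The idea in the message above can be illustrated with a small standalone sketch. It does not use the parquet crate: `Levels`, `Run`, `RunEncoder`, `build_def_levels`, and `encode_levels` are hypothetical names invented for this example, and the all-null check scans a `&[bool]` validity slice, whereas the commit describes checking `logical_nulls` instead. The sketch only shows why a uniform `(value, count)` form lets the encoder append one run per page rather than one value per row.

```rust
/// Hypothetical stand-in for the two level representations described above.
#[derive(Debug, PartialEq)]
enum Levels {
    /// One definition level per row (the normal path).
    Materialized(Vec<i16>),
    /// Every row shares the same level (the all-null fast path).
    Uniform { value: i16, count: usize },
}

/// One RLE run: `value` repeated `len` times.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Run {
    value: i16,
    len: usize,
}

/// Toy run-length encoder; not the parquet crate's `RleEncoder`.
#[derive(Default)]
struct RunEncoder {
    runs: Vec<Run>,
}

impl RunEncoder {
    /// Append `n` copies of `value` without iterating over them.
    fn put_n(&mut self, value: i16, n: usize) {
        if n == 0 {
            return;
        }
        match self.runs.last_mut() {
            // Extend the current run when the value repeats.
            Some(last) if last.value == value => last.len += n,
            _ => self.runs.push(Run { value, len: n }),
        }
    }

    /// Per-element path: a run of length one, merged with the previous run.
    fn put(&mut self, value: i16) {
        self.put_n(value, 1);
    }
}

/// Build definition levels for a flat nullable column.
/// `null_level` is the level of a null slot; a valid slot gets `null_level + 1`.
fn build_def_levels(validity: &[bool], null_level: i16) -> Levels {
    // Assumption: the sketch scans the slice; a real writer would more
    // likely compare a precomputed null count against the length.
    if !validity.is_empty() && validity.iter().all(|&valid| !valid) {
        return Levels::Uniform { value: null_level, count: validity.len() };
    }
    Levels::Materialized(
        validity
            .iter()
            .map(|&valid| if valid { null_level + 1 } else { null_level })
            .collect(),
    )
}

/// Encode levels into per-page run lists, splitting at `page_row_limit` rows.
fn encode_levels(levels: &Levels, page_row_limit: usize) -> Vec<Vec<Run>> {
    let limit = page_row_limit.max(1); // guard against a zero limit
    let mut pages = Vec::new();
    match levels {
        Levels::Uniform { value, count } => {
            let mut remaining = *count;
            while remaining > 0 {
                let chunk = remaining.min(limit);
                let mut enc = RunEncoder::default();
                enc.put_n(*value, chunk); // one call per page, not per row
                pages.push(enc.runs);
                remaining -= chunk;
            }
        }
        Levels::Materialized(values) => {
            for page in values.chunks(limit) {
                let mut enc = RunEncoder::default();
                for &v in page {
                    enc.put(v);
                }
                pages.push(enc.runs);
            }
        }
    }
    pages
}
```

For an all-null input, `build_def_levels` returns `Levels::Uniform` and `encode_levels` does work proportional to the number of pages; a mixed input falls back to the per-row path. Both paths produce the same runs for the same logical levels, which is the property the fast path has to preserve.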
1 parent f141f4a commit 44dae05

5 files changed, +229 -1 lines changed

