Commit 44dae05
committed
feat(parquet): skip level materialization for entirely-null columns
When every element in a list, struct, or fixed-size list array is null,
short-circuit level building before the row loop and store a compact
`(def_value, rep_value, count)` tuple on `ArrayLevels` instead of
materializing `Vec<i16>` buffers. The same fast path applies at the leaf
level in `write_levels()` when `logical_nulls` covers every row.
On the write side, `ArrowColumnWriter` detects the `uniform_levels`
tuple and calls a dedicated `write_uniform_null_batch()` that encodes
def/rep levels via `RleEncoder::put_n()` in O(1) amortized time,
bypassing the normal mini-batch chunking and per-element iteration. A
new `LevelEncoder::put_n_with_observer()` fuses encoding with histogram
and counting updates in a single call. `write_uniform_null_batch` chunks
at the configured page row count limit to respect page boundaries.
Also defers `non_null_indices.reserve()` to branches that actually
populate it, avoiding an unnecessary allocation for all-null arrays.
Benchmark results (vs previous commit):
primitive_all_null/default 192 µs (was 8.8 ms, −97.8%)
primitive_all_null/parquet_2 193 µs (was 8.8 ms, −97.8%)
primitive_all_null/zstd_parquet_2 250 µs (was 8.9 ms, −97.2%)
Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>1 parent f141f4a commit 44dae05
0 commit comments