Commit 757f11c
parquet: batch consecutive null/empty rows in write_list

Restructure `write_list()` to accumulate consecutive null and empty rows
and flush them in a single `visit_leaves()` call using
`extend(repeat_n(...))`, instead of calling `visit_leaves()` per row.

With sparse data (99% nulls), a 4096-row batch previously triggered
~4000 individual tree traversals, each pushing a single value per leaf.
Now consecutive null/empty runs are collapsed into one traversal that
extends all leaf level buffers in bulk.

This follows the same pattern already used by `write_struct()`. The
`write_non_null_slice` path is unchanged since each non-null row has
different offsets and cannot be batched.
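
The run-batching idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: `LeafLevels`, `Row`, `flush_run`, `write_rows`, and the definition-level constants are hypothetical stand-ins for a single leaf of a nullable list column, and the `traversals` counter only imitates the cost of one `visit_leaves()` call. The sketch assumes null runs and empty runs are flushed separately because they carry different definition levels, and it uses `repeat(x).take(n)`, which yields the same items as `repeat_n(x, n)`, so that it also builds on older toolchains.

```rust
use std::iter::repeat;

/// Hypothetical stand-in for the level buffers of a single leaf column.
#[derive(Debug, Default)]
pub struct LeafLevels {
    pub def_levels: Vec<i16>,
    pub rep_levels: Vec<i16>,
    /// Number of flushes/writes, standing in for tree traversals.
    pub traversals: usize,
}

/// Simplified row kinds for a nullable list column.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Row {
    Null,
    Empty,
    /// A non-null list holding this many non-null child values (assumed >= 1).
    Values(usize),
}

// Assumed definition levels for this toy schema, not taken from the commit.
const DEF_NULL: i16 = 0;
const DEF_EMPTY: i16 = 1;
const DEF_VALUE: i16 = 2;

/// Flush a run of `count` identical null/empty rows with one bulk extend.
fn flush_run(leaf: &mut LeafLevels, def_level: i16, count: usize) {
    if count == 0 {
        return;
    }
    leaf.def_levels.extend(repeat(def_level).take(count));
    // Each null or empty row starts a new record, so its repetition level is 0.
    leaf.rep_levels.extend(repeat(0i16).take(count));
    leaf.traversals += 1;
}

/// Write all rows, collapsing consecutive null/empty rows into single flushes.
pub fn write_rows(rows: &[Row], leaf: &mut LeafLevels) {
    // Pending run: (definition level, number of rows accumulated so far).
    let mut pending: (i16, usize) = (DEF_NULL, 0);
    for &row in rows {
        let def_level = match row {
            Row::Null => DEF_NULL,
            Row::Empty => DEF_EMPTY,
            Row::Values(len) => {
                // A non-null row ends the current run and is written on its own.
                flush_run(leaf, pending.0, pending.1);
                pending.1 = 0;
                leaf.def_levels.extend(repeat(DEF_VALUE).take(len));
                // First child starts the record (0); the rest continue the list (1).
                leaf.rep_levels
                    .extend((0..len).map(|i| if i == 0 { 0 } else { 1 }));
                leaf.traversals += 1;
                continue;
            }
        };
        // Switching between null and empty ends the current run.
        if pending.1 > 0 && pending.0 != def_level {
            flush_run(leaf, pending.0, pending.1);
            pending.1 = 0;
        }
        pending.0 = def_level;
        pending.1 += 1;
    }
    // Flush whatever run is still pending at the end of the batch.
    flush_run(leaf, pending.0, pending.1);
}
```

In this sketch, a batch of 4096 consecutive `Row::Null` entries increments `traversals` once, whereas a per-row approach would increment it 4096 times. Non-null rows still cost one write each, mirroring the unchanged `write_non_null_slice` path.
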

Benchmark results (list_primitive_sparse_99pct_null, 65536 rows, 99% nulls):

benchmark                     baseline     optimized    change
sparse_list/default           173.01 µs    147.14 µs    -15%
sparse_list/parquet_2         178.83 µs    147.19 µs    -18%
sparse_list/bloom_filter      241.27 µs    203.94 µs    -15%
sparse_list/zstd              196.23 µs    168.50 µs    -14%
sparse_list/zstd_parquet_2    155.58 µs    155.58 µs    ~0% (baseline noisy)

No measurable change on list benchmarks with 25% nulls.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>

1 parent 9d0e8be commit 757f11c
2 files changed (+53, -19 lines):
- parquet/benches
- parquet/src/arrow/arrow_writer