Commit 504035a
feat(parquet): make PushBuffers boundary-agnostic for prefetch IO

The `PushDecoder` (introduced in #7997, #8080) is designed to decouple
IO and CPU. It holds non-contiguous byte ranges, with a
`NeedsData`/`push_range` protocol. However, it requires each logical
read to be satisfied in full by a single physical buffer: `has_range`,
`get_bytes`, and `Read::read` all search for one buffer that entirely
covers the requested range.
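To make the single-buffer constraint concrete, here is a minimal sketch of such a lookup. The `Buffer` type, `get_bytes_single`, and `two_halves` are hypothetical stand-ins written for illustration, not the crate's actual internals, and `Vec<u8>` is used where the real code would hold `bytes::Bytes`.

```rust
use std::ops::Range;

/// Hypothetical stand-in for one pushed buffer: the file range it covers
/// plus its bytes (`Vec<u8>` here instead of `bytes::Bytes`).
struct Buffer {
    range: Range<u64>,
    data: Vec<u8>,
}

/// Single-buffer lookup as described above: succeeds only if ONE buffer
/// entirely covers `want`. (Illustrative; not the crate's real signature.)
fn get_bytes_single(buffers: &[Buffer], want: &Range<u64>) -> Option<Vec<u8>> {
    buffers
        .iter()
        .find(|b| b.range.start <= want.start && want.end <= b.range.end)
        .map(|b| {
            let lo = (want.start - b.range.start) as usize;
            let hi = (want.end - b.range.start) as usize;
            b.data[lo..hi].to_vec()
        })
}

/// Two contiguous buffers covering bytes 0..4 and 4..8 of a file whose
/// byte at offset `i` has value `i`.
fn two_halves() -> Vec<Buffer> {
    vec![
        Buffer { range: 0..4, data: vec![0, 1, 2, 3] },
        Buffer { range: 4..8, data: vec![4, 5, 6, 7] },
    ]
}
```

With `two_halves()`, a read of `1..3` succeeds, but a read of `2..6` returns `None` even though every byte is present, because no single buffer covers it.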

This assumption conflates two orthogonal IO strategies:

- Coalescing: the IO layer merges adjacent requested ranges into fewer,
  larger fetches.
- Prefetching: the IO layer pushes data ahead of what the decoder has
  requested. This is an inversion of control: the IO layer speculatively
  fills buffers at offsets not yet requested and for arbitrary buffer
  sizes.
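As an illustration of coalescing, an IO layer might merge requested ranges roughly as follows. This is a sketch under assumptions: the function name `coalesce` and the `max_gap` parameter are hypothetical and do not come from this commit.

```rust
use std::ops::Range;

/// Merge requested ranges whose gap is at most `max_gap` bytes into
/// fewer, larger fetches. (`max_gap` is a hypothetical tuning knob.)
fn coalesce(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut out: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = out.last_mut() {
            // Close enough to the previous fetch: extend it instead of
            // issuing a new one.
            if r.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        out.push(r);
    }
    out
}
```

For example, `coalesce(vec![10..20, 0..10, 25..30, 100..110], 8)` yields the two fetches `0..30` and `100..110`. Note that `0..30` equals none of the ranges the decoder originally requested.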

These two strategies interact poorly with the current release mechanism
(`clear_ranges`), which matches buffers by exact range equality:

- Coalescing is both rewarded and punished. It is load-bearing because,
  without it, the number of physical buffers scales with the number of
  ranges requested, and `clear_ranges` performs an O(N×M) scan to remove
  consumed ranges, producing quadratic overhead on wide schemas.
  But it is also punished because a coalesced buffer never exactly
  matches any individual requested range, so `clear_ranges` silently
  skips it: the buffer leaks in `PushBuffers` until the decoder
  finishes or the caller manually calls `release_all_ranges` (#9624).
  This increases peak RSS proportionally to the amount of data coalesced
  ahead of the current row group.
- Prefetching is structurally impossible: speculatively pushed
  buffers will straddle future read boundaries, so the decoder
  cannot consume them, and `clear_ranges` cannot release them.
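The leak described above can be reproduced with a toy model of exact-equality release. `clear_ranges_exact` below is a hypothetical simplification that tracks only ranges, not the real `clear_ranges` implementation.

```rust
use std::ops::Range;

/// Toy model of release by exact range equality: a held buffer is dropped
/// only if its range equals some consumed range. Every held buffer is
/// compared against every consumed range, hence the O(N×M) scan.
fn clear_ranges_exact(held: &mut Vec<Range<u64>>, consumed: &[Range<u64>]) {
    held.retain(|h| !consumed.iter().any(|c| c == h));
}
```

If the store holds a coalesced buffer `0..30` plus an uncoalesced one `100..110`, and the decoder has consumed `0..10`, `10..20`, `25..30`, and `100..110`, only `100..110` is released. The coalesced `0..30` stays resident because it equals none of the consumed ranges.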

This commit makes `PushBuffers` boundary-agnostic, completing the
prefetching story, and changes the internals to scale with buffer count
instead of range count:

- Buffer stitching: `has_range`, `get_bytes`, and `Read::read` resolve
  logical ranges across multiple contiguous physical buffers via binary
  search, so the IO layer is free to push arbitrarily sized parts
  without knowing future read boundaries. This matters because some IO
  layers can be made much more efficient when using uniform buffers and
  vectorized reads.
- Incremental release (`release_through`): replaces `clear_ranges` with
  a watermark-based release that drops all buffers below a byte offset,
  trimming straddling buffers via zero-copy `Bytes::slice`.
  The decoder calls this automatically at row-group boundaries.
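The two mechanisms above can be sketched together on a simplified store. This is an illustrative model, not the code in this commit: the `Buffers` type and its field layout are assumptions, buffers are assumed non-overlapping, and `Vec<u8>` replaces `bytes::Bytes`, so the trimming step copies where the real code slices without copying.

```rust
use std::ops::Range;

/// Hypothetical model of the buffer store: non-overlapping buffers kept
/// sorted by start offset, stored as (start offset, bytes).
#[derive(Default)]
struct Buffers {
    bufs: Vec<(u64, Vec<u8>)>,
}

impl Buffers {
    /// Insert a buffer, keeping `bufs` sorted by start offset.
    fn push(&mut self, start: u64, data: Vec<u8>) {
        let at = self.bufs.partition_point(|(s, _)| *s < start);
        self.bufs.insert(at, (start, data));
    }

    /// Stitch `want` from one or more contiguous buffers; `None` on a gap.
    fn get_bytes(&self, want: Range<u64>) -> Option<Vec<u8>> {
        // Binary search for the last buffer starting at or before `want.start`.
        let first = self
            .bufs
            .partition_point(|(s, _)| *s <= want.start)
            .checked_sub(1)?;
        let mut out = Vec::with_capacity(want.end.saturating_sub(want.start) as usize);
        let mut pos = want.start;
        for (start, data) in &self.bufs[first..] {
            if pos >= want.end {
                break;
            }
            let end = start + data.len() as u64;
            if *start > pos || end <= pos {
                return None; // gap between buffers
            }
            let stop = want.end.min(end);
            out.extend_from_slice(&data[(pos - start) as usize..(stop - start) as usize]);
            pos = stop;
        }
        (pos >= want.end).then_some(out)
    }

    /// Drop everything below `watermark`, trimming a straddling buffer.
    fn release_through(&mut self, watermark: u64) {
        self.bufs.retain_mut(|(start, data)| {
            let end = *start + data.len() as u64;
            if end <= watermark {
                return false; // entirely below the watermark
            }
            if *start < watermark {
                // Keep only the tail. This copies; the commit describes a
                // zero-copy `Bytes::slice` at this step.
                data.drain(..(watermark - *start) as usize);
                *start = watermark;
            }
            true
        });
    }
}
```

After pushing buffers at offsets 0, 4, and 8, a read of `2..9` is stitched from all three. Calling `release_through(6)` drops the first buffer and trims the second to start at offset 6, so later reads at or above the watermark still succeed while memory below it is freed.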

Benchmark results (vs baseline):

  benchmark                         new        baseline   change
  push_decoder/1buf/1000ranges      321.9 µs   323.5 µs   −1%
  push_decoder/1buf/10000ranges     3.26 ms    3.25 ms    +0%
  push_decoder/1buf/100000ranges    34.9 ms    34.6 ms    +1%
  push_decoder/1buf/500000ranges    192.2 ms   185.3 ms   +4%
  push_decoder/Nbuf/1000ranges      363.9 µs   437.2 µs   −17%
  push_decoder/Nbuf/10000ranges     3.82 ms    10.7 ms    −64%
  push_decoder/Nbuf/100000ranges    42.1 ms    711.6 ms   −94%

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>

1 parent 711fac8 commit 504035a
File tree
7 files changed, +735 -138 lines changed
- parquet/src
- arrow/push_decoder
- reader_builder
- file/metadata
- util