Commit e2b264f
Optimize data page statistics conversion (up to 4x) (#9303)
# Which issue does this PR close?
- Closes #9306
# Rationale for this change
Loading statistics is notably inefficient. This makes the conversion from
the page index structures to Arrow arrays somewhat faster by avoiding
allocations, as an interim improvement until a more efficient direct
encoding is available (#9296).
<details>
```
Extract data page statistics for Int64/extract_statistics/Int64
time: [5.2223 µs 5.2589 µs 5.3230 µs]
change: [−39.253% −38.205% −37.016%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
Extract data page statistics for UInt64/extract_statistics/UInt64
time: [5.1035 µs 5.2173 µs 5.3576 µs]
change: [−32.745% −31.758% −30.535%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) high mild
6 (6.00%) high severe
Extract data page statistics for F64/extract_statistics/F64
time: [6.1922 µs 6.2021 µs 6.2130 µs]
change: [−30.749% −29.405% −28.469%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
Extract data page statistics for String/extract_statistics/String
time: [10.924 µs 10.965 µs 11.008 µs]
change: [−64.471% −64.330% −64.206%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
11 (11.00%) high mild
3 (3.00%) high severe
Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, Stri...
time: [10.885 µs 10.905 µs 10.928 µs]
change: [−64.444% −64.362% −64.285%] (p = 0.00 < 0.05)
Performance has improved.
```
</details>
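The sketch below is a rough illustration of where allocation savings on string statistics can come from. It is not the PR's code. It builds a minimal UTF-8-style column (one contiguous byte buffer, `i32` offsets, a validity mask) in two ways. The first allocates one `Vec<u8>` per value. The second sizes the buffers up front and copies each value's bytes exactly once. `Utf8Column`, `build_allocating`, and `build_direct` are hypothetical names. In real code, arrow-array's `GenericStringBuilder::with_capacity(item_capacity, data_capacity)` plays a similar role. The sketch also skips the UTF-8 validation that a real string array requires.

```rust
/// Hypothetical stand-in for an Arrow UTF-8 array: one contiguous byte
/// buffer, an offsets buffer with len + 1 entries, and a validity mask.
/// (UTF-8 validation is intentionally omitted in this sketch.)
#[derive(Debug, PartialEq)]
struct Utf8Column {
    offsets: Vec<i32>,
    data: Vec<u8>,
    validity: Vec<bool>,
}

/// Allocating approach: one heap allocation per non-null value, then a
/// second copy of every byte into growing (not preallocated) buffers.
fn build_allocating(stats: &[Option<&[u8]>]) -> Utf8Column {
    let owned: Vec<Option<Vec<u8>>> = stats.iter().map(|s| s.map(|b| b.to_vec())).collect();
    let mut col = Utf8Column { offsets: vec![0], data: Vec::new(), validity: Vec::new() };
    for v in &owned {
        match v {
            Some(bytes) => {
                col.data.extend_from_slice(bytes);
                col.validity.push(true);
            }
            None => col.validity.push(false),
        }
        // A null slot repeats the previous offset (zero-length value).
        col.offsets.push(col.data.len() as i32);
    }
    col
}

/// Direct approach: compute exact buffer sizes first, then copy each
/// value's bytes once, with no per-value allocation.
fn build_direct(stats: &[Option<&[u8]>]) -> Utf8Column {
    let total_bytes: usize = stats.iter().flatten().map(|b| b.len()).sum();
    let mut col = Utf8Column {
        offsets: Vec::with_capacity(stats.len() + 1),
        data: Vec::with_capacity(total_bytes),
        validity: Vec::with_capacity(stats.len()),
    };
    col.offsets.push(0);
    for s in stats {
        match s {
            Some(bytes) => {
                col.data.extend_from_slice(bytes);
                col.validity.push(true);
            }
            None => col.validity.push(false),
        }
        col.offsets.push(col.data.len() as i32);
    }
    col
}
```

Both functions produce identical buffers. The direct version performs at most three allocations no matter how many values there are, while the allocating version performs one per non-null value in addition to its buffer growth.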
# What changes are included in this PR?
Converts the inefficient iterator-based code (which doesn't really fit
the iterator pattern well) to code that simply traverses the values and
appends them to Arrow builders. (I think this mostly converts one bunch
of ugly code into another bunch of ugly code.)
It also tries to preallocate builder capacity where possible.
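The sketch below shows the general shape of the change. It is an illustration, not the PR's actual code. It contrasts two ways of producing the same result from per-page minimums. The first is an iterator chain that materializes an intermediate `Vec<Option<i64>>` and then converts it in further passes. The second is a single pass that appends each value to a builder preallocated to the number of pages. `PageStat` and `NullableI64Builder` are hypothetical stand-ins. The real code works on the parquet page index types and on arrow-array builders such as `PrimitiveBuilder`, which provides `with_capacity` and `append_option`.

```rust
/// Hypothetical per-page statistic; the real parquet page index types differ.
struct PageStat {
    min: Option<i64>,
}

/// Hypothetical minimal builder in the style of an Arrow primitive builder:
/// a values buffer plus a validity mask, one slot per appended element.
struct NullableI64Builder {
    values: Vec<i64>,
    validity: Vec<bool>,
}

impl NullableI64Builder {
    fn with_capacity(capacity: usize) -> Self {
        Self { values: Vec::with_capacity(capacity), validity: Vec::with_capacity(capacity) }
    }

    fn append_option(&mut self, v: Option<i64>) {
        // Null slots still occupy a value slot (filled with 0), as in Arrow.
        self.values.push(v.unwrap_or_default());
        self.validity.push(v.is_some());
    }

    fn finish(self) -> (Vec<i64>, Vec<bool>) {
        (self.values, self.validity)
    }
}

/// Before (illustrative): collect into an intermediate Vec<Option<i64>>,
/// then walk it twice more to split out values and validity.
fn mins_via_intermediate(pages: &[PageStat]) -> (Vec<i64>, Vec<bool>) {
    let collected: Vec<Option<i64>> = pages.iter().map(|p| p.min).collect();
    let values: Vec<i64> = collected.iter().map(|v| v.unwrap_or_default()).collect();
    let validity: Vec<bool> = collected.iter().map(|v| v.is_some()).collect();
    (values, validity)
}

/// After (illustrative): one pass over the pages into a builder whose
/// buffers are sized up front, so no intermediate collection is needed.
fn mins_direct(pages: &[PageStat]) -> (Vec<i64>, Vec<bool>) {
    let mut builder = NullableI64Builder::with_capacity(pages.len());
    for page in pages {
        builder.append_option(page.min);
    }
    builder.finish()
}
```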
# Are these changes tested?
Existing tests
# Are there any user-facing changes?
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Parent commit: 5695bb3
File tree: 3 files changed, +670 / -473 lines
- arrow-array/src/builder
- parquet/src
  - arrow/arrow_reader
  - file/page_index

(Diff excerpt not recoverable: one hunk in an unnamed file keeps original lines 260 to 262, adds 20 lines as new lines 263 to 282, then continues with original lines 263 to 265 as new lines 283 to 285.)