Commit 90839df
[Parquet] perf: Create StructArrays directly rather than via ArrayData (1% improvement) (#9120)
# Which issue does this PR close?
- Part of #9061
- Part of #9128
# Rationale for this change
I noticed on #9061 that there is non-trivial overhead to creating struct
arrays. I am trying to improve `make_array` in parallel, but @tustvold
had an even better idea in a comment on #9058:
> My 2 cents is it would be better to move the codepaths relying on
> ArrayData over to using the typed arrays directly, this should not only
> cut down on allocations but unnecessary validation and dispatch
> overheads.
# What changes are included in this PR?
Update the Parquet `StructArray` reader (used for the top-level
`RecordBatch`) to construct `StructArray` directly rather than via
`ArrayData`.
# Are these changes tested?
By existing CI.
Benchmarks show a small but repeatable improvement of a few percent, for
example:
```
arrow_reader_clickbench/async/Q10 1.00 12.7±0.35ms ? ?/sec 1.02 12.9±0.44ms ? ?/sec
```
I am pretty sure this is because the ClickBench dataset has more than
100 columns. Creating such a struct array via `ArrayData` requires cloning
100 `ArrayData` values (one per child), each of which owns a `Vec<Buffer>`.
Constructing the `StructArray` directly therefore saves (at least) 100
allocations per batch.
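The allocation argument can be checked with a minimal, std-only model. This is a toy, not arrow-rs: `Buffer`, `ArrayData` and `ChildArray` are hypothetical stand-ins, assuming only what the text states (an `ArrayData` owns a `Vec` of reference-counted buffers), and a counting global allocator tallies heap allocations on the current thread.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::sync::Arc;

thread_local! {
    // Per-thread allocation count; const-initialized, so touching it never allocates.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

/// Forwards to the system allocator and counts allocations per thread.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // `try_with` skips counting if the thread-local is unavailable (thread teardown).
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Hypothetical shared buffer: cloning only bumps a refcount.
#[allow(dead_code)]
#[derive(Clone)]
struct Buffer(Arc<[u8]>);

/// Hypothetical `ArrayData` stand-in: owns a `Vec<Buffer>`, so cloning it
/// allocates a new Vec even though the buffer contents are shared.
#[allow(dead_code)]
#[derive(Clone)]
struct ArrayData {
    buffers: Vec<Buffer>,
}

/// Hypothetical typed child array, shared behind an `Arc` like `ArrayRef`.
struct ChildArray {
    data: ArrayData,
}

/// Runs `f`, returning its result and the allocations it made on this thread.
fn allocs_during<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCS.with(|c| c.get());
    let out = f();
    let after = ALLOCS.with(|c| c.get());
    (out, after - before)
}

/// Returns (allocations when cloning each child's ArrayData,
///          allocations when sharing the typed children directly).
fn compare(num_columns: usize) -> (usize, usize) {
    let children: Vec<Arc<ChildArray>> = (0..num_columns)
        .map(|_| {
            let buf = Buffer(Arc::from(vec![0u8; 64]));
            Arc::new(ChildArray { data: ArrayData { buffers: vec![buf.clone(), buf] } })
        })
        .collect();

    // Old path: clone every child's ArrayData (one Vec<Buffer> allocation each).
    let (via_data, old) =
        allocs_during(|| children.iter().map(|c| c.data.clone()).collect::<Vec<ArrayData>>());

    // New path: clone only the Arcs (refcount bumps, plus the outer Vec).
    let (direct, new) = allocs_during(|| children.clone());

    assert_eq!(via_data.len(), direct.len());
    (old, new)
}

fn main() {
    let (old, new) = compare(100);
    println!("via ArrayData: {old} allocations, direct: {new} allocations");
}
```

With 100 columns, the `ArrayData` path pays roughly one extra allocation per child, while the direct path only allocates the outer `Vec` of `Arc`s.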
# Are there any user-facing changes?
Parent commit: 4ddaa8c
2 files changed (+30, -16 lines):
- `arrow-buffer/src/builder`
- `parquet/src/arrow/array_reader`