
Commit e2b264f

Dandandan and alamb authored
Optimize data page statistics conversion (up to 4x) (#9303)
# Which issue does this PR close?

- Closes #9306

# Rationale for this change

Loading statistics is notably inefficient. This makes the conversion from the data page statistics structure to Arrow arrays a bit faster by avoiding allocations, until we get a more efficient encoding directly (#9296).

<details>

```
Extract data page statistics for Int64/extract_statistics/Int64
                        time:   [5.2223 µs 5.2589 µs 5.3230 µs]
                        change: [−39.253% −38.205% −37.016%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe
Extract data page statistics for UInt64/extract_statistics/UInt64
                        time:   [5.1035 µs 5.2173 µs 5.3576 µs]
                        change: [−32.745% −31.758% −30.535%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  8 (8.00%) high mild
  6 (6.00%) high severe
Extract data page statistics for F64/extract_statistics/F64
                        time:   [6.1922 µs 6.2021 µs 6.2130 µs]
                        change: [−30.749% −29.405% −28.469%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
Extract data page statistics for String/extract_statistics/String
                        time:   [10.924 µs 10.965 µs 11.008 µs]
                        change: [−64.471% −64.330% −64.206%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  11 (11.00%) high mild
  3 (3.00%) high severe
Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, Stri...
                        time:   [10.885 µs 10.905 µs 10.928 µs]
                        change: [−64.444% −64.362% −64.285%] (p = 0.00 < 0.05)
                        Performance has improved.
```

</details>

# What changes are included in this PR?

Converts the inefficient iterator-based code (which doesn't really fit the iterator pattern well) to traverse the values directly and append them to the builders. (I think it's mostly converting one bunch of ugly code into another bunch of ugly code.) It also preallocates where possible.

# Are these changes tested?

Existing tests

# Are there any user-facing changes?

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
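As a rough illustration of the approach described above (a plain loop that appends into buffers preallocated to the page count, rather than a chain of iterator adapters), here is a self-contained sketch. `PageStats` and `collect_mins` are hypothetical names, not types or functions from the `parquet` or `arrow` crates, and the `Vec<bool>` validity mask stands in for Arrow's packed null bitmap.

```rust
/// Hypothetical per-page statistics record (an assumption for this
/// sketch, not the real parquet column-index type).
pub struct PageStats {
    pub min: Option<i64>,
    pub null_page: bool,
}

/// Collects per-page minimums into a values buffer plus a validity
/// mask, allocating both once up front instead of growing them.
pub fn collect_mins(pages: &[PageStats]) -> (Vec<i64>, Vec<bool>) {
    let mut values = Vec::with_capacity(pages.len());
    let mut validity = Vec::with_capacity(pages.len());
    for page in pages {
        match page.min {
            // A page with a real minimum that is not all-null: keep it.
            Some(v) if !page.null_page => {
                values.push(v);
                validity.push(true);
            }
            // Missing statistic or all-null page: emit a null slot.
            _ => {
                values.push(0); // placeholder value for the null slot
                validity.push(false);
            }
        }
    }
    (values, validity)
}
```

Since the output length equals the number of pages, each buffer is allocated exactly once; Arrow builders expose the same idea through constructors such as `PrimitiveBuilder::with_capacity`.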
1 parent 5695bb3 commit e2b264f


3 files changed (+670, -473 lines)

arrow-array/src/builder/primitive_builder.rs

Lines changed: 20 additions & 0 deletions
@@ -260,6 +260,26 @@ impl<T: ArrowPrimitiveType> PrimitiveBuilder<T> {
         self.values_builder.extend_from_slice(values);
     }
 
+    /// Appends values from a iter of type `Option<T>`
+    ///
+    /// # Panics
+    ///
+    /// Panics if `values` and `is_valid` have different lengths
+    #[inline]
+    pub fn extend_from_iter_option<I: IntoIterator<Item = Option<T::Native>>>(&mut self, iter: I) {
+        let iter = iter.into_iter();
+        self.values_builder.extend(iter.map(|v| match v {
+            Some(v) => {
+                self.null_buffer_builder.append_non_null();
+                v
+            }
+            None => {
+                self.null_buffer_builder.append_null();
+                T::Native::default()
+            }
+        }));
+    }
+
     /// Appends array values and null to this builder as is
     /// (this means that underlying null values are copied as is).
     ///
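The new `extend_from_iter_option` method fills the values buffer and the null buffer in a single pass over the input. The sketch below mirrors that pattern on a hypothetical `MiniBuilder` backed by plain `Vec`s, so it runs without the `arrow` crate; reserving capacity from the iterator's `size_hint` is an addition of this sketch. Note also that the method's `# Panics` doc section mentions `values` and `is_valid`, which this signature does not take.

```rust
/// Hypothetical minimal builder: a values buffer plus a validity mask
/// stored as `Vec<bool>` (Arrow uses a packed bit buffer instead).
#[derive(Default)]
pub struct MiniBuilder {
    pub values: Vec<i64>,
    pub validity: Vec<bool>,
}

impl MiniBuilder {
    /// Appends every `Option<i64>` from `iter`, writing the default value
    /// as a placeholder for each `None` and marking that slot invalid.
    pub fn extend_from_iter_option<I: IntoIterator<Item = Option<i64>>>(&mut self, iter: I) {
        let iter = iter.into_iter();
        // Reserve for at least the iterator's lower size bound.
        let (lower, _) = iter.size_hint();
        self.values.reserve(lower);
        self.validity.reserve(lower);
        // Borrow the two fields separately so the closure can push to
        // `validity` while `values` is being extended.
        let values = &mut self.values;
        let validity = &mut self.validity;
        values.extend(iter.map(|v| {
            validity.push(v.is_some());
            v.unwrap_or_default()
        }));
    }

    /// Number of slots currently marked null.
    pub fn null_count(&self) -> usize {
        self.validity.iter().filter(|&&is_valid| !is_valid).count()
    }
}
```

Writing a placeholder for null slots keeps the values buffer dense, which matches how Arrow primitive arrays store nulls: the slot exists, and the null bitmap marks it invalid.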
