Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I am profiling ClickBench query 10 with predicate pushdown enabled, as part of enabling filter_pushdown by default (datafusion#3463):
samply record -- /Users/andrewlamb/Software/datafusion2/target/profiling/datafusion-cli -f q.sql > /dev/null 2>&1
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
While looking at the profile, I noticed that 7% of the time is spent allocating and regrowing vectors (that is, reallocating and copying).
Describe the solution you'd like
Avoid the time spent regrowing these vectors
It appears that the vectors in question are part of the ViewBuffer struct:
pub struct ViewBuffer {
    pub views: Vec<u128>,
    pub buffers: Vec<Buffer>,
}
Describe alternatives you've considered
Since we know how many views will be in each output buffer, we could create the ViewBuffers with the correct size initially
Something like
ViewBuffer::with_capacity
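To make the idea concrete, here is a minimal sketch of what such a constructor could look like. It is not the arrow-rs implementation: `with_capacity`, its `view_capacity` parameter, and the `fill_views` helper are hypothetical, and `Buffer` is a stand-in type so the snippet compiles on its own.

```rust
/// Stand-in for arrow's `Buffer` type (assumption: only here so the
/// sketch is self-contained).
#[derive(Debug, Default, Clone)]
pub struct Buffer(Vec<u8>);

#[derive(Debug, Default)]
pub struct ViewBuffer {
    pub views: Vec<u128>,
    pub buffers: Vec<Buffer>,
}

impl ViewBuffer {
    /// Hypothetical constructor: reserve room for `view_capacity` views
    /// up front so that pushing them never has to regrow `views`.
    pub fn with_capacity(view_capacity: usize) -> Self {
        Self {
            views: Vec::with_capacity(view_capacity),
            // The number of data buffers is not known in advance here,
            // so this vector is left to grow on demand.
            buffers: Vec::new(),
        }
    }
}

/// Hypothetical helper: push `n` views into a pre-sized `ViewBuffer` and
/// return the capacity of `views` before and after filling it.
pub fn fill_views(n: usize) -> (usize, usize) {
    let mut out = ViewBuffer::with_capacity(n);
    let before = out.views.capacity();
    for i in 0..n {
        out.views.push(i as u128);
    }
    (before, out.views.capacity())
}
```

If the number of views per output batch is known when the buffer is created, the capacity reported by `fill_views` stays the same before and after the pushes, which means no reallocation or copying happened along the way.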
Additional context