Minor: try and avoid an allocation creating GenericByteViewArray from ArrayData#9156
Minor: try and avoid an allocation creating GenericByteViewArray from ArrayData#9156alamb merged 2 commits intoapache:mainfrom
GenericByteViewArray from ArrayData#9156Conversation
GenericByteViewArray from ArrayData
| impl<T: ByteViewType + ?Sized> From<ArrayData> for GenericByteViewArray<T> { | ||
| fn from(data: ArrayData) -> Self { | ||
| let (_data_type, len, nulls, offset, mut buffers, _child_data) = data.into_parts(); | ||
| let views = buffers.remove(0); // need to maintain order of remaining buffers |
There was a problem hiding this comment.
the idea is that remove(0) copies all the buffers and then Arc::from allocates/copies it again. Using Arc::from_iter builds the results directly
scovich
left a comment
There was a problem hiding this comment.
LGTM, one nit.
I guess it would be hard to measure a benchmark impact from the change, without going to an extreme case?
| // first buffer is views | ||
| let views = buffers[0].clone(); | ||
| // remaining buffers are data buffers | ||
| let buffers = Arc::from_iter(buffers.into_iter().skip(1)); |
There was a problem hiding this comment.
nit:
| // first buffer is views | |
| let views = buffers[0].clone(); | |
| // remaining buffers are data buffers | |
| let buffers = Arc::from_iter(buffers.into_iter().skip(1)); | |
| // first buffer is views, remaining buffers are data buffers | |
| let buffers = buffers.into_iter(); | |
| let views = buffers.next().unwrap(); // safety: never empty | |
| let buffers = Arc::from_iter(buffers); |
(question tho -- why do we know buffers is never empty?)
Also -- it would be fewer lines of code to fold those two values into the constructor call?
let buffers = buffers.into_iter();
Self {
data_type: T::DATA_TYPE,
views: buffers.next().unwrap(),
buffers: Arc::from_iter(buffers),There was a problem hiding this comment.
(question tho -- why do we know buffers is never empty?)
Basically because the ArrayData is validated for each Arrow type -- and since it is validated by
arrow-rs/arrow-data/src/data.rs
Line 353 in f122d77
there must always be at least buffer
There was a problem hiding this comment.
That's about what I figured, but I couldn't find where the validation actually happened. Thanks for the pointer!
Yeah, thank you. I have found benchmarking anything related to allocations quite tricky |
Good call -- done in 707bf41 |
|
Thanks everyone |
…om `ArrayData` (apache#9156) # Which issue does this PR close? - part of apache#9061 - follow on apache#9114 # Rationale for this change @scovich noted in apache#9114 (comment) that calling `Vec::remove` does an extra copy and that `Vec::from` doesn't actually reuse the allocation the way I thought it did # What changes are included in this PR? Build the Arc for buffers directly # Are these changes tested? BY existing tests # Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out. -->
Which issue does this PR close?
make_array) #9061make_arrayforStructArrayandGenericByteViewArray#9114Rationale for this change
@scovich noted in #9114 (comment) that calling
Vec::removedoes an extra copy and thatVec::fromdoesn't actually reuse the allocation the way I thought it didWhat changes are included in this PR?
Build the Arc for buffers directly
Are these changes tested?
BY existing tests
Are there any user-facing changes?