Implements sum, sum_checked, min, max, is distinct, inverse for REE. #7933
…ncludes helper functions for expanding REE into logical representation
4ded72e to 05fb3c0
…tion is the only function left
05fb3c0 to 72bd81a
…ly without expanding to logical form.
5688ad3 to 9d00687
Just want to check up on this PR: is it still being worked on? Looking to see if we can get REE support pushed along in arrow-rs

According to #3520 (comment), the work was done while @rich-t-kid-datadog was an intern, and he's not available anymore. I can pick it up, which should mostly mean resolving merge conflicts.

That sounds good, I'm willing to help review this to get it along

This isn't ready as-is:
I'll send a first PR just for aggregate, and later a second PR for cmp.

Aggregate work done in #9409. In particular, it takes care of sliced REE arrays.

CMP done in #9448

Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look
Which issue does this PR close?
This PR works towards closing the larger REE Epic
Rationale for this change
Add operations onto the REE datatype such as
What changes are included in this PR?
Allows REE columns to be used with the previously mentioned functions correctly and efficiently.
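To illustrate aggregating over runs instead of expanded values, here is a minimal, self-contained sketch in plain Rust. It does not use arrow-rs types and is not the code in this PR: the `Ree` struct, its fields, and the `runs` helper are hypothetical stand-ins. It assumes `i64` values, clips runs to a logical slice, and scans runs linearly where a real implementation would likely binary-search for the first run.

```rust
/// Hypothetical stand-in for a sliced run-end encoded array (not an arrow-rs type).
/// `run_ends[i]` is the exclusive logical end of run `i`; `values[i]` is its value.
struct Ree<'a> {
    run_ends: &'a [usize],
    values: &'a [Option<i64>],
    offset: usize, // logical start of the slice
    len: usize,    // logical length of the slice
}

impl<'a> Ree<'a> {
    /// Yields `(value, count)` for every run that overlaps the slice,
    /// where `count` is the number of logical rows of that run inside the slice.
    fn runs(&self) -> impl Iterator<Item = (Option<i64>, usize)> + '_ {
        let (start, end) = (self.offset, self.offset + self.len);
        self.run_ends
            .iter()
            .zip(self.values)
            .enumerate()
            .filter_map(move |(i, (&run_end, &value))| {
                let run_start = if i == 0 { 0 } else { self.run_ends[i - 1] };
                let lo = run_start.max(start);
                let hi = run_end.min(end);
                (lo < hi).then(|| (value, hi - lo))
            })
    }

    /// Sums the slice with one multiplication per run; errors on overflow.
    /// Returns `Ok(None)` when the slice has no non-null values.
    fn sum_checked(&self) -> Result<Option<i64>, String> {
        let mut acc: Option<i64> = None;
        for (value, count) in self.runs() {
            // Null runs contribute nothing to the sum.
            if let Some(v) = value {
                let count = i64::try_from(count).map_err(|e| e.to_string())?;
                let run_total = v.checked_mul(count).ok_or("overflow")?;
                let total = acc.unwrap_or(0).checked_add(run_total).ok_or("overflow")?;
                acc = Some(total);
            }
        }
        Ok(acc)
    }

    /// Min and max only need each overlapping run's value once.
    fn min(&self) -> Option<i64> {
        self.runs().filter_map(|(v, _)| v).min()
    }

    fn max(&self) -> Option<i64> {
        self.runs().filter_map(|(v, _)| v).max()
    }
}
```

For run ends `[2, 5, 6, 10]` with values `[3, null, -1, 4]`, the full array sums to 21, while the slice at offset 1 with length 5 sums to 2 because only one row of the first run and none of the last run fall inside it.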
Are these changes tested?
Yes, comprehensive tests have been added in arrow-ord/src/cmp.rs and arrow-arith/src/aggregate.rs:
Are there any user-facing changes?
Performance improvement: REE distinct operations are now much faster for datasets with repeated values
No API changes: Existing distinct() and not_distinct() functions work the same way but are now more efficient for REE arrays
No breaking changes: All existing functionality is preserved
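The speedup for repeated values comes from comparing once per pair of overlapping runs rather than once per logical row. The following is a minimal sketch of that idea in plain Rust, not the implementation in this PR: `distinct_runs` is a hypothetical helper, it assumes unsliced inputs of equal logical length with `i64` values, and it returns the result as `(run_end, is_distinct)` pairs instead of an arrow-rs boolean array.

```rust
/// Hypothetical helper: null-aware "distinct" over two run-end encoded inputs.
/// Each input is given as exclusive logical run ends plus one value per run.
/// The output is itself run-end encoded: `(run_end, is_distinct)` pairs.
fn distinct_runs(
    a_ends: &[usize],
    a_vals: &[Option<i64>],
    b_ends: &[usize],
    b_vals: &[Option<i64>],
) -> Vec<(usize, bool)> {
    let mut out: Vec<(usize, bool)> = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < a_ends.len() && j < b_ends.len() {
        // `Option`'s `!=` treats `None == None`, so two nulls are not distinct.
        let d = a_vals[i] != b_vals[j];
        // The current pair of runs overlaps up to the smaller run end.
        let end = a_ends[i].min(b_ends[j]);
        // Merge with the previous output run when the result is unchanged.
        let extend = matches!(out.last(), Some(&(_, prev)) if prev == d);
        if extend {
            out.last_mut().unwrap().0 = end;
        } else {
            out.push((end, d));
        }
        // Advance whichever input run(s) finished at `end`.
        if a_ends[i] == end {
            i += 1;
        }
        if b_ends[j] == end {
            j += 1;
        }
    }
    out
}
```

With `a` encoded as run ends `[3, 6]` and values `[1, null]`, and `b` as run ends `[2, 4, 6]` and values `[1, 2, null]`, the loop performs four comparisons for six logical rows and yields `[(2, false), (4, true), (6, false)]`.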