GH-47375: [C++][Compute] Move scatter function into compute core #47378
Yes, done.
It will be required by another piece of compute functionality in libarrow (NOT libarrow_compute) - |
|
What is the size increase of |
|
On my M1 Mac, the |
|
I also tried locally on Ubuntu 22.04, on a RelWithDebInfo build:
$ ls -la libarrow*.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 11031504 août 21 11:14 libarrow_acero.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 96039376 août 21 11:14 libarrow_compute.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 1015536 août 21 11:14 libarrow_cuda.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 12404648 août 21 11:14 libarrow_dataset.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 89553280 août 21 11:13 libarrow.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 7548160 août 21 11:14 libarrow_testing.so.2200.0.0
$ size --format=GNU libarrow*.so.2200.0.0
text data bss total filename
1231480 436253 1088 1668821 libarrow_acero.so.2200.0.0
12275032 2116773 44632 14436437 libarrow_compute.so.2200.0.0
123700 85770 3016 212486 libarrow_cuda.so.2200.0.0
1028813 613674 3832 1646319 libarrow_dataset.so.2200.0.0
10766408 3843396 2180945 16790749 libarrow.so.2200.0.0
1008311 497462 2600 1508373 libarrow_testing.so.2200.0.0
$ ls -la libarrow*.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 11031504 août 21 11:15 libarrow_acero.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 92921808 août 21 11:15 libarrow_compute.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 1015536 août 21 11:15 libarrow_cuda.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 12404648 août 21 11:15 libarrow_dataset.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 92660976 août 21 11:15 libarrow.so.2200.0.0
-rwxrwxr-x 1 antoine antoine 7548160 août 21 11:15 libarrow_testing.so.2200.0.0
$ size --format=GNU libarrow*.so.2200.0.0
text data bss total filename
1231480 436253 1088 1668821 libarrow_acero.so.2200.0.0
11678104 2069268 45544 13792916 libarrow_compute.so.2200.0.0
123700 85770 3016 212486 libarrow_cuda.so.2200.0.0
1028813 613674 3832 1646319 libarrow_dataset.so.2200.0.0
11360776 3889882 2183513 17434171 libarrow.so.2200.0.0
1008311 497462 2600 1508373 libarrow_testing.so.2200.0.0
So, |
|
A 4% increase in I can't think of any alternatives, and the need for selection-vector-aware execution seems justified. We should probably have some guidelines on when and how such changes are acceptable or justified, for future reference. |
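As a rough check on the figure above, the relative change can be computed from the `total` column of the two `size` listings. This is a minimal sketch, assuming the first listing was taken before the change and the second after it (the listings are not labelled, but this matches `libarrow_compute` shrinking while `libarrow` grows); `PercentChange` is a hypothetical helper, not part of Arrow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: relative change from `before` to `after`, in percent.
double PercentChange(std::int64_t before, std::int64_t after) {
  assert(before > 0);  // a zero baseline has no meaningful relative change
  return 100.0 * static_cast<double>(after - before) /
         static_cast<double>(before);
}

// `total` column of the `size` output quoted above (bytes).
// Assumption: first listing = before the move, second listing = after.
constexpr std::int64_t kLibarrowBefore = 16790749;
constexpr std::int64_t kLibarrowAfter = 17434171;
constexpr std::int64_t kComputeBefore = 14436437;
constexpr std::int64_t kComputeAfter = 13792916;
```

Under that assumption, `PercentChange(kLibarrowBefore, kLibarrowAfter)` comes out at roughly 3.8% and `PercentChange(kComputeBefore, kComputeAfter)` at roughly -4.5%, which is consistent with the "4%" mentioned in the comment.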
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 6f6138b. There weren't enough matching historic benchmark results to make a call on whether there were regressions. The full Conbench report has more details. |
#47448)
### Rationale for this change
The Meson configuration for compute was broken after the vector swizzle change was introduced; this gets the Meson configuration at parity with CMake again
### What changes are included in this PR?
Move `vector_swizzle.cc` to `libarrow` from `libarrow_compute`. See also: #47378
### Are these changes tested?
Yes
### Are there any user-facing changes?
No
* GitHub Issue: #47446
Authored-by: Will Ayd <william.ayd@icloud.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Rationale for this change
In order to support special forms (#47374), kernels have to respect the selection vector. Currently none of them do, and it is almost impossible to make all existing kernels respect the selection vector at once (we probably never will). We therefore need an incremental way to add selection-vector-aware kernels on demand, while still letting legacy (selection-vector-unaware) kernels be executed in a selection-vector-aware manner. The general idea is to first "gather" the selected rows from the batch into a new batch, evaluate the expression on the new batch, then "scatter" the result rows into the positions where they belong in the original batch.
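The gather / evaluate / scatter idea can be illustrated with a minimal sketch. It deliberately does not use Arrow's API: `Column`, `SelectionVector`, `Gather`, `Scatter`, `ExecuteSelective` and `AddOne` are all hypothetical names, with `std::optional` standing in for nullable slots, and unselected rows are assumed to come out as null.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-ins for Arrow arrays: std::nullopt plays the role of null.
using Column = std::vector<std::optional<int>>;
using SelectionVector = std::vector<std::size_t>;  // indices of selected rows

// "take"-like step: copy the selected rows into a dense column.
Column Gather(const Column& input, const SelectionVector& selection) {
  Column dense;
  dense.reserve(selection.size());
  for (std::size_t row : selection) dense.push_back(input.at(row));
  return dense;
}

// "scatter"-like step: put each dense result back at its original row.
// Assumption of this sketch: rows that were not selected stay null.
Column Scatter(const Column& dense, const SelectionVector& selection,
               std::size_t output_length) {
  assert(dense.size() == selection.size());
  Column output(output_length);  // every slot starts as std::nullopt
  for (std::size_t i = 0; i < selection.size(); ++i) {
    output.at(selection[i]) = dense.at(i);
  }
  return output;
}

// Run a kernel that knows nothing about selection vectors on selected rows only.
Column ExecuteSelective(
    const Column& input, const SelectionVector& selection,
    const std::function<Column(const Column&)>& legacy_kernel) {
  Column dense_result = legacy_kernel(Gather(input, selection));
  return Scatter(dense_result, selection, input.size());
}

// Example legacy kernel: add one to every non-null value.
Column AddOne(const Column& input) {
  Column out;
  out.reserve(input.size());
  for (const auto& value : input) {
    if (value.has_value()) {
      out.push_back(*value + 1);
    } else {
      out.push_back(std::nullopt);
    }
  }
  return out;
}
```

For instance, calling `ExecuteSelective` on the column `{1, 2, null, 4}` with selection `{0, 3}` and `AddOne` evaluates the kernel on only two rows and yields `{2, null, null, 5}`. The legacy kernel itself never sees the selection vector, which is what allows existing kernels to be reused unchanged.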
This makes the `take` and `scatter` functions dependencies of the exec facilities, which are in compute core (libarrow). `take` is already in compute core; now we need to move `scatter`.
I'm implementing the selective execution of kernels in #47377, including invoking `take` and `scatter` as explained above. I have to write tests for that in `exec_test.cc`, which is deliberately declared NOT to depend on libarrow_compute.
What changes are included in this PR?
Move scatter compute function into compute core.
Are these changes tested?
Yes. Manually tested.
Are there any user-facing changes?
None.
`scatter` function into compute core #47375