GH-47376: [C++][Compute] Support selective execution for kernels #47377
Conversation
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the openness of the Apache Arrow project. Could you also rename the pull request title to follow the requested format?
### Rationale for this change

In order to support special forms (#47374), kernels have to respect the selection vector. Currently none of the kernels do, and it is practically impossible to make all existing kernels respect the selection vector at once (we probably never will). We therefore need an incremental way to add selection-vector-aware kernels on demand, while still allowing legacy (selection-vector-unaware) kernels to be executed in a selection-vector-aware manner generically: first "gather" the selected rows from the batch into a new batch, evaluate the expression on the new batch, then "scatter" the result rows back to the positions where they belong in the original batch. This makes the `take` and `scatter` functions dependencies of the exec facilities, which live in compute core (libarrow). `take` is already in compute core; now we need to move `scatter`. I'm implementing the selective execution of kernels in #47377, including invoking `take` and `scatter` as explained above, and the tests for that live in `exec_test.cc`, which is deliberately declared NOT to depend on libarrow_compute.

### What changes are included in this PR?

Move the scatter compute function into compute core.

### Are these changes tested?

Yes, manually tested.

### Are there any user-facing changes?

None.

* GitHub Issue: #47375

Authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Rossi Sun <zanmato1984@gmail.com>
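The gather/scatter pattern described above can be illustrated with a minimal, Arrow-free sketch (plain `std::vector` stands in for arrays, and `Gather`/`Scatter` are hypothetical stand-ins for the `take` and `scatter` compute functions, not the actual Arrow API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Gather: pull the selected rows into a dense, contiguous batch.
std::vector<int32_t> Gather(const std::vector<int32_t>& values,
                            const std::vector<int64_t>& selection) {
  std::vector<int32_t> dense;
  dense.reserve(selection.size());
  for (int64_t idx : selection) dense.push_back(values[idx]);
  return dense;
}

// Scatter: write the dense results back to their original row positions.
void Scatter(const std::vector<int32_t>& dense,
             const std::vector<int64_t>& selection,
             std::vector<int32_t>* out) {
  for (std::size_t i = 0; i < selection.size(); ++i) {
    (*out)[selection[i]] = dense[i];
  }
}
```

The expression is then evaluated on the dense batch between the two calls, so legacy kernels never have to know a selection vector exists.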
Attaching benchmark results. The benchmark employs a trivial kernel that does nothing but spin a specified number of times to simulate CPU intensity; each batch has 4k rows.

- Baseline: regular kernel, no selection vector.
- Sparse: selective kernel, with a selection vector of selectivity from 0 to 100%.
- Dense: regular kernel enclosed by gather/scatter, with a selection vector of selectivity from 0 to 100%.
Some interesting comparisons to note:
Hi @pitrou @bkietz @westonpace @felipecrv, I know this is a big one, but I do hope some of you can help review this PR. It is the most critical prerequisite for the if_else special form. Much appreciated!
Kindly ping @pitrou @bkietz @westonpace @felipecrv.
I have a subsequent PR depending on this one (it has been almost ready in my local branch for quite a while). I would really appreciate it if some reviewer could help move this one forward. Thanks a lot. @pitrou @bkietz @westonpace @felipecrv
I'll be on vacation next week, so I won't be able to take a look at this for the next ~10 days.
No problem at all, thanks for the heads-up! Just wanted to make sure this PR stays on the radar. Have a great vacation!
I've got my next PR for special forms almost ready; only some comments and docs are left. I've pushed it to my own repo: zanmato1984#64, in case you want to see how selective execution is actually used to implement special forms. That PR is derived from this one, so it contains the same content (once this one gets merged I'll rebase it). So I really hope this one can be reviewed and merged soon. @pitrou @bkietz @westonpace @felipecrv Thanks.
Kindly ping @pitrou, @felipecrv. Did you have a chance to take a look? Thanks.
```cpp
using ArrayKernelSelectiveExec = Status (*)(KernelContext*, const ExecSpan&,
                                            const SelectionVectorSpan&, ExecResult*);
```
I don't have a suggestion yet, but if we plan to support bitmaps as well, it would probably be better to pass something here that can be either a selection vector or a bitmap mask. The alternative being yet another KernelExec -- ArrayKernelMaskedExec.
Starting to think that adding another KernelExec will probably be best.
Selection vectors are better than bitmasks for very selective filters. Bitmasks are better when the filter is not very selective. Bitmasks are less important than selection vectors because, when the filter is not very selective, computing on every value is not as bad.
Good suggestion. Shall we do that in follow-up PRs?
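For context, the trade-off described above comes down to the shape of the kernel's inner loop. A minimal, Arrow-free sketch of the two loop shapes (helper names are illustrative, not Arrow API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Selection-vector driven: work is proportional to the number of selected
// rows, so it wins when the filter is highly selective (few rows pass).
int64_t SumSelected(const std::vector<int64_t>& values,
                    const std::vector<int32_t>& selection) {
  int64_t sum = 0;
  for (int32_t row : selection) sum += values[row];
  return sum;
}

// Bitmask driven: every row is visited, but the sequential access pattern
// tends to win when most rows pass the filter.
int64_t SumMasked(const std::vector<int64_t>& values,
                  const std::vector<bool>& mask) {
  int64_t sum = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (mask[i]) sum += values[i];
  }
  return sum;
}
```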
```cpp
/// handling (intersect validity bitmaps of inputs).
/// \brief Add a kernel with given input/output types and exec API, no selective exec
/// API, no required state initialization, preallocation for fixed-width types, and
/// default null handling (intersect validity bitmaps of inputs).
```
I think you can keep this one as is.
I think this is being as clear as the rest of the style ("no required state" etc.)
```cpp
if (selection_vector_) {
  selection_length_ = selection_vector_->length();
} else {
  selection_length_ = 0;
```
I think it's less confusing if, without a selection, the "length of the selection" were the length of the whole array.
Hmm, I may see it otherwise.
The naming of the three selection_*_ members implies they are tightly coupled (with selection_vector_ being the "leader"). If selection_vector_ is null, then the value of selection_length_ makes no sense, and 0 is closer to the meaning of "nonsense" (less so than -1, though), I guess?
cpp/src/arrow/compute/exec.cc (outdated):
```cpp
while (indices_begin + num_indices < indices_end &&
       *(indices_begin + num_indices) < chunk_row_id_end) {
  ++num_indices;
}
```
This is a slow placeholder, right? You will have to do an Exponential search from n - 1 to 0.
Good catch. Replaced with a log(N) complexity std::lower_bound().
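For reference, the idea behind that replacement: because the selection indices are sorted, the first index at or beyond the chunk boundary can be found with a binary search instead of a linear scan. A minimal sketch (hypothetical helper name, not the actual Arrow code):

```cpp
#include <algorithm>
#include <cstdint>

// Given sorted selection indices, count how many fall before the end of the
// current chunk. std::lower_bound finds the first index >= chunk_row_id_end
// in O(log N), so this replaces the O(N) linear scan.
int64_t CountIndicesInChunk(const int32_t* indices_begin,
                            const int32_t* indices_end,
                            int32_t chunk_row_id_end) {
  return std::lower_bound(indices_begin, indices_end, chunk_row_id_end) -
         indices_begin;
}
```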
```cpp
///
/// We are not yet using this so this is mostly a placeholder for now.
///
/// [1]: http://cidrdb.org/cidr2005/papers/P19.pdf
```
Nice!
```cpp
void SetSlice(int64_t offset, int64_t length, int32_t index_back_shift = 0);

int32_t operator[](int64_t i) const {
  return indices_[i + offset_] - index_back_shift_;
```
When you use this class in loops, you will probably get better assembly if it's copied into a local variable (on the stack) before the loop, so that SROA [1] can kick in and keep all these members in registers.
My concern is that exposing index_back_shift would be too verbose and error-prone; better to use some encapsulation to hide it. Maybe let the span accept a lambda, within which we can write more compiler-friendly code while keeping index_back_shift hidden?
```cpp
inline void Spin(volatile int64_t count) {
  while (count-- > 0) {
    // Do nothing, just burn CPU cycles.
```
compiler probably optimizes this away
ok, now I see the volatile.
```cpp
VisitSelectionVectorSpanInline(const SelectionVectorSpan& selection,
                               OnSelectionFn&& on_selection) {
  for (int64_t i = 0; i < selection.length(); ++i) {
    RETURN_NOT_OK(on_selection(selection[i]));
```
In theory, returning a Status is a cheap and simple (to the compiler) operation, but in practice it's not. Consider requiring a function that returns bool instead. If it always returns true, inlining will remove the early-return branches inside the loop.
Sorry, I don't get it. Could you elaborate a bit?
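For reference, a bool-returning visitor along the lines suggested above might look like this sketch (hypothetical names, not the actual Arrow code):

```cpp
#include <cstdint>
#include <vector>

// The functor returns bool; returning false stops the visit early. When the
// inlined functor always returns true (the common case for infallible
// kernels), the compiler can fold the early-exit branch away entirely,
// whereas a Status return value (a non-trivial object that must be
// constructed and inspected each iteration) tends to prevent that.
template <typename Fn>
bool VisitSelection(const std::vector<int32_t>& selection, Fn&& fn) {
  for (int32_t row : selection) {
    if (!fn(row)) return false;
  }
  return true;
}
```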
I think this looks good. I hope @pitrou is open and excited about the idea of kernels fused with filtering from selection vectors.
Haha. I'm open on the principle. I just need to make time to look at the many details, sorry...
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
This reverts commit db970f1e89722dcf65b09553c51ed1b1a4ede3ce.
```cpp
std::vector<Datum> values(batch.num_values());
for (int i = 0; i < batch.num_values(); ++i) {
  if (batch[i].is_scalar()) {
    // XXX: Skip gather for scalars since it is not currently supported by Take.
```
Technically it's not necessary. But the drawback is that we lose the ability to uniformly call Take on any Datum: we have to make sure it's not a scalar and go through a special path, like here, for scalars.
I think maybe we can simply return the scalar as-is from Take (to allow uniform invocation on an arbitrary Datum). Or we insist that taking a scalar makes no sense and do special checks everywhere.
```cpp
  return ExecuteBatch(batch, listener);
}

Datum WrapResults(const std::vector<Datum>& inputs,
```
It's an override of a public method of its parent class, KernelExecutor::WrapResults().
```cpp
} else {
  DCHECK(val.is_array());
  arrays.emplace_back(val.make_array());
}
```
Plus, as a fairly independent free function, I think there's no harm in extending it a little to support chunked arrays.
```cpp
  return kernel_->selective_exec(kernel_ctx_, input, *selection, out);
}
return kernel_->exec(kernel_ctx_, input, out);
```
> This pre-condition that a non-null selection implies a non-null `selective_exec` is very specific.

Sorry, I don't get it. Both call sites have the possibility that selection is non-null, and we need to make sure that selective_exec is also non-null. In other words, if we inlined it, the code would be exactly the same in both places.
Or are you suggesting something performance-wise?
Rationale for this change
In order to support special forms (#47374), being able to execute a kernel "selective"-ly becomes a prerequisite. As mentioned in #47374, we need an incremental way to add selective kernels on demand, while accommodating arbitrary legacy kernels so that they can be executed selectively in a general manner.
What changes are included in this PR?
- `ArrayKernelSelectiveExec(KernelContext*, const ExecSpan&, const SelectionVectorSpan&, ExecResult*)` in the kernel. This is the entry for selectively executing the kernel on a batch with a given selection vector. The kernel author can provide a dedicated implementation of this kernel API so the kernel can be executed "sparse"-ly: only the rows indicated by the selection vector are processed. Otherwise, selective execution falls back to a general "dense" path: gather the selected rows into a new contiguous (dense) batch, execute the kernel using the non-selective exec API, then scatter the result back to the original row positions.
- `ScalarExecutor` with dense execution ability.

Are these changes tested?
Tested and benchmarked.
Are there any user-facing changes?
None.
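The sparse/dense dispatch described above can be modeled with a minimal, Arrow-free sketch (`std::function` and `std::vector` stand in for the real kernel and batch types; all names are illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Exec = std::function<std::vector<int32_t>(const std::vector<int32_t>&)>;
using SelectiveExec = std::function<std::vector<int32_t>(
    const std::vector<int32_t>&, const std::vector<int32_t>&)>;

// If the kernel supplies a selective (sparse) exec, use it directly;
// otherwise gather the selected rows, run the regular exec densely, and
// scatter the results back to their original positions.
std::vector<int32_t> ExecuteSelective(const std::vector<int32_t>& batch,
                                      const std::vector<int32_t>& selection,
                                      const Exec& exec,
                                      const SelectiveExec& selective_exec) {
  if (selective_exec) return selective_exec(batch, selection);  // sparse path
  // Dense fallback: gather -> exec -> scatter.
  std::vector<int32_t> dense;
  dense.reserve(selection.size());
  for (int32_t row : selection) dense.push_back(batch[row]);
  std::vector<int32_t> result = exec(dense);
  std::vector<int32_t> out(batch.size(), 0);
  for (std::size_t i = 0; i < selection.size(); ++i) out[selection[i]] = result[i];
  return out;
}
```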