Perf: Vectorize check_bounds(2x speedup) by gstvg · Pull Request #8966 · apache/arrow-rs

gstvg · 2025-12-08T05:33:43Z

Which issue does this PR close?

Part of #279, related to #8879

Rationale for this change

take check bounds i32 512
                        time:   [338.84 ns 339.71 ns 340.89 ns]
                        change: [−55.056% −54.843% −54.650%] (p = 0.00 < 0.05)
                        Performance has improved.

take check bounds i32 1024
                        time:   [548.47 ns 551.23 ns 554.47 ns]
                        change: [−63.999% −63.813% −63.608%] (p = 0.00 < 0.05)
                        Performance has improved.

take check bounds i32 8192
                        time:   [3.4934 µs 3.5004 µs 3.5116 µs]
                        change: [−68.252% −68.158% −68.050%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

Instead of checking index by index, and error-branching individually on them, check all indices for the presence of any out-of-bounds index, and only then, in the slow-path, find the exact out-of-bounds index

Note how LLVM vectorizes the happy path into a very tight loop:

https://rust.godbolt.org/z/MhY4cP6WG

Are these changes tested?

Existing tests already cover the changed function

Are there any user-facing changes?

No

gstvg · 2025-12-08T05:34:17Z

@Dandandan

gstvg · 2025-12-08T05:39:10Z

-                    "Array index out of bounds, cannot get item at index {ix} from {len} entries"
-                )));
+        let in_bounds = indices.values().iter().fold(true, |in_bounds, &i| {
+            in_bounds & (i >= T::Native::ZERO) & (i < len)


Even if today check_bounds is only called with unsigned indices, ArrowPrimitiveType includes signed types, which can be used here in the future, so we have to check this too. Happily, LLVM will optimize this out. If this function get's restricted to a hypothetical ArrowUnsignedInteger or ArrowIndexType, this can be removed

gstvg · 2025-12-08T05:51:53Z

+                        "Array index out of bounds, cannot get item at index {index} from {len} entries"
+                    )));
+                }
            }


This heavily optimizes for the happy path, and make the error-path slow. If a more balanced approach is desired we can use something like this:

pub fn check_bounds<T: ArrowPrimitiveType>( len: usize, indices: &PrimitiveArray<T>, ) -> Result<(), ArrowError> where T::Native: Display, { // omitted if indices.null_count() > 0 { // omitted } else { let chunks = indices.values().chunks_exact(64); let remainder = chunks.remainder(); for chunk in chunks { let chunk: &[T::Native; 64] = chunk.try_into().unwrap(); // unwrap is optimized out let in_bounds = chunk.iter().fold(true, |in_bounds, &i| { in_bounds & (i >= T::Native::ZERO) & (i < len) }); if !in_bounds { return Err(out_of_bounds_index(chunk, len)); } } for &index in remainder { if index < T::Native::ZERO || index >= len { return Err(ArrowError::ComputeError(format!( "Array index out of bounds, cannot get item at index {index} from {len} entries" ))); } } Ok(()) } } #[inline(never)] fn out_of_bounds_index<T: ArrowNativeType>(indices: &[T], len: T) -> ArrowError { todo!() }

Dandandan · 2025-12-10T12:39:51Z

Thanks @gstvg

perf: vectorize check_bounds

ef912d5

github-actions bot added the arrow Changes to the arrow crate label Dec 8, 2025

gstvg commented Dec 8, 2025

View reviewed changes

Dandandan approved these changes Dec 8, 2025

View reviewed changes

Dandandan merged commit 5375411 into apache:main Dec 10, 2025
26 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perf: Vectorize check_bounds(2x speedup)#8966

Perf: Vectorize check_bounds(2x speedup)#8966
Dandandan merged 1 commit intoapache:mainfrom
gstvg:perf_vectorize_check_bounds

gstvg commented Dec 8, 2025

Uh oh!

gstvg commented Dec 8, 2025

Uh oh!

gstvg Dec 8, 2025

Uh oh!

gstvg Dec 8, 2025

Uh oh!

Uh oh!

Dandandan commented Dec 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gstvg commented Dec 8, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

gstvg commented Dec 8, 2025

Uh oh!

gstvg Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

gstvg Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Dandandan commented Dec 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants