Contiguous access #21984

Jenya705 · 2025-11-30T15:04:09Z

Objective

Enables accessing slices from tables directly via Queries.

Solution

One new trait:

ContiguousQueryData allows to fetch all values from tables all at once (an implementation for &T returns a slice of components in the set table, for &mut T returns a mutable slice of components in the set table as well as a struct with methods to set update ticks (to match the fetch implementation))

Methods contiguous_iter, contiguous_iter_mut and similar in Query and QueryState making possible to iterate using these traits.

Macro QueryData was updated to support contiguous items when contiguous(target) attribute is added (a target can be all, mutable and immutable, refer to the custom_query_param example)

Testing

sparse_set_contiguous_query test verifies that you can't use next_contiguous with sparse set components
test_contiguous_query_data test verifies that returned values are valid
base_contiguous benchmark (file is named iter_simple_contiguous.rs)
base_no_detection benchmark (file is named iter_simple_no_detection.rs)
base_no_detection_contiguous benchmark (file is named iter_simple_no_detection_contiguous.rs)
base_contiguous_avx2 benchmark (file is named iter_simple_contiguous_avx2.rs)

Showcase

Examples contiguous_query, custom_query_param

Example

// - self.0 is a World
// - self.1 is a QueryState
// - velocity is a slice of components with Vec3 inside.
// - position is a data structure which implements Deref/DerefMut and IntoIterator methods to access the slice
// as well as mechanism to update update ticks (which it does automatically on dereference), 
// which may be bypassed via `bypass_change_detection` methods.
for (velocity, mut position) in self.1.contiguous_iter_mut(&mut self.0).unwrap() {
    assert!(velocity.len() == position.len());
    for (v, p) in velocity.iter().zip(position.iter_mut()) {
        p.0 += v.0;
    }
}

Benchmarks

Code for base benchmark:

#[derive(Component, Copy, Clone)]
struct Transform(Mat4);

#[derive(Component, Copy, Clone)]
struct Position(Vec3);

#[derive(Component, Copy, Clone)]
struct Rotation(Vec3);

#[derive(Component, Copy, Clone)]
struct Velocity(Vec3);

pub struct Benchmark<'w>(World, QueryState<(&'w Velocity, &'w mut Position)>);

impl<'w> Benchmark<'w> {
    pub fn new() -> Self {
        let mut world = World::new();

        world.spawn_batch(core::iter::repeat_n(
            (
                Transform(Mat4::from_scale(Vec3::ONE)),
                Position(Vec3::X),
                Rotation(Vec3::X),
                Velocity(Vec3::X),
            ),
            10_000,
        ));

        let query = world.query::<(&Velocity, &mut Position)>();
        Self(world, query)
    }

    #[inline(never)]
    pub fn run(&mut self) {
        for (velocity, mut position) in self.1.iter_mut(&mut self.0) {
            position.0 += velocity.0;
        }
    }
}

Iterating over 10000 entities from a single table and increasing a 3-dimensional vector from component Position by a 3-dimensional vector from component Velocity

Name	Time	Time (AVX2)	Description
base	5.5828 µs	5.5122 µs	Iteration over components
base_contiguous	4.8825 µs	1.8665 µs	Iteration over contiguous chunks
base_contiguous_avx2	2.0740 µs	1.8665 µs	Iteration over contiguous chunks with enforced avx2 optimizations
base_no_detection	4.8065 µs	4.7723 µs	Iteration over components while bypassing change detection through `bypass_change_detection()` method
base_no_detection_contiguous	4.3979 µs	1.5797 µs	Iteration over components without registering update ticks

Using contiguous 'iterator' makes the program a little bit faster and it can be further vectorized to make it even faster

hymm · 2025-12-01T18:04:42Z

How does this pr compare to Implement batched query support #6161?
Am I right in my understanding that some things might not properly vectorize due to alignment issues even if they use as_contiguous_iter?
If a user wanted to work with the standard libraries simd https://doc.rust-lang.org/std/simd/index.html. Ignoring alignment issues, would this pr work with that?

crates/bevy_ecs/src/query/filter.rs

Jenya705 · 2025-12-01T18:49:38Z

How does this pr compare to Implement batched query support #6161?

This pr just enables slices from tables to be returned directly when applicable, it doesn't implement any batches and it doesn't ensure any specific (other than rust's) alignment (yet these slices may be used to apply simd).

Am I right in my understanding that some things might not properly vectorize due to alignment issues even if they use as_contiguous_iter?

This pr doesn't deal with any alignments but (as of my understanding) you can always take sub-slices which would meet your alignment requirements. And just referring to the issue #21861, even without any specific alignment the code gets vectorized.

If a user wanted to work with the standard libraries simd https://doc.rust-lang.org/std/simd/index.html. Ignoring alignment issues, would this pr work with that?

No, the returned slices do not have any specific (other than rust's) alignment requirements.

chengts95 · 2025-12-01T19:52:56Z

The solution looks promising to solve issue #21861.

If you want to use SIMD instructions explicitly, alignment is something you usually have to manage yourself (with an aligned allocator or a peeled prologue). Auto-vectorization won’t “update” the alignment for you – it just uses whatever alignment it can prove and otherwise emits unaligned loads. From that perspective, a contiguous slice is already sufficient; fully aligned SIMD is a separate concern on top of that.

hymm

This is not a full review, but onboard with the general approach in this pr. Overall this is fairly straightforward. I imagine we'll eventually want to have some simd aligned storage, but in the meantime users can probably align their components manually.

github-actions · 2025-12-03T19:19:56Z

You added a new example but didn't add metadata for it. Please update the root Cargo.toml file.

hymm

The recent changes fixed my reservations with this pr. Just some nits left.

crates/bevy_ecs/src/query/fetch.rs