Skip to content

PoC: safe SIMD#119

Draft
Shnatsel wants to merge 25 commits intoawxkee:masterfrom
Shnatsel:safer-simd
Draft

PoC: safe SIMD#119
Shnatsel wants to merge 25 commits intoawxkee:masterfrom
Shnatsel:safer-simd

Conversation

@Shnatsel
Copy link
Contributor

Removes all unsafe from avx/rgb_xyz.rs as a proof of concept. The only remaining unsafe in that file is for calling a function with a #[target_feature] annotation.

Starting with Rust 1.87 platform intrinsics are no longer unsafe to call. Load/store intrinsics still are because they operate on raw pointers, but the safe_unaligned_simd crate provides safe wrappers around those.

This changes aligned store intrinsics into unaligned ones, but the compiler will turn them right back into aligned store instructions if it can tell the address is always aligned, which should apply here.

Dropping the now-unnecessary block created lots of noisy whitespace changes, but that is split into its own commit to ease review.

This is meant as a starting point, I am happy to help convert the rest of the code to this style if you are OK with the basic direction.

@Shnatsel
Copy link
Contributor Author

Clippy complains about code I didn't touch.

@awxkee
Copy link
Owner

awxkee commented Nov 11, 2025

Clippy complains about code I didn't touch.

You bumped the MSRV, which introduced new rules 🙂 The CI build pipeline now requires updating the MSRV version as well. Since image depends on this crate, it may also require bumping the MSRV in image again. @197g

I'm happy with the SIMD changes as long as they meet these two points:

  1. The performance drift after the changes is less than ~3% for both RGB -> RGB and RGBA->RGBA paths. The code here is tricky and may unexpectedly shift performance by up to 30% due to lane reordering.
  2. It doesn’t introduce too much abstraction. This seems fine, as it adds only one dependency.

@Shnatsel
Copy link
Contributor Author

image does not require the very latest version of moxcms and its CI explicitly uses the oldest version of every dependency for MSRV testing. Also, the previous MSRV already has the Cargo MSRV-aware resolver, so the MSRV bump is not a concern for image.

@Shnatsel
Copy link
Contributor Author

That sounds good to me!

I'll convert this PR to draft until I have converted at least a good chunk and have placated Clippy.

@Shnatsel Shnatsel marked this pull request as draft November 11, 2025 14:06
@Shnatsel
Copy link
Contributor Author

I think I've picked a poor starting point in rgb-xyz because there functions aren't tested or benchmarked. Because of that it's hard to tell if I broke anything or if there were any performance changes. And since they're not instantiated and used in a binary I cannot easily view the assembly either.

@awxkee
Copy link
Owner

awxkee commented Nov 11, 2025

You can change this method to return always false and set TransformOptions { prefer_fixed_point: false }.

EDIT: return always false

@awxkee
Copy link
Owner

awxkee commented Nov 11, 2025

Invocation logic goes as so:

  1. Check if all TRC curves are the same.
  2. Check if fixed point allowed.

If both statements are true then will be invoked Q path with Opt suffix.
If fixed point is not allowed, then the executor does not have the Q suffix.
If the TRC curves are not all the same the executors have neither the Q nor the Opt suffix.

@197g
Copy link
Collaborator

197g commented Nov 11, 2025

To confirm, as long as the rust-version annotation in Cargo.toml is accurate we won't have a problem in image (or any downstream using resolver 2). It'd be different if this were a new semer incompatible version. Cargo takes care of it as long as we can use the downstream dependency definition unchanged.

Comment on lines +286 to +288
let temporary_first_half: &mut [u16; 8] =
(&mut temporary0.0[..8]).try_into().unwrap();
_mm_storeu_si128(temporary_first_half, zx);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let temporary_first_half: &mut [u16; 8] =
(&mut temporary0.0[..8]).try_into().unwrap();
_mm_storeu_si128(temporary_first_half, zx);
_mm_storeu_si128(temporary0.0.first_chunk_mut::<8>().unwrap(), zx);

This should work and avoid any chance of creating a temporary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants