Description
Have you considered using the core::simd module for SIMD? I went through the effort of porting the decode_two_unsafe function, and in my testing it performs the same.
Here's a Godbolt link with a simplified implementation of it using both the core::simd and core::arch::x86_64 modules. The core::simd implementation has no unsafe code other than the transmute, although it does expect a [u8; 16] as input. It also compiles on other platforms, since the core::simd module is meant to be portable. I haven't tested it on anything other than x86_64, but it's supposed to behave exactly the same there.
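To illustrate the portability point without reproducing the Godbolt code, here is a minimal sketch of what the core::arch::x86_64 route tends to require. It is not taken from the library or from my port: `high_bit_mask` and `high_bit_mask_scalar` are hypothetical names, and the operation (collecting the top bit of each byte of a [u8; 16]) is only a stand-in for the real decode logic. It builds on stable Rust, since core::simd itself still needs a nightly compiler with the `portable_simd` feature.

```rust
/// Hypothetical scalar reference: bit `i` of the result is the top bit of `bytes[i]`.
fn high_bit_mask_scalar(bytes: [u8; 16]) -> u16 {
    let mut mask = 0u16;
    for (i, b) in bytes.iter().enumerate() {
        mask |= ((b >> 7) as u16) << i;
    }
    mask
}

/// x86_64-only version using core::arch intrinsics (hypothetical helper).
#[cfg(target_arch = "x86_64")]
fn high_bit_mask(bytes: [u8; 16]) -> u16 {
    use core::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};
    // SAFETY: SSE2 is part of the x86_64 baseline, `bytes` is 16 readable
    // bytes, and `_mm_loadu_si128` does not require an aligned pointer.
    unsafe {
        let v = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
        // The intrinsic returns an i32 whose low 16 bits hold the mask.
        _mm_movemask_epi8(v) as u16
    }
}

/// Every other target needs its own implementation; here it is the scalar one.
#[cfg(not(target_arch = "x86_64"))]
fn high_bit_mask(bytes: [u8; 16]) -> u16 {
    high_bit_mask_scalar(bytes)
}
```

With core::simd, the two cfg-gated functions would collapse into a single one that builds the vector with `u8x16::from_array(bytes)`, with no `unsafe` block and no per-target fallback, which is the main appeal of the port.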
I did port the function in the actual library so that I could benchmark it, but the code is really messy, so I'm only sharing the Godbolt link for now. I'll probably put it in a branch of my fork, but again, it's really messy and just thrown together.