add vpdpbusd avx512 intrinsic
#4776
Thank you for contributing to Miri! A reviewer will take a look at your PR, typically within a week or two.
src/shims/x86/mod.rs
Outdated
    let intermediate = i32::from(i16::from(a1).wrapping_mul(i16::from(b1 as i8)))
        .wrapping_add(i32::from(i16::from(a2).wrapping_mul(i16::from(b2 as i8))))
        .wrapping_add(i32::from(i16::from(a3).wrapping_mul(i16::from(b3 as i8))))
        .wrapping_add(i32::from(i16::from(a4).wrapping_mul(i16::from(b4 as i8))));
The `as i8` is intentional here so that sign extension is used, so `try_from` would not work. How does Miri generally handle this?
I don't know the type of everything involved here, but there is `cast_signed`/`cast_unsigned` -- does that suffice?
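To illustrate the distinction discussed in this thread, here is a minimal sketch comparing a bit-reinterpreting cast with a value-preserving `try_from`. The helper names `reinterpret_and_widen` and `checked_and_widen` are hypothetical and do not appear in the PR. The sketch uses `as i8` only to stay independent of the toolchain version; the thread's suggestion is `cast_signed`, which is intended to express the same reinterpretation.

```rust
/// Hypothetical helper: reinterpret the bits of a `u8` as an `i8`,
/// then sign-extend the result to `i16`.
fn reinterpret_and_widen(b: u8) -> i16 {
    // `as i8` keeps the bit pattern, so 0xFF becomes -1.
    i16::from(b as i8)
}

/// Hypothetical helper: value-preserving conversion. Any input above
/// `i8::MAX` is rejected instead of being reinterpreted.
fn checked_and_widen(b: u8) -> Option<i16> {
    i8::try_from(b).ok().map(i16::from)
}
```

For inputs up to `0x7F` both helpers agree. For `0xFF`, `reinterpret_and_widen` yields `-1` while `checked_and_widen` yields `None`, which is why `try_from` does not fit the sign-extension use case.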
Thanks for the PR!
I am slightly concerned about slowly growing a huge avx512 file that nobody has an overview of any more.^^ But as long as there's a clear motivation in the form of a core ecosystem crate, I hope that will naturally limit the scope of what we have to support.
/// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_dpbusd_epi32>
/// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_dpbusd_epi32>
/// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_dpbusd_epi32>
fn vpdpbusd<'tcx>(
If this is only used in avx512, please move the function to that file.
    interp_ok(())
}

/// Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed
- /// Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed
+ /// Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in `a` with corresponding signed
same for all the other references to variable names in the doc comment
    let intermediate = i32::from(i16::from(a1).wrapping_mul(i16::from(b1.cast_signed())))
        .wrapping_add(i32::from(i16::from(a2).wrapping_mul(i16::from(b2.cast_signed()))))
        .wrapping_add(i32::from(i16::from(a3).wrapping_mul(i16::from(b3.cast_signed()))))
        .wrapping_add(i32::from(i16::from(a4).wrapping_mul(i16::from(b4.cast_signed()))));
We should find a way to make this more readable...
As a start, why are you mixing `i16` and `i32`? And why `wrapping_mul`? Multiplying two `i8` as an `i16` cannot overflow, right? Same for add. If things can never overflow, please use the strict operations.
Also, I think it would make sense to let-bind the 4 multiplications. Maybe that could even be written in a loop, e.g. via `from_fn`?
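As a rough illustration of the restructuring suggested above, the following standalone sketch computes one 32-bit lane with the four products bound in an array via `std::array::from_fn`. The function name `dot_lane` and its signature are hypothetical and not taken from the PR. The sketch uses `checked_*` followed by `expect` as a stand-in for the strict operations the comment mentions, and it assumes the final accumulation into `src` wraps, as in the non-saturating form of the instruction.

```rust
use std::array;

/// Hypothetical helper: one 32-bit lane of a `vpdpbusd`-style dot product.
/// `a` holds four unsigned bytes, `b` holds four bytes treated as signed.
fn dot_lane(src: i32, a: [u8; 4], b: [u8; 4]) -> i32 {
    // Each product lies in -32640..=32385, so it always fits in an i16.
    let products: [i16; 4] = array::from_fn(|i| {
        let lhs = i16::from(a[i]); // zero-extend the unsigned byte
        let rhs = i16::from(b[i] as i8); // reinterpret as signed, then sign-extend
        lhs.checked_mul(rhs).expect("u8 * i8 always fits in i16")
    });

    // Four i16 values summed in i32 cannot overflow either.
    let sum = products.iter().fold(0i32, |acc, &p| {
        acc.checked_add(i32::from(p)).expect("sum of four i16 values fits in i32")
    });

    // Assumption: only the accumulation into `src` is allowed to wrap.
    src.wrapping_add(sum)
}
```

With this shape, `dot_lane(0, [1, 2, 3, 4], [1, 1, 1, 1])` evaluates to `10`, and `dot_lane(i32::MAX, [1, 0, 0, 0], [1, 0, 0, 0])` wraps around to `i32::MIN`. Keeping the non-overflowing steps on checked arithmetic means an unexpected overflow would panic rather than silently produce a wrong lane.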
}

#[target_feature(enable = "avx512vnni")]
unsafe fn test_avx512vnni() {
You mentioned that this is aiming to hit a bunch of overflow and truncation cases. Please add comments pointing that out.
Reminder, once the PR becomes ready for a review, use |
This intrinsic is useful for the adler32 checksum algorithm.
The test attempts to hit a bunch of overflow and truncation cases, and I've validated it on real hardware.