Skip to content

Conversation

@enwillyado
Copy link
Contributor

Fast SIMD calls and secure align for result.

Fast SIMD calls and secure align for result.
Copy link
Collaborator

@fabiangreffrath fabiangreffrath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you! But what was the exact issue with the former code that needed to get fixed?

Copy link
Collaborator

@fabiangreffrath fabiangreffrath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the failing builds.

x = _mm_max_ps(x, _mm_sub_ps((__m128){0, 0, 0, 0}, x));
x = _mm_mul_ps(x, (__m128){sfacfix, sfacfix, sfacfix, sfacfix});
const __m256d x_d = _mm256_load_pd(&xr[cnt]);
__m128 x = _mm256_cvtpd_ps(x_d);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are AVX intrinsics, but this is SSE2 code. This will crash on systems without AVX support.

You need to use two _mm_load_pd and _mm_cvtpd_ps calls and one _mm_movelh_ps.

@enwillyado
Copy link
Contributor Author

enwillyado commented Jun 2, 2025

Looks good, thank you! But what was the exact issue with the former code that needed to get fixed?

The main problem appears when unalligned data is used. The implicit _mm_store_si128 on original *(__m128i*)(xi + cnt) = _mm_cvttps_epi32(x); will crash with SIGSEGV error.

Something like this.
imagen

For make an fix-only PR, I will change this part.

@enwillyado enwillyado closed this Jun 2, 2025
@enwillyado enwillyado reopened this Jun 2, 2025
Potential unaligned memory location.
Only fix unaligned memory access.
Copy link
Collaborator

@fabiangreffrath fabiangreffrath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@fabiangreffrath fabiangreffrath merged commit cb10c59 into knik0:master Jun 2, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants