`avx_{add,mul,mac,fma}` hang with Nvidia compilers

The `avx_add`, `avx_mul`, `avx_mac`, and `avx_fma` tests hang when compiled with the Nvidia compilers.  Oddly, the `avx_fmac` test seems to run fine.

Backtrace:
```
Thread 5 (Thread 0x155552ebd700 (LWP 1842419)):
#0  0x0000155553f0e5ae in pthread_barrier_wait () from /lib64/libpthread.so.0
#1  0x0000000000403696 in avx_add (args_in=0x15554c000be0) at src/x86/avx.c:43
#2  0x0000000000402f77 in simd_thread (args_in=0x609310) at src/simd.c:38
#3  0x0000155553f071ca in start_thread () from /lib64/libpthread.so.0
#4  0x00001555534918d3 in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x155555522fc0 (LWP 1842402)):
#0  0x0000155553f086cd in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
#1  0x0000000000401b5c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:177
```
Looking inside, it seems that `r_max` may be overflowing and then becoming zero, which would naturally kill the `r_max := 2*r_max` progression: once `r_max` is zero, doubling it leaves it at zero.
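To illustrate how repeated doubling can end at exactly zero, here is a minimal sketch. It assumes `r_max` behaves like an unsigned 64-bit counter, which I have not checked against the actual declaration; if it is a signed type, the overflow is undefined behavior, although in practice it often wraps the same way. The function name `doublings_until_zero` is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the benchmark: count how many doublings
 * it takes for a nonzero unsigned 64-bit counter to wrap around to zero.
 * Each doubling shifts the value left by one bit, so the highest set bit
 * eventually falls off the end and only zeros remain. */
static int doublings_until_zero(uint64_t r_max)
{
    int steps = 0;

    while (r_max != 0) {
        r_max *= 2;   /* unsigned arithmetic wraps modulo 2^64 */
        steps++;
    }
    return steps;
}
```

Starting from `r_max = 1`, this takes 64 doublings, so a loop whose measured time never grows would reach zero very quickly.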

It does not seem to depend on the choice of flags (although AFAIK Nvidia is rather aggressive in vectorization).

This happened on an AMD EPYC 7H12, but I don't think it's related to AMD instructions.

My first guess is that `_mm256_add_pd()`, or something else in the timed loops, ends up as a dummy function that runs in zero time. The timed loop would then also take zero time, so `r_max` would increase without bound and eventually overflow.

I really don't have time to look into this now, but this needs to be addressed for any of the Nvidia content to be taken seriously.  At the very least, we could start checking for `r_max` overflow.
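A rough sketch of what such an overflow check could look like is below. It assumes `r_max` is a `long`, which may not match the real code, and `grow_r_max` is a hypothetical helper name, not something that exists in the source.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: double *r_max only if the result still fits in a long.
 * Returns true on success. Returns false and leaves *r_max unchanged if the
 * value is not positive or if doubling it would overflow. */
static bool grow_r_max(long *r_max)
{
    if (*r_max <= 0 || *r_max > LONG_MAX / 2)
        return false;

    *r_max *= 2;
    return true;
}
```

The caller could treat a `false` return as "the timed loop is not taking any measurable time" and abort the test with an error instead of hanging at the barrier.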
