Simulating graveler battles for the softlock described in Pikasprey's and Shoddycast's videos
A post explaining how I implemented this project can be found on my blog.
Simply download the version matching your machine from releases. Performance will likely be lower than a local build because the release executables are not compiled for the native target, and they run with half the available threads.
A separate executable was compiled for each x86_64 instruction set (see Wikipedia). Newer sets give better performance, but an executable built for a newer set than the machine supports will not run.
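To check which instruction sets a machine supports before picking a download, a small Rust program can query the CPU at runtime. This is only a rough sketch: it assumes the release builds map onto the x86-64 microarchitecture levels (v1 to v4), which this README does not state, and it approximates each level with a single representative feature. `detected_level` is a hypothetical helper, not part of this project.

```rust
/// Hypothetical helper: approximate the highest x86-64 level the CPU supports.
/// Each level is approximated by ONE representative feature (an assumption);
/// the real level definitions require several features each.
#[cfg(target_arch = "x86_64")]
fn detected_level() -> &'static str {
    if is_x86_feature_detected!("avx512f") {
        "x86-64-v4"
    } else if is_x86_feature_detected!("avx2") {
        "x86-64-v3"
    } else if is_x86_feature_detected!("sse4.2") {
        "x86-64-v2"
    } else {
        "x86-64-v1"
    }
}

/// Fallback for other architectures, where the detection macro is unavailable.
#[cfg(not(target_arch = "x86_64"))]
fn detected_level() -> &'static str {
    "not x86_64"
}

fn main() {
    println!("highest detected level (approximate): {}", detected_level());
}
```

If the program reports a lower level than the executable you downloaded, that executable is unlikely to run on the machine.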
To compile the code, you need to have Rust nightly installed, and then run the performance maximised build command:

`cargo +nightly build --profile max`

`+nightly` is optional if nightly is the default toolchain on the machine.
The optimal number of threads differs between CPUs. It is set to half the available threads by default and can be changed directly in the code if testing other values is desired.
The executable will be generated in `./target/max/graveler` (on Windows it will be `graveler.exe`).
`./target/max/graveler <threads>`

If the number of threads to use is not specified, it defaults to the number of available logical CPUs.
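The optional thread argument with a fallback to the logical CPU count could be handled roughly as follows. This is a minimal sketch, not the project's actual code: `thread_count` is a hypothetical helper, and treating an unparsable or zero argument the same as a missing one is an assumption.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper: choose a thread count from an optional CLI argument.
/// A missing, unparsable, or zero value falls back to the logical CPU count
/// (the handling of invalid values is an assumption of this sketch).
fn thread_count(arg: Option<&str>) -> usize {
    // Logical CPUs reported by the OS; use 1 if the query fails.
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);

    arg.and_then(|s| s.parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(available)
}

fn main() {
    // First positional argument, if any: `graveler <threads>`.
    let arg = std::env::args().nth(1);
    let threads = thread_count(arg.as_deref());
    println!("using {threads} threads");
}
```

`std::thread::available_parallelism` can report fewer CPUs than the machine has when the process is restricted (for example by CPU affinity or container limits), so passing the count explicitly is the more predictable option when benchmarking.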
Performance was measured using hyperfine
| CPU | Single Thread | Half Threads | All Threads |
|---|---|---|---|
| i7-10750H 6 Cores 12 Threads | 2.78s | 512ms | 531ms |
| Ryzen 7950X3D 16 Cores 32 Threads | 1.78s | 134ms | 117ms |
| 2x Xeon Gold 5420+ 56 Cores 112 Threads | 3.73s | 71.3ms | 69.9ms |
In the CUDA folder, there is a CUDA implementation of a nearly identical algorithm, for running on an Nvidia GPU.
To compile it, you need CUDA installed; then simply run `make` to compile both the normal version and the benchmark version.
The benchmark version runs the kernel 50 times as warm-up, then 1000 more times to time it, and outputs the average of the 1000 timed runs.
The reported time covers only the kernel and the summarizing of the kernel results; it does not include the CUDA runtime initialization, which can take significantly longer than the kernel.
| GPU | Average |
|---|---|
| RTX 2070 Mobile Max-Q | 31.51ms |
| RTX 4080 | 6.36ms |
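
The warm-up-then-average scheme used by the benchmark version can be illustrated with a small host-side timing harness. The sketch below is written in Rust rather than CUDA and times an arbitrary closure with `std::time::Instant`; the actual benchmark may well use a different timing mechanism (such as CUDA events), and `bench` and the stand-in workload are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Hypothetical harness: call `work` `warmup` times untimed, then `runs`
/// times timed, and return the mean duration of the timed calls.
fn bench<F: FnMut()>(mut work: F, warmup: u32, runs: u32) -> Duration {
    assert!(runs > 0, "at least one timed run is required");

    // Warm-up iterations: results are discarded.
    for _ in 0..warmup {
        work();
    }

    // Timed iterations: accumulate the elapsed time of each call.
    let mut total = Duration::ZERO;
    for _ in 0..runs {
        let start = Instant::now();
        work();
        total += start.elapsed();
    }
    total / runs
}

fn main() {
    // Stand-in workload (an assumption); the real benchmark launches the kernel here.
    let avg = bench(
        || {
            std::hint::black_box((0..10_000u64).sum::<u64>());
        },
        50,
        1000,
    );
    println!("average over 1000 runs: {avg:?}");
}
```

Discarding the first iterations keeps one-time costs such as cache warm-up and lazy initialization out of the average, which is why the figures above exclude CUDA runtime initialization.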