barr-israel/graveler-sim

Graveler Simulation

Simulating Graveler battles for the softlock described in Pikasprey's and Shoddycast's videos.

A post explaining how I implemented this project can be found on my blog.

Download

Download the version matching your machine from releases. Performance will likely be lower than a local build because the prebuilt executables are not compiled for your native target, and the thread count will be set to half the available threads.
A separate executable was compiled for each x86_64 instruction set (wikipedia). Newer sets give better performance, but an executable built for a newer set than the machine supports will not run.
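
To help pick a download, the following is a minimal sketch of checking which instruction set level the current machine supports. It assumes the releases follow the x86-64 microarchitecture levels (v2, v3, v4), which this README does not confirm, and it tests only a representative subset of each level's features, so treat the result as a hint. The function name `highest_supported_level` is hypothetical and not part of this project.

```rust
/// Returns a rough guess at the highest x86-64 microarchitecture level
/// supported by the CPU this program runs on.
/// Assumption: only a subset of each level's required features is checked.
#[cfg(target_arch = "x86_64")]
fn highest_supported_level() -> &'static str {
    // v2 adds (among others) SSE4.2 and POPCNT.
    let v2 = is_x86_feature_detected!("sse4.2") && is_x86_feature_detected!("popcnt");
    // v3 adds (among others) AVX2, FMA and BMI2.
    let v3 = v2
        && is_x86_feature_detected!("avx2")
        && is_x86_feature_detected!("fma")
        && is_x86_feature_detected!("bmi2");
    // v4 adds the AVX-512 family; only the foundation subset is checked here.
    let v4 = v3 && is_x86_feature_detected!("avx512f");

    if v4 {
        "x86-64-v4"
    } else if v3 {
        "x86-64-v3"
    } else if v2 {
        "x86-64-v2"
    } else {
        "x86-64"
    }
}

/// Fallback for machines that are not x86_64 at all.
#[cfg(not(target_arch = "x86_64"))]
fn highest_supported_level() -> &'static str {
    "not x86_64"
}
```

Printing the returned string from a small program gives the newest level worth trying. An executable built for a higher level than the one reported is the case described above that will not run.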

Compilation

To compile the code, install Rust nightly, then run the performance-maximised build command:

cargo +nightly build --profile max

+nightly is optional if nightly is the default toolchain on the machine.

The optimal number of threads differs between CPUs. It is set to half the available threads by default and can be changed directly in the code to test other values.

The executable will be generated at ./target/max/graveler (on Windows it will be graveler.exe).

Usage

./target/max/graveler <threads>

If the number of threads to use is not specified, it defaults to the number of available logical CPUs.
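
As an illustration of the rule above, here is a minimal sketch of resolving the thread count from an optional command-line argument. It is not taken from this project's source: `resolve_threads` is a hypothetical helper, and rejecting `0` is an assumption made for the example.

```rust
use std::thread;

/// Resolves the number of worker threads from an optional CLI argument.
/// Hypothetical helper for illustration, not the project's actual code.
fn resolve_threads(arg: Option<&str>) -> Result<usize, String> {
    match arg {
        // An explicit value must parse as a positive integer.
        Some(text) => match text.parse::<usize>() {
            Ok(0) => Err("thread count must be at least 1".to_string()),
            Ok(n) => Ok(n),
            Err(e) => Err(format!("invalid thread count {text:?}: {e}")),
        },
        // No argument: fall back to the number of logical CPUs,
        // or to 1 if the platform cannot report it.
        None => Ok(thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1)),
    }
}
```

A caller would pass the first argument, for example `resolve_threads(std::env::args().nth(1).as_deref())`, and hand the result to whatever spawns the workers.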

Performance

Performance was measured using hyperfine.

| CPU | Single Thread | Half Threads | All Threads |
| --- | --- | --- | --- |
| i7-10750H (6 Cores, 12 Threads) | 2.78s | 512ms | 531ms |
| Ryzen 7950X3D (16 Cores, 32 Threads) | 1.78s | 134ms | 117ms |
| 2x Xeon Gold 5420+ (56 Cores, 112 Threads) | 3.73s | 71.3ms | 69.9ms |

GPU Implementation

The CUDA folder contains a CUDA implementation of a nearly identical algorithm, for running on an Nvidia GPU.
To compile it, you need CUDA installed. Running make compiles both the normal version and the benchmark version.
The benchmark version runs the kernel 50 times as warm-up, then 1000 more times to time it, and outputs the average of the 1000 runs.
The output covers only the kernel time and the summarizing of the kernel results. It does not include the CUDA runtime initialization, which can take significantly longer than the kernel.
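
The warm-up-then-average scheme described above can be sketched on the host side as follows. This is not the repository's CUDA benchmark: `average_run_time` is a hypothetical helper, the closure stands in for a kernel launch, and wall-clock timing with `Instant` is an assumption made to keep the example self-contained.

```rust
use std::time::{Duration, Instant};

/// Runs `work` `warmup` times untimed, then `timed` more times,
/// and returns the mean duration of the timed runs only.
/// Hypothetical helper; `work` stands in for launching the kernel.
fn average_run_time<F: FnMut()>(mut work: F, warmup: u32, timed: u32) -> Duration {
    assert!(timed > 0, "at least one timed run is required");

    // Warm-up runs are executed but excluded from the measurement.
    for _ in 0..warmup {
        work();
    }

    // Each timed run is measured individually and accumulated.
    let mut total = Duration::ZERO;
    for _ in 0..timed {
        let start = Instant::now();
        work();
        total += start.elapsed();
    }

    total / timed
}
```

Calling `average_run_time(run_once, 50, 1000)` with some `run_once` closure mirrors the 50 warm-up and 1000 timed runs. Anything done before the call, such as one-time initialization, stays outside the reported average.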

| GPU | Average |
| --- | --- |
| RTX 2070 Mobile Max-Q | 31.51ms |
| RTX 4080 | 6.36ms |
