State of the art comparison of chess sliding piece algorithms on the GPU

On the GPU, the algorithms that need no memory lookups at all perform best, reaching around 60 billion queen lookups per second. This is impressive given that 32 threads on a 5950X reach only around 13 billion queens per second.
NVIDIA GeForce RTX 3080
| Name | Performance [MQueens/s] |
|---|---|
| Black Magic - Fixed shift | 6958.00 |
| QBB Algo | 58959.55 |
| Bob Lookup | 1635.08 |
| Kogge Stone | 39972.16 |
| Hyperbola Quintessence | 16260.91 |
| Switch Lookup | 4425.89 |
| Slide Arithm | 18508.00 |
| Pext Lookup | 16821.82 |
| SISSY Lookup | 8050.17 |
| Hypercube Alg | 1304.38 |
| Dumb 7 Fill | 21842.60 |
| Obstruction Difference | 59202.99 |
| Leorik | 55653.71 |
| SBAMG o^(o-3cbn) | 59564.33 |
| NO HEADACHE | 27982.63 |
| AVX Branchless Shift | 28124.91 |
| Slide Arithmetic Inline | 61837.82 |
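
To illustrate what "no memory lookup" means for the fastest entries in the table above, here is a minimal C++ sketch of a queen attack generator that uses only shifts and masks, with no tables at all. It is not taken from any of the benchmarked implementations; it is a plain ray walk, similar in spirit to fill-based approaches. The names `Bitboard`, `ray`, `queen_attacks` and `count_bits` are hypothetical, and the square mapping (a1 = bit 0, h1 = bit 7, a8 = bit 56) is an assumption.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Assumed mapping: a1 = bit 0, h1 = bit 7, a8 = bit 56.
constexpr Bitboard ALL        = 0xFFFFFFFFFFFFFFFFULL;
constexpr Bitboard NOT_A_FILE = 0xFEFEFEFEFEFEFEFEULL;
constexpr Bitboard NOT_H_FILE = 0x7F7F7F7F7F7F7F7FULL;

// Shift by a signed amount: positive shifts left, negative shifts right.
constexpr Bitboard shift(Bitboard b, int s) {
    return s > 0 ? b << s : b >> -s;
}

// Walk one ray for up to 7 steps. `wrap` clears bits that crossed a board
// edge after the shift. The first blocker square is included in the result,
// and the walk stops behind it.
constexpr Bitboard ray(Bitboard piece, Bitboard occupied, int step, Bitboard wrap) {
    Bitboard attacks = 0;
    Bitboard gen = piece;
    for (int i = 0; i < 7; ++i) {
        gen = shift(gen, step) & wrap;  // advance one square
        attacks |= gen;                 // record it (blocker included)
        gen &= ~occupied;               // do not continue past a blocker
    }
    return attacks;
}

// Queen attacks as the union of all eight rays; no table is consulted.
constexpr Bitboard queen_attacks(int square, Bitboard occupied) {
    const Bitboard p = Bitboard{1} << square;
    return ray(p, occupied,  8, ALL)        | ray(p, occupied, -8, ALL)
         | ray(p, occupied,  1, NOT_A_FILE) | ray(p, occupied, -1, NOT_H_FILE)
         | ray(p, occupied,  9, NOT_A_FILE) | ray(p, occupied,  7, NOT_H_FILE)
         | ray(p, occupied, -7, NOT_A_FILE) | ray(p, occupied, -9, NOT_H_FILE);
}

// Portable bit count, used only to sanity-check results.
constexpr int count_bits(Bitboard b) {
    int n = 0;
    while (b) { b &= b - 1; ++n; }
    return n;
}
```

Because every step is register arithmetic, each GPU thread can evaluate a position without touching memory, which is a plausible reason the lookup-free entries lead the table. As a quick check, `count_bits(queen_attacks(27, 0))` counts the squares a queen on d4 attacks on an otherwise empty board.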