This repository provides build scripts for LAMMPS (11 Feb 2026) with the KOKKOS package enabled. It also compares execution efficiency and performance on AMD MI210 and NVIDIA H100 GPUs.
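The main build difference between the two GPUs is the Kokkos backend and target architecture. The hypothetical Python helper below shows this by assembling a CMake configure command. It selects the HIP backend with `Kokkos_ARCH_AMD_GFX90A` for the MI210 (gfx90a) and the CUDA backend with `Kokkos_ARCH_HOPPER90` for the H100. It is a sketch under assumptions, not this repository's scripts. The option set, the `hipcc` compiler choice and the `../cmake` source path are all assumptions.

```python
# Hypothetical sketch: NOT this repository's build scripts. It only lists the
# per-GPU CMake options a KOKKOS-enabled LAMMPS build typically differs by.
GPU_OPTIONS = {
    # AMD MI210 (CDNA2, gfx90a): HIP backend, compiled with hipcc (assumed).
    "MI210": {
        "Kokkos_ENABLE_HIP": "yes",
        "Kokkos_ARCH_AMD_GFX90A": "yes",
        "CMAKE_CXX_COMPILER": "hipcc",
    },
    # NVIDIA H100 (Hopper, sm_90): CUDA backend.
    "H100": {
        "Kokkos_ENABLE_CUDA": "yes",
        "Kokkos_ARCH_HOPPER90": "yes",
    },
}


def cmake_command(gpu: str, source_dir: str = "../cmake") -> list[str]:
    """Return a cmake configure command with the KOKKOS package enabled."""
    options = {"PKG_KOKKOS": "yes", **GPU_OPTIONS[gpu]}
    return ["cmake", *(f"-D{name}={value}" for name, value in options.items()), source_dir]


if __name__ == "__main__":
    # Print one configure line per GPU instead of running cmake.
    for gpu in GPU_OPTIONS:
        print(" ".join(cmake_command(gpu)))
```

Run the printed command from a build directory inside the LAMMPS source tree. Loading one of the KOKKOS presets shipped under LAMMPS's `cmake/presets/` with `-C` is an alternative to spelling out the options by hand.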
On both systems, the Kokkos accelerator is activated at runtime with `-k on g 1 -sf kk` and the same input script (`in.gpu.lj`) is used. This allows MI210 and H100 to be compared directly under equivalent Kokkos configurations.
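A small Python sketch of the launch command shared by both systems follows. Only the flags `-k on g 1 -sf kk` and the input `in.gpu.lj` come from this README. The executable name `lmp` (the default for a CMake build) and passing the input with `-in` are assumptions. Any MPI launcher or batch-system wrapper is left out.

```python
import shlex
import subprocess

# Assumed executable name (default for a CMake build); adjust as needed.
LMP_BINARY = "lmp"


def kokkos_command(input_script: str = "in.gpu.lj", num_gpus: int = 1,
                   binary: str = LMP_BINARY) -> list[str]:
    """Build the argv used on both GPUs: Kokkos on, num_gpus GPUs, kk suffix."""
    return [
        binary,
        "-k", "on", "g", str(num_gpus),  # enable KOKKOS with num_gpus GPUs per node
        "-sf", "kk",                      # use the /kk variant of styles that have one
        "-in", input_script,              # same input script on both systems
    ]


def launch(cmd: list[str]) -> None:
    """Run LAMMPS, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, e.g. to paste into a job script.
    print(shlex.join(kokkos_command()))  # lmp -k on g 1 -sf kk -in in.gpu.lj
```

Keeping the argument list in one function means both GPUs are launched with identical flags. Only the build behind `lmp` differs between them.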

| Test case | AMD MI210 (wall time) | NVIDIA H100, Nano 5 (wall time) |
|---|---|---|
| in.gpu.lj | 2 hr 30 min 16 s | 2 hr 4 min 37 s |

| GPU | Timesteps/s | M atom‑steps/s |
|---|---|---|
| AMD MI210 | 199.7 | 99.8 |
| NVIDIA H100 | 240.8 | 120.4 |

The NVIDIA H100 outperforms the AMD MI210, but the gap is smaller than raw peak FLOP and memory‑bandwidth ratios would suggest.
- On the MI210, the run took 2 hr 30 min 16 s, corresponding to 199.7 timesteps/s and 99.8 M atom‑steps/s.
- On the H100, the run took 2 hr 4 min 37 s, giving 240.8 timesteps/s and 120.4 M atom‑steps/s.

This translates to roughly 20% higher throughput on the H100 (240.8 / 199.7 ≈ 1.21 in timesteps/s, with the same ratio in M atom‑steps/s and in wall time), despite its much larger theoretical performance advantage.
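The ratios can be recomputed from the two tables. The plain-Python sketch below uses figures copied from this README. It also divides M atom‑steps/s by timesteps/s, which implies a system of roughly 0.5 million atoms on both GPUs. That last step assumes both metrics come from the same run.

```python
def hms(h: int, m: int, s: int) -> int:
    """Convert an 'h hr m min s s' wall time into seconds."""
    return h * 3600 + m * 60 + s


# Figures copied from the tables in this README.
RESULTS = {
    "MI210": {"steps_per_s": 199.7, "matom_steps_per_s": 99.8, "wall_s": hms(2, 30, 16)},
    "H100": {"steps_per_s": 240.8, "matom_steps_per_s": 120.4, "wall_s": hms(2, 4, 37)},
}


def speedup(fast: str, slow: str) -> dict[str, float]:
    """Ratio of fast over slow for each metric (>1 means 'fast' is faster)."""
    f, s = RESULTS[fast], RESULTS[slow]
    return {
        "timesteps/s": f["steps_per_s"] / s["steps_per_s"],
        "M atom-steps/s": f["matom_steps_per_s"] / s["matom_steps_per_s"],
        # Lower wall time is better, so this ratio is inverted.
        "wall time": s["wall_s"] / f["wall_s"],
    }


def implied_atoms_millions(gpu: str) -> float:
    """Atom count (in millions) implied by atom-steps/s divided by timesteps/s."""
    r = RESULTS[gpu]
    return r["matom_steps_per_s"] / r["steps_per_s"]


if __name__ == "__main__":
    for metric, ratio in speedup("H100", "MI210").items():
        print(f"{metric:>15}: {ratio:.2f}x")
    for gpu in RESULTS:
        print(f"{gpu}: ~{implied_atoms_millions(gpu):.2f} M atoms")
```

All three metrics come out at about 1.21x. The throughput rates and the wall times are therefore consistent with each other and with the ~20% gap.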
In the MPI task timing breakdown, the pair‑force kernel (Pair) accounts for only about one‑third of the total time. GPU compute is therefore not the only limiting factor: host‑side work, neighbor‑list handling, and Kokkos overheads also play a substantial role.
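LAMMPS prints an MPI task timing breakdown at the end of a run and also writes it to `log.lammps` by default. The Pair share can be read from the `%total` column of that table. The sketch below assumes the standard pipe‑separated layout of the table. The section names listed in `SECTIONS` are an assumption about which rows may appear. The sample log in the usage note is synthetic and is not taken from these runs.

```python
# Section names that may appear in LAMMPS's "MPI task timing breakdown" table
# (assumed list; rows such as Bond or Kspace only appear when relevant).
SECTIONS = ("Pair", "Bond", "Kspace", "Neigh", "Comm", "Output", "Modify", "Other")
HEADER = "MPI task timing breakdown:"


def timing_breakdown(log_text: str) -> dict[str, float]:
    """Return {section: %total} from the first timing breakdown in a log."""
    start = log_text.find(HEADER)
    if start < 0:
        raise ValueError("no MPI task timing breakdown found in log")
    shares: dict[str, float] = {}
    # Skip the header line, the column-title line and the dashed separator.
    for line in log_text[start:].splitlines()[3:]:
        cols = [c.strip() for c in line.split("|")]
        if len(cols) < 2 or cols[0] not in SECTIONS:
            break  # a blank line or any other text ends the table
        shares[cols[0]] = float(cols[-1])  # last column is %total
    return shares


def non_pair_share(shares: dict[str, float]) -> float:
    """Percentage of loop time spent outside the pair-force kernel."""
    return 100.0 - shares.get("Pair", 0.0)
```

For example, `timing_breakdown(open("log.lammps").read())["Pair"]` gives the Pair percentage of a finished run. `non_pair_share` gives the remainder spent in Neigh, Comm, Modify and the other sections.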
- The NVIDIA H100 delivers about 20% higher Kokkos performance than the AMD MI210 on this LJ benchmark, despite a much larger peak‑spec advantage.
- Both runs are dominated by host‑side and Kokkos overhead (Comm/Modify) rather than pure pair‑force compute, so GPU arithmetic throughput is not the only limiting factor.