This repository provides the compilation scripts for SeisSol (v1.3.1) and compares its execution efficiency and performance on AMD MI210 and NVIDIA V100 GPUs.
The experiments focus on two test cases: tpv33 and Turkey.
The AMD platform utilized MI210 GPUs, while the NV platform (Taiwania 2) utilized V100 GPUs.

| Test Case | AMD MI210 | NV V100 (Taiwania 2) |
|---|---|---|
| tpv33 | 4 min 41.23 s (w/ HIP Graph)<br>10 min 29.85 s (w/o HIP Graph) | 23 min 24.52 s (w/ CUDA Graph) |
| Turkey | OOM (Plasticity = 1)<br>1 h 41 min 40 s (Plasticity = 0) | > 2 hrs (Unified Memory, Plasticity = 1)<br>Segmentation Fault (Separated Memory, Plasticity = 0) |

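To put the tpv33 timings in relative terms, the following minimal sketch converts the three wall-clock times from the table to seconds and computes two ratios. The helper `to_seconds` and the variable names are illustrative only and are not part of the repository's scripts; the ratios assume the runs are otherwise comparable (same input and settings).

```python
def to_seconds(minutes, seconds):
    """Convert a 'min s' wall-clock time to seconds."""
    return 60 * minutes + seconds

# tpv33 wall-clock times taken from the table above.
hip_graph = to_seconds(4, 41.23)      # MI210, with HIP Graph
hip_no_graph = to_seconds(10, 29.85)  # MI210, without HIP Graph
cuda_graph = to_seconds(23, 24.52)    # V100, with CUDA Graph

# Gain from enabling HIP Graph on the same hardware.
graph_gain = hip_no_graph / hip_graph
# MI210 versus V100, both with graph execution enabled.
mi210_vs_v100 = cuda_graph / hip_graph

print(f"HIP Graph speedup on MI210: {graph_gain:.2f}x")
print(f"MI210 vs V100 (both with graphs): {mi210_vs_v100:.2f}x")
```

With these inputs, enabling HIP Graph gives about a 2.24x speedup on the MI210, and the MI210 run is about 4.99x faster than the V100 run.
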
The execution efficiency is heavily dictated by synchronization and kernel launch overhead.
- `cudaStreamSynchronize` accounted for 86.1% of total CUDA API time.
- MPI communication was negligible, totaling less than 2 seconds.
- Utilizing CUDA/HIP Graphs effectively reduces the CPU-to-GPU submission overhead.
- The AMD MI210 demonstrated superior raw performance over the NV V100.
- The `Turkey` test case highlighted significant memory constraints. The V100 suffered from OOM errors when plasticity was enabled. On AMD platforms, Unified Memory was required to prevent illegal memory access.
- Since scientific computing relies heavily on GEMM, cuBLAS parameter tuning or converting operations to GEMM is critical for future performance gains.
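
The point about submission overhead can be illustrated with a deliberately simple cost model. This is a sketch under stated assumptions, not a measurement: it assumes launch cost is not hidden behind kernel execution, and every number in it is hypothetical and chosen only for illustration. The function names `per_kernel_launch_time` and `graph_replay_time` are likewise invented for this example.

```python
def per_kernel_launch_time(n_kernels, launch_overhead_us, kernel_time_us):
    """Each kernel pays its own CPU-side launch cost (simplified, no overlap)."""
    return n_kernels * (launch_overhead_us + kernel_time_us)

def graph_replay_time(n_kernels, graph_launch_us, kernel_time_us):
    """A captured graph pays one launch cost for the whole kernel sequence."""
    return graph_launch_us + n_kernels * kernel_time_us

# Hypothetical values (not measured on MI210 or V100).
n_kernels = 1000          # short kernels per time step
launch_overhead_us = 10.0 # assumed cost of one individual launch
graph_launch_us = 20.0    # assumed cost of one graph launch
kernel_time_us = 5.0      # assumed execution time of each kernel

individual = per_kernel_launch_time(n_kernels, launch_overhead_us, kernel_time_us)
graphed = graph_replay_time(n_kernels, graph_launch_us, kernel_time_us)
print(f"individual launches: {individual:.0f} us")
print(f"graph replay:        {graphed:.0f} us")
print(f"ratio:               {individual / graphed:.2f}x")
```

In this model the benefit grows as kernels get shorter relative to the launch cost, which is consistent with the observation that many small launches and synchronizations dominate the runtime.
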