H2D/D2H bandwidth identical across NUMA nodes on dual-CPU MI210 system — expected behavior? #129

@VincentXWD

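Platform: Dual-socket AMD EPYC 9554 + 8× MI210 (10 devices in rocm-bandwidth-test numbering: 2 CPUs + 8 GPUs)
NUMA affinity (rocm-bandwidth-test device indices): CPU0 is local to GPUs 2–5; CPU1 is local to GPUs 6–9
Tool: rocm-bandwidth-test v2.6.0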

Observation

The Inter-Device Numa Distance table shows clear asymmetry (CPU0→GPUs 2–5: distance 20; CPU0→GPUs 6–9: distance 52; mirrored for CPU1; cross-socket GPU↔GPU: distance 72), confirming the topology/affinity differences.
However, unidirectional and bidirectional H2D/D2H bandwidth is nearly identical from either NUMA node to any GPU: ~28 GB/s (uni) and ~45 GB/s (bi), which looks like PCIe Gen4 x16 saturation.
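
As a rough sanity check on the "PCIe saturation" reading, the sketch below compares the measured ~28 GB/s with the Gen4 x16 link rate after 128b/130b encoding. It ignores TLP/DLLP protocol overhead, so the real payload ceiling is somewhat lower than the figure it prints. The variable names are illustrative only.

```python
# PCIe Gen4: 16 GT/s per lane, 128b/130b line encoding.
GT_PER_S_PER_LANE = 16.0
LANES = 16
ENCODING_EFFICIENCY = 128 / 130


def pcie_link_gbytes_per_s(gt_per_s: float, lanes: int, encoding: float) -> float:
    """Per-direction link rate in GB/s after line encoding (no packet overhead)."""
    return gt_per_s * lanes * encoding / 8  # 8 bits per byte


theoretical = pcie_link_gbytes_per_s(GT_PER_S_PER_LANE, LANES, ENCODING_EFFICIENCY)
measured = 28.08  # CPU0 -> device 2, unidirectional, from the table in this issue
efficiency = measured / theoretical

print(f"Gen4 x16 link rate: {theoretical:.2f} GB/s per direction")
print(f"Measured / link rate: {efficiency:.1%}")
```

A ratio close to 90% of the encoded link rate is consistent with the copies being limited by the PCIe link rather than by host memory placement, though it does not prove it.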

Questions

  1. Is this expected because large-block H2D/D2H tests are SDMA/PCIe limited, thus NUMA effects are masked by the PCIe link (Gen4 x16), yielding identical results regardless of CPU node?

  2. How does rocm-bandwidth-test allocate/register pinned memory? Is it bound to a specific NUMA node, and is there a way to force membind/first-touch to explicitly test cross-socket behavior?

  3. Are there recommended multi-stream, small-message, or concurrent configurations that amplify NUMA differences (e.g., where host DRAM or the inter-socket xGMI/Infinity Fabric link starts to dominate), so that we can observe a measurable impact beyond PCIe saturation?
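
For question 2, one experiment we could try is wrapping the tool in `numactl`. The sketch below only builds the command lines; it does not run them. It assumes `-s` and `-d` select the source and destination device lists (as we understand the tool's help output) and that the binary sits at `./rocm-bandwidth-test`. Whether a `numactl` memory policy actually reaches the tool's pinned host allocations is exactly what is unknown here, since the runtime may allocate from the CPU agent's own memory pool regardless. The helper name `numa_bound_command` is hypothetical.

```python
import shlex


def numa_bound_command(cpu_node, mem_node, src_dev, dst_dev,
                       tool="./rocm-bandwidth-test"):
    """Build (not run) a numactl-wrapped copy test between two devices.

    Assumption: -s / -d are the tool's source / destination device options.
    """
    return [
        "numactl",
        f"--cpunodebind={cpu_node}",  # run the process on this NUMA node's cores
        f"--membind={mem_node}",      # restrict ordinary page allocations to this node
        tool,
        "-s", str(src_dev),
        "-d", str(dst_dev),
    ]


# CPU device 0 -> GPU device 2 (local to CPU0), with host memory forced
# to node 0 (same socket) and then node 1 (cross socket).
for mem_node in (0, 1):
    cmd = numa_bound_command(cpu_node=0, mem_node=mem_node, src_dev=0, dst_dev=2)
    print(shlex.join(cmd))
```

If the two runs report the same bandwidth, that alone would not show whether NUMA is masked by PCIe or whether the binding was simply ignored; checking the process's page placement (for example with `numastat -p`) would be needed to tell the two apart.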

============================ ROCm System Management Interface ============================
================================ Weight between two GPUs =================================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5         GPU6         GPU7         
GPU0   0            15           15           15           72           72           72           72           
GPU1   15           0            15           15           72           72           72           72           
GPU2   15           15           0            15           72           72           72           72           
GPU3   15           15           15           0            72           72           72           72           
GPU4   72           72           72           72           0            15           15           15           
GPU5   72           72           72           72           15           0            15           15           
GPU6   72           72           72           72           15           15           0            15           
GPU7   72           72           72           72           15           15           15           0            

================================= Hops between two GPUs ==================================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5         GPU6         GPU7         
GPU0   0            1            1            1            3            3            3            3            
GPU1   1            0            1            1            3            3            3            3            
GPU2   1            1            0            1            3            3            3            3            
GPU3   1            1            1            0            3            3            3            3            
GPU4   3            3            3            3            0            1            1            1            
GPU5   3            3            3            3            1            0            1            1            
GPU6   3            3            3            3            1            1            0            1            
GPU7   3            3            3            3            1            1            1            0            

=============================== Link Type between two GPUs ===============================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5         GPU6         GPU7         
GPU0   0            XGMI         XGMI         XGMI         PCIE         PCIE         PCIE         PCIE         
GPU1   XGMI         0            XGMI         XGMI         PCIE         PCIE         PCIE         PCIE         
GPU2   XGMI         XGMI         0            XGMI         PCIE         PCIE         PCIE         PCIE         
GPU3   XGMI         XGMI         XGMI         0            PCIE         PCIE         PCIE         PCIE         
GPU4   PCIE         PCIE         PCIE         PCIE         0            XGMI         XGMI         XGMI         
GPU5   PCIE         PCIE         PCIE         PCIE         XGMI         0            XGMI         XGMI         
GPU6   PCIE         PCIE         PCIE         PCIE         XGMI         XGMI         0            XGMI         
GPU7   PCIE         PCIE         PCIE         PCIE         XGMI         XGMI         XGMI         0            

======================================= Numa Nodes =======================================
GPU[0]          : (Topology) Numa Node: 0
GPU[0]          : (Topology) Numa Affinity: 0
GPU[1]          : (Topology) Numa Node: 0
GPU[1]          : (Topology) Numa Affinity: 0
GPU[2]          : (Topology) Numa Node: 0
GPU[2]          : (Topology) Numa Affinity: 0
GPU[3]          : (Topology) Numa Node: 0
GPU[3]          : (Topology) Numa Affinity: 0
GPU[4]          : (Topology) Numa Node: 1
GPU[4]          : (Topology) Numa Affinity: 1
GPU[5]          : (Topology) Numa Node: 1
GPU[5]          : (Topology) Numa Affinity: 1
GPU[6]          : (Topology) Numa Node: 1
GPU[6]          : (Topology) Numa Affinity: 1
GPU[7]          : (Topology) Numa Node: 1
GPU[7]          : (Topology) Numa Affinity: 1
================================== End of ROCm SMI Log ===================================
          RocmBandwidthTest Version: 2.6.0

          Launch Command is: ./rocm-bandwidth-test (rocm_bandwidth -a + rocm_bandwidth -A)


          Device: 0,  AMD EPYC 9554 64-Core Processor
          Device: 1,  AMD EPYC 9554 64-Core Processor
          Device: 2,  AMD Instinct MI210,  GPU-06c9d21390bc2e70,  05:0.0
          Device: 3,  AMD Instinct MI210,  GPU-8b193c69a6fe8e9b,  08:0.0
          Device: 4,  AMD Instinct MI210,  GPU-bdfdc381baca4d61,  45:0.0
          Device: 5,  AMD Instinct MI210,  GPU-4a4febed074f17f1,  48:0.0
          Device: 6,  AMD Instinct MI210,  GPU-ea00f32afadbf90a,  87:0.0
          Device: 7,  AMD Instinct MI210,  GPU-dda0ddc66290f236,  8a:0.0
          Device: 8,  AMD Instinct MI210,  GPU-e0a81ba81d2a0689,  c8:0.0
          Device: 9,  AMD Instinct MI210,  GPU-42deef71a398b3a1,  cb:0.0

          Inter-Device Access

          D/D       0         1         2         3         4         5         6         7         8         9         

          0         1         1         1         1         1         1         1         1         1         1         

          1         1         1         1         1         1         1         1         1         1         1         

          2         1         1         1         1         1         1         1         1         1         1         

          3         1         1         1         1         1         1         1         1         1         1         

          4         1         1         1         1         1         1         1         1         1         1         

          5         1         1         1         1         1         1         1         1         1         1         

          6         1         1         1         1         1         1         1         1         1         1         

          7         1         1         1         1         1         1         1         1         1         1         

          8         1         1         1         1         1         1         1         1         1         1         

          9         1         1         1         1         1         1         1         1         1         1         


          Inter-Device Numa Distance

          D/D       0         1         2         3         4         5         6         7         8         9         

          0         0         32        20        20        20        20        52        52        52        52        

          1         32        0         52        52        52        52        20        20        20        20        

          2         20        52        0         15        15        15        72        72        72        72        

          3         20        52        15        0         15        15        72        72        72        72        

          4         20        52        15        15        0         15        72        72        72        72        

          5         20        52        15        15        15        0         72        72        72        72        

          6         52        20        72        72        72        72        0         15        15        15        

          7         52        20        72        72        72        72        15        0         15        15        

          8         52        20        72        72        72        72        15        15        0         15        

          9         52        20        72        72        72        72        15        15        15        0         


          Unidirectional copy peak bandwidth GB/s

          D/D       0           1           2           3           4           5           6           7           8           9           

          0         N/A         N/A         28.080      28.083      28.055      28.065      28.031      28.007      28.054      28.033      

          1         N/A         N/A         28.031      28.042      28.057      28.042      28.078      28.099      28.101      28.091      

          2         28.269      28.258      1026.750    39.934      39.926      39.919      28.273      28.294      28.282      28.271      

          3         28.260      28.292      39.957      1020.504    39.839      39.919      28.288      28.277      28.273      28.292      

          4         28.263      28.246      39.900      39.877      1021.747    39.915      28.258      28.267      28.279      28.263      

          5         28.292      28.284      39.953      39.941      39.957      1015.555    28.296      28.286      28.292      28.250      

          6         28.261      28.260      28.252      28.246      28.254      28.242      960.894     39.942      39.927      39.809      

          7         28.258      28.263      28.273      28.261      28.280      28.267      39.976      966.423     39.923      39.915      

          8         28.271      28.239      28.248      28.252      28.250      28.260      39.923      39.930      966.423     39.976      

          9         28.254      28.239      28.233      28.261      28.252      28.248      39.835      39.923      39.965      969.774     


          Bidirectional copy peak bandwidth GB/s

          D/D       0           1           2           3           4           5           6           7           8           9           

          0         N/A         N/A         45.410      45.506      45.190      45.353      45.063      45.165      45.063      45.073      

          1         N/A         N/A         45.250      45.787      45.484      45.070      45.129      45.277      45.398      45.570      

          2         45.410      45.250      N/A         76.531      76.797      76.713      56.051      56.019      56.022      56.029      

          3         45.506      45.787      76.531      N/A         76.755      76.755      55.998      56.009      55.980      56.036      

          4         45.190      45.484      76.797      76.755      N/A         76.566      55.956      56.023      55.993      56.036      

          5         45.353      45.070      76.713      76.755      76.566      N/A         56.009      56.006      56.018      56.033      

          6         45.063      45.129      56.051      55.998      55.956      56.009      N/A         76.496      76.673      76.738      

          7         45.165      45.277      56.019      56.009      56.023      56.006      76.496      N/A         76.720      76.748      

          8         45.063      45.398      56.022      55.980      55.993      56.018      76.673      76.720      N/A         76.539      

          9         45.073      45.570      56.029      56.036      56.036      56.033      76.738      76.748      76.539      N/A         
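
To quantify "nearly identical", the sketch below takes the CPU rows (devices 0 and 1) from the Inter-Device Numa Distance and unidirectional bandwidth tables above, splits the GPUs into local and remote by distance, and compares the mean bandwidth of each group. It assumes rows are copy sources and columns are destinations. The values are copied from the tables, and the function names are illustrative.

```python
# Columns correspond to GPU devices 2..9; rows 0 and 1 are the two CPU devices.
numa_distance = {
    0: [20, 20, 20, 20, 52, 52, 52, 52],
    1: [52, 52, 52, 52, 20, 20, 20, 20],
}
h2d_gbps = {
    0: [28.080, 28.083, 28.055, 28.065, 28.031, 28.007, 28.054, 28.033],
    1: [28.031, 28.042, 28.057, 28.042, 28.078, 28.099, 28.101, 28.091],
}


def local_remote_means(cpu):
    """Mean bandwidth to nearest-distance GPUs vs. all other GPUs."""
    dist = numa_distance[cpu]
    nearest = min(dist)
    local = [bw for bw, d in zip(h2d_gbps[cpu], dist) if d == nearest]
    remote = [bw for bw, d in zip(h2d_gbps[cpu], dist) if d != nearest]
    return sum(local) / len(local), sum(remote) / len(remote)


def relative_gap(cpu):
    """Fraction by which the local mean exceeds the remote mean."""
    local, remote = local_remote_means(cpu)
    return (local - remote) / local


for cpu in (0, 1):
    local, remote = local_remote_means(cpu)
    print(f"CPU{cpu}: local {local:.3f} GB/s, remote {remote:.3f} GB/s, "
          f"gap {relative_gap(cpu):.2%}")
```

On the posted numbers the local group is marginally faster for both CPUs, but the gap is well under 1%, which is the behavior the questions above are asking about.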
