Here is the `nvidia-smi topo -m` result from an 8-card 4090 server:
```
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  NIC0  NIC1  CPU Affinity    NUMA Affinity  GPU NUMA ID
GPU0     X    SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   0-55,112-167    0              N/A
GPU1    SYS    X    SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   0-55,112-167    0              N/A
GPU2    SYS   SYS    X    SYS   SYS   SYS   SYS   SYS   SYS   SYS   0-55,112-167    0              N/A
GPU3    SYS   SYS   SYS    X    SYS   SYS   SYS   SYS   SYS   SYS   0-55,112-167    0              N/A
GPU4    SYS   SYS   SYS   SYS    X    SYS   SYS   SYS   SYS   SYS   56-111,168-223  1              N/A
GPU5    SYS   SYS   SYS   SYS   SYS    X    SYS   SYS   SYS   SYS   56-111,168-223  1              N/A
GPU6    SYS   SYS   SYS   SYS   SYS   SYS    X    SYS   PHB   PHB   56-111,168-223  1              N/A
GPU7    SYS   SYS   SYS   SYS   SYS   SYS   SYS    X    SYS   SYS   56-111,168-223  1              N/A
NIC0    SYS   SYS   SYS   SYS   SYS   SYS   PHB   SYS    X    PIX
NIC1    SYS   SYS   SYS   SYS   SYS   SYS   PHB   SYS   PIX    X
```
All of the GPU-to-GPU connectivity is SYS, i.e., traffic traverses PCIe plus the SMP interconnect between NUMA nodes. In LLM inference, when two cards with different CPU affinities (i.e., attached to different NUMA nodes, such as GPU0 and GPU4) are scheduled together, inference is much slower than running on a single card.
More info can be found in vllm-project/vllm#1838.