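Comparison of GPU and NPU on a multimodal benchmark
Both setups run multimodal tests with the Qwen2.5-VL-7B-Instruct model. The results below are for 200 prompts from the vision arena dataset.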
GPU
- parameters
hardware: A100
env: vllm (latest)
- scripts
# serving
vllm serve /root/autodl-tmp/hfmodels/Qwen/Qwen2.5-VL-7B-Instruct --trust-remote-code --max-model-len 8192
# client
python3 benchmarks/benchmark_serving.py --model /root/autodl-tmp/hfmodels/Qwen/Qwen2.5-VL-7B-Instruct --backend openai-chat --endpoint /v1/chat/completions --dataset-name hf --dataset-path lmarena-ai/vision-arena-bench-v0.1 --hf-split train --num-prompts 200 --request-rate 1
# round 1
============ Serving Benchmark Result ============
Successful requests: 200
Benchmark duration (s): 199.16
Total input tokens: 20026
Total generated tokens: 21972
Request throughput (req/s): 1.00
Output token throughput (tok/s): 110.32
Total Token throughput (tok/s): 210.87
---------------Time to First Token----------------
Mean TTFT (ms): 434.19
Median TTFT (ms): 297.93
P99 TTFT (ms): 2995.34
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 22.11
Median TPOT (ms): 20.30
P99 TPOT (ms): 50.20
---------------Inter-token Latency----------------
Mean ITL (ms): 22.11
Median ITL (ms): 16.04
P99 ITL (ms): 210.65
==================================================
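The result block above is plain `name: value` text, so it can be turned into numbers for later comparison. The following is a minimal sketch, not part of the original test scripts; the helper name `parse_result_block` is hypothetical. It also rechecks that the reported output token throughput is consistent with total generated tokens divided by benchmark duration, which is an assumption about how the figure is derived.

```python
import re

# Text copied from the first GPU result block above.
SAMPLE_BLOCK = """\
============ Serving Benchmark Result ============
Successful requests: 200
Benchmark duration (s): 199.16
Total input tokens: 20026
Total generated tokens: 21972
Request throughput (req/s): 1.00
Output token throughput (tok/s): 110.32
Total Token throughput (tok/s): 210.87
---------------Time to First Token----------------
Mean TTFT (ms): 434.19
Median TTFT (ms): 297.93
P99 TTFT (ms): 2995.34
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 22.11
Median TPOT (ms): 20.30
P99 TPOT (ms): 50.20
---------------Inter-token Latency----------------
Mean ITL (ms): 22.11
Median ITL (ms): 16.04
P99 ITL (ms): 210.65
==================================================
"""

# Matches "some metric name: 123.45"; separator lines have no colon and are skipped.
LINE_RE = re.compile(r"^(?P<key>[^:=]+?):\s*(?P<value>-?\d+(?:\.\d+)?)\s*$")


def parse_result_block(text: str) -> dict[str, float]:
    """Hypothetical helper: map each metric name to its numeric value."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            metrics[match.group("key").strip()] = float(match.group("value"))
    return metrics


metrics = parse_result_block(SAMPLE_BLOCK)

# Recompute throughput from the totals (assumed definition: tokens / duration).
duration_s = metrics["Benchmark duration (s)"]
recomputed_output_tps = metrics["Total generated tokens"] / duration_s
recomputed_total_tps = (
    metrics["Total input tokens"] + metrics["Total generated tokens"]
) / duration_s

print(f"parsed {len(metrics)} metrics")
print(f"recomputed output throughput: {recomputed_output_tps:.2f} tok/s")
print(f"recomputed total throughput:  {recomputed_total_tps:.2f} tok/s")
```

The recomputed values can differ from the printed ones in the last digit, because the printed duration is itself rounded to two decimals.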
# round 2
============ Serving Benchmark Result ============
Successful requests: 200
Benchmark duration (s): 199.08
Total input tokens: 20026
Total generated tokens: 22065
Request throughput (req/s): 1.00
Output token throughput (tok/s): 110.83
Total Token throughput (tok/s): 211.42
---------------Time to First Token----------------
Mean TTFT (ms): 257.25
Median TTFT (ms): 210.87
P99 TTFT (ms): 1222.23
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 20.32
Median TPOT (ms): 19.01
P99 TPOT (ms): 33.34
---------------Inter-token Latency----------------
Mean ITL (ms): 19.78
Median ITL (ms): 15.95
P99 ITL (ms): 147.67
==================================================
# round 3
============ Serving Benchmark Result ============
Successful requests: 200
Benchmark duration (s): 199.09
Total input tokens: 20026
Total generated tokens: 22043
Request throughput (req/s): 1.00
Output token throughput (tok/s): 110.72
Total Token throughput (tok/s): 211.30
---------------Time to First Token----------------
Mean TTFT (ms): 261.26
Median TTFT (ms): 217.72
P99 TTFT (ms): 1236.31
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 20.48
Median TPOT (ms): 19.08
P99 TPOT (ms): 35.79
---------------Inter-token Latency----------------
Mean ITL (ms): 19.90
Median ITL (ms): 15.95
P99 ITL (ms): 151.85
==================================================
# round 3
============ Serving Benchmark Result ============
Successful requests: 200
Benchmark duration (s): 199.08
Total input tokens: 20026
Total generated tokens: 22065
Request throughput (req/s): 1.00
Output token throughput (tok/s): 110.84
Total Token throughput (tok/s): 211.43
---------------Time to First Token----------------
Mean TTFT (ms): 258.26
Median TTFT (ms): 207.43
P99 TTFT (ms): 1241.29
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 20.29
Median TPOT (ms): 18.95
P99 TPOT (ms): 35.63
---------------Inter-token Latency----------------
Mean ITL (ms): 19.78
Median ITL (ms): 15.93
P99 ITL (ms): 150.24
==================================================
# round 4
============ Serving Benchmark Result ============
Successful requests: 200
Benchmark duration (s): 199.07
Total input tokens: 20026
Total generated tokens: 21986
Request throughput (req/s): 1.00
Output token throughput (tok/s): 110.44
Total Token throughput (tok/s): 211.04
---------------Time to First Token----------------
Mean TTFT (ms): 259.45
Median TTFT (ms): 210.40
P99 TTFT (ms): 1225.97
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 20.01
Median TPOT (ms): 18.91
P99 TPOT (ms): 33.79
---------------Inter-token Latency----------------
Mean ITL (ms): 19.82
Median ITL (ms): 15.96
P99 ITL (ms): 148.32
==================================================
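To read the five GPU result blocks as a whole, the per-run figures can be summarized. This is a small sketch using values copied from the blocks above. Treating the first run as a warm-up outlier is an assumption, not something the report states; it is suggested only by its mean TTFT (434.19 ms) being well above the other four (about 257 to 261 ms).

```python
from statistics import mean, median

# Values copied from the five GPU result blocks, in the order they appear.
gpu_mean_ttft_ms = [434.19, 257.25, 261.26, 258.26, 259.45]
gpu_mean_tpot_ms = [22.11, 20.32, 20.48, 20.29, 20.01]
gpu_output_tps = [110.32, 110.83, 110.72, 110.84, 110.44]


def summarize(values: list[float]) -> dict[str, float]:
    """Return basic summary statistics for one metric across runs."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


ttft_all = summarize(gpu_mean_ttft_ms)
# Assumption: the first run includes warm-up effects, so also report without it.
ttft_without_first = summarize(gpu_mean_ttft_ms[1:])
tpot_all = summarize(gpu_mean_tpot_ms)
tps_all = summarize(gpu_output_tps)

print(f"Mean TTFT, all runs:        {ttft_all['mean']:.2f} ms")
print(f"Mean TTFT, first run dropped: {ttft_without_first['mean']:.2f} ms")
print(f"Mean TPOT, all runs:        {tpot_all['mean']:.2f} ms")
print(f"Output throughput, all runs: {tps_all['mean']:.2f} tok/s")
```

Reporting the median alongside the mean keeps a single slow run from dominating the summary.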
NPU
- parameters
hardware: 910B3
env: vllm, vllm-ascend (latest)
- scripts
# serving
vllm serve /root/wl/cache/modelscope/models/Qwen/Qwen2.5-VL-7B-Instruct --trust-remote-code --max-model-len 8192
# client
python3 benchmarks/benchmark_serving.py --model /root/wl/cache/modelscope/models/Qwen/Qwen2.5-VL-7B-Instruct --backend openai-chat --endpoint /v1/chat/completions --dataset-name hf --dataset-path /root/wl/cache/huggingface/datasets/vision-arena-bench-v0.1 --hf-split train --num-prompts 200 --request-rate 1
# round 1
============ Serving Benchmark Result ============
Successful requests: 200
Benchmark duration (s): 322.66
Total input tokens: 20026
Total generated tokens: 22531
Request throughput (req/s): 0.62
Output token throughput (tok/s): 69.83
Total Token throughput (tok/s): 131.89
---------------Time to First Token----------------
Mean TTFT (ms): 43759.89
Median TTFT (ms): 47906.20
P99 TTFT (ms): 74071.36
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 2275.84
Median TPOT (ms): 1393.64
P99 TPOT (ms): 8850.74
---------------Inter-token Latency----------------
Mean ITL (ms): 1385.07
Median ITL (ms): 496.57
P99 ITL (ms): 1065.41
==================================================
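The NPU result above shows a request throughput of 0.62 req/s against an offered rate of 1 req/s (`--request-rate 1`), which suggests the server could not keep up and requests queued. The sketch below is back-of-the-envelope arithmetic under two loudly stated assumptions: all 200 requests are sent within roughly `num_prompts / request_rate` seconds, and completions are spread evenly over the run. Neither is confirmed by the report.

```python
# Values from the client command and the first NPU result block above.
request_rate = 1.0   # --request-rate
num_prompts = 200    # --num-prompts
duration_s = 322.66  # Benchmark duration (s)

# Assumption: requests arrive at about request_rate, so sending takes ~200 s.
send_window_s = num_prompts / request_rate

# Achieved completion rate over the whole run.
achieved_rate = num_prompts / duration_s

# Time the server kept working after the last request was (approximately) sent.
drain_time_s = duration_s - send_window_s

# Assumption: completions are evenly spread, giving a rough backlog estimate
# at the moment the last request was sent.
completed_at_send_end = achieved_rate * send_window_s
backlog_at_send_end = num_prompts - completed_at_send_end

print(f"achieved rate: {achieved_rate:.2f} req/s")
print(f"drain time after sending: {drain_time_s:.2f} s")
print(f"estimated backlog at end of sending: {backlog_at_send_end:.0f} requests")
```

A growing backlog of this kind would be consistent with the TTFT values of tens of seconds in the NPU results, since TTFT then includes time spent waiting in the queue.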
# round 2
============ Serving Benchmark Result ============
Successful requests: 200
Benchmark duration (s): 304.05
Total input tokens: 20026
Total generated tokens: 22182
Request throughput (req/s): 0.66
Output token throughput (tok/s): 72.96
Total Token throughput (tok/s): 138.82
---------------Time to First Token----------------
Mean TTFT (ms): 27512.14
Median TTFT (ms): 25442.71
P99 TTFT (ms): 56499.32
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 2627.30
Median TPOT (ms): 1510.12
P99 TPOT (ms): 25714.26
---------------Inter-token Latency----------------
Mean ITL (ms): 1370.18
Median ITL (ms): 482.34
P99 ITL (ms): 5503.84
==================================================
# round 3
============ Serving Benchmark Result ============
Successful requests: 200
Benchmark duration (s): 334.35
Total input tokens: 20026
Total generated tokens: 22410
Request throughput (req/s): 0.60
Output token throughput (tok/s): 67.03
Total Token throughput (tok/s): 126.92
---------------Time to First Token----------------
Mean TTFT (ms): 47657.65
Median TTFT (ms): 50867.66
P99 TTFT (ms): 82413.10
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 2627.71
Median TPOT (ms): 1542.69
P99 TPOT (ms): 48765.68
---------------Inter-token Latency----------------
Mean ITL (ms): 1448.25
Median ITL (ms): 512.19
P99 ITL (ms): 1077.13
==================================================
# round 4
============ Serving Benchmark Result ============
Successful requests: 200
Benchmark duration (s): 325.67
Total input tokens: 20026
Total generated tokens: 22254
Request throughput (req/s): 0.61
Output token throughput (tok/s): 68.33
Total Token throughput (tok/s): 129.82
---------------Time to First Token----------------
Mean TTFT (ms): 41784.21
Median TTFT (ms): 45363.90
P99 TTFT (ms): 70828.74
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 2574.96
Median TPOT (ms): 1517.36
P99 TPOT (ms): 46448.45
---------------Inter-token Latency----------------
Mean ITL (ms): 1453.23
Median ITL (ms): 541.15
P99 ITL (ms): 1150.94
==================================================
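The GPU and NPU results can be put side by side by averaging each metric over its runs and taking ratios. This is a sketch using figures copied from the result blocks above (five GPU runs, four NPU runs). One caveat is an interpretation rather than a measured fact: the GPU request throughput equals the offered 1 req/s, so the GPU was probably not saturated and the throughput ratio is likely a lower bound on the real capacity gap.

```python
from statistics import mean

# Per-run values copied from the result blocks, in the order they appear.
gpu = {
    "output_tps": [110.32, 110.83, 110.72, 110.84, 110.44],
    "mean_ttft_ms": [434.19, 257.25, 261.26, 258.26, 259.45],
    "median_tpot_ms": [20.30, 19.01, 19.08, 18.95, 18.91],
}
npu = {
    "output_tps": [69.83, 72.96, 67.03, 68.33],
    "mean_ttft_ms": [43759.89, 27512.14, 47657.65, 41784.21],
    "median_tpot_ms": [1393.64, 1510.12, 1542.69, 1517.36],
}

# Average each metric across runs for both devices.
gpu_avg = {name: mean(values) for name, values in gpu.items()}
npu_avg = {name: mean(values) for name, values in npu.items()}

# Throughput: higher is better, so report GPU / NPU.
throughput_ratio = gpu_avg["output_tps"] / npu_avg["output_tps"]
# Latencies: lower is better, so report NPU / GPU.
ttft_ratio = npu_avg["mean_ttft_ms"] / gpu_avg["mean_ttft_ms"]
tpot_ratio = npu_avg["median_tpot_ms"] / gpu_avg["median_tpot_ms"]

print(f"GPU output throughput is {throughput_ratio:.2f}x the NPU's")
print(f"NPU mean TTFT is {ttft_ratio:.0f}x the GPU's")
print(f"NPU median TPOT is {tpot_ratio:.0f}x the GPU's")
```

Because the NPU runs appear to be queue-bound at this request rate, the latency ratios mix compute time with queueing delay; repeating both tests at a lower request rate would separate the two.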