Multimodal Model Benchmarks #2

@Potabk

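Comparison of GPU and NPU on a multimodal benchmark

Both setups serve the Qwen2.5-VL-7B-Instruct model for multimodal testing. The results below are from 200 prompts drawn from the vision arena dataset.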

GPU

  • parameters
hardware: A100
env: vllm (latest)
  • scripts
# serving 
vllm serve /root/autodl-tmp/hfmodels/Qwen/Qwen2.5-VL-7B-Instruct --trust-remote-code --max-model-len 8192
# client
python3 benchmarks/benchmark_serving.py --model /root/autodl-tmp/hfmodels/Qwen/Qwen2.5-VL-7B-Instruct --backend openai-chat --endpoint /v1/chat/completions --dataset-name hf --dataset-path lmarena-ai/vision-arena-bench-v0.1 --hf-split train --num-prompts 200 --request-rate 1
# run 1
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  199.16    
Total input tokens:                      20026     
Total generated tokens:                  21972     
Request throughput (req/s):              1.00      
Output token throughput (tok/s):         110.32    
Total Token throughput (tok/s):          210.87    
---------------Time to First Token----------------
Mean TTFT (ms):                          434.19    
Median TTFT (ms):                        297.93    
P99 TTFT (ms):                           2995.34   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          22.11     
Median TPOT (ms):                        20.30     
P99 TPOT (ms):                           50.20     
---------------Inter-token Latency----------------
Mean ITL (ms):                           22.11     
Median ITL (ms):                         16.04     
P99 ITL (ms):                            210.65    
==================================================

# run 2
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  199.08    
Total input tokens:                      20026     
Total generated tokens:                  22065     
Request throughput (req/s):              1.00      
Output token throughput (tok/s):         110.83    
Total Token throughput (tok/s):          211.42    
---------------Time to First Token----------------
Mean TTFT (ms):                          257.25    
Median TTFT (ms):                        210.87    
P99 TTFT (ms):                           1222.23   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          20.32     
Median TPOT (ms):                        19.01     
P99 TPOT (ms):                           33.34     
---------------Inter-token Latency----------------
Mean ITL (ms):                           19.78     
Median ITL (ms):                         15.95     
P99 ITL (ms):                            147.67    
==================================================

# run 3
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  199.09    
Total input tokens:                      20026     
Total generated tokens:                  22043     
Request throughput (req/s):              1.00      
Output token throughput (tok/s):         110.72    
Total Token throughput (tok/s):          211.30    
---------------Time to First Token----------------
Mean TTFT (ms):                          261.26    
Median TTFT (ms):                        217.72    
P99 TTFT (ms):                           1236.31   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          20.48     
Median TPOT (ms):                        19.08     
P99 TPOT (ms):                           35.79     
---------------Inter-token Latency----------------
Mean ITL (ms):                           19.90     
Median ITL (ms):                         15.95     
P99 ITL (ms):                            151.85    
==================================================

# run 3
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  199.08    
Total input tokens:                      20026     
Total generated tokens:                  22065     
Request throughput (req/s):              1.00      
Output token throughput (tok/s):         110.84    
Total Token throughput (tok/s):          211.43    
---------------Time to First Token----------------
Mean TTFT (ms):                          258.26    
Median TTFT (ms):                        207.43    
P99 TTFT (ms):                           1241.29   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          20.29     
Median TPOT (ms):                        18.95     
P99 TPOT (ms):                           35.63     
---------------Inter-token Latency----------------
Mean ITL (ms):                           19.78     
Median ITL (ms):                         15.93     
P99 ITL (ms):                            150.24    
==================================================

# run 4
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  199.07    
Total input tokens:                      20026     
Total generated tokens:                  21986     
Request throughput (req/s):              1.00      
Output token throughput (tok/s):         110.44    
Total Token throughput (tok/s):          211.04    
---------------Time to First Token----------------
Mean TTFT (ms):                          259.45    
Median TTFT (ms):                        210.40    
P99 TTFT (ms):                           1225.97   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          20.01     
Median TPOT (ms):                        18.91     
P99 TPOT (ms):                           33.79     
---------------Inter-token Latency----------------
Mean ITL (ms):                           19.82     
Median ITL (ms):                         15.96     
P99 ITL (ms):                            148.32    
==================================================
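
The repeated runs above are easier to compare once the pasted output is turned into numbers. The following is a minimal Python sketch and not part of the original benchmark. `parse_runs` and `average` are hypothetical helper names, and the parser assumes that each run starts with the `Serving Benchmark Result` banner and that each metric sits on one `name: value` line, as in the output above. The sample text is abbreviated from the first two GPU runs.

```python
import re
from statistics import mean

# Matches metric lines such as "Mean TTFT (ms):      434.19".
LINE_RE = re.compile(r"^([^:]+):\s+(-?\d+(?:\.\d+)?)$")

# Abbreviated copy of the first two GPU runs reported above.
SAMPLE = """\
============ Serving Benchmark Result ============
Successful requests:                     200
Benchmark duration (s):                  199.16
Total generated tokens:                  21972
Output token throughput (tok/s):         110.32
---------------Time to First Token----------------
Mean TTFT (ms):                          434.19
==================================================
============ Serving Benchmark Result ============
Successful requests:                     200
Benchmark duration (s):                  199.08
Total generated tokens:                  22065
Output token throughput (tok/s):         110.83
---------------Time to First Token----------------
Mean TTFT (ms):                          257.25
==================================================
"""


def parse_runs(text):
    """Split pasted benchmark output into one {metric: value} dict per run."""
    runs = []
    for raw in text.splitlines():
        line = raw.strip()
        if "Serving Benchmark Result" in line:
            runs.append({})  # the banner line starts a new run
            continue
        match = LINE_RE.match(line)
        if match and runs:
            runs[-1][match.group(1).strip()] = float(match.group(2))
    return runs


def average(runs, metric):
    """Mean of one metric across all parsed runs."""
    return mean(run[metric] for run in runs)


runs = parse_runs(SAMPLE)
print(f"runs parsed: {len(runs)}")
print(f"mean of Mean TTFT (ms): {average(runs, 'Mean TTFT (ms)'):.2f}")

# Sanity check: generated tokens / duration should roughly reproduce
# the reported output token throughput for the same run.
first = runs[0]
derived = first["Total generated tokens"] / first["Benchmark duration (s)"]
print(f"derived output throughput, run 1: {derived:.2f} tok/s")
```

Averaging over all runs rather than reading a single block matters here because the first GPU run has a visibly higher TTFT than the later ones.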

NPU

  • parameters
hardware: 910B3
env: vllm, vllm-ascend (latest)
  • scripts
# serving
vllm serve /root/wl/cache/modelscope/models/Qwen/Qwen2.5-VL-7B-Instruct --trust-remote-code --max-model-len 8192
# client
python3 benchmarks/benchmark_serving.py --model /root/wl/cache/modelscope/models/Qwen/Qwen2.5-VL-7B-Instruct --backend openai-chat --endpoint /v1/chat/completions --dataset-name hf --dataset-path /root/wl/cache/huggingface/datasets/vision-arena-bench-v0.1 --hf-split train --num-prompts 200 --request-rate 1
# run 1
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  322.66    
Total input tokens:                      20026     
Total generated tokens:                  22531     
Request throughput (req/s):              0.62      
Output token throughput (tok/s):         69.83     
Total Token throughput (tok/s):          131.89    
---------------Time to First Token----------------
Mean TTFT (ms):                          43759.89  
Median TTFT (ms):                        47906.20  
P99 TTFT (ms):                           74071.36  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          2275.84   
Median TPOT (ms):                        1393.64   
P99 TPOT (ms):                           8850.74   
---------------Inter-token Latency----------------
Mean ITL (ms):                           1385.07   
Median ITL (ms):                         496.57    
P99 ITL (ms):                            1065.41   
==================================================

# run 2
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  304.05    
Total input tokens:                      20026     
Total generated tokens:                  22182     
Request throughput (req/s):              0.66      
Output token throughput (tok/s):         72.96     
Total Token throughput (tok/s):          138.82    
---------------Time to First Token----------------
Mean TTFT (ms):                          27512.14  
Median TTFT (ms):                        25442.71  
P99 TTFT (ms):                           56499.32  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          2627.30   
Median TPOT (ms):                        1510.12   
P99 TPOT (ms):                           25714.26  
---------------Inter-token Latency----------------
Mean ITL (ms):                           1370.18   
Median ITL (ms):                         482.34    
P99 ITL (ms):                            5503.84   
==================================================

# run 3
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  334.35    
Total input tokens:                      20026     
Total generated tokens:                  22410     
Request throughput (req/s):              0.60      
Output token throughput (tok/s):         67.03     
Total Token throughput (tok/s):          126.92    
---------------Time to First Token----------------
Mean TTFT (ms):                          47657.65  
Median TTFT (ms):                        50867.66  
P99 TTFT (ms):                           82413.10  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          2627.71   
Median TPOT (ms):                        1542.69   
P99 TPOT (ms):                           48765.68  
---------------Inter-token Latency----------------
Mean ITL (ms):                           1448.25   
Median ITL (ms):                         512.19    
P99 ITL (ms):                            1077.13   
==================================================

# run 4
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  325.67    
Total input tokens:                      20026     
Total generated tokens:                  22254     
Request throughput (req/s):              0.61      
Output token throughput (tok/s):         68.33     
Total Token throughput (tok/s):          129.82    
---------------Time to First Token----------------
Mean TTFT (ms):                          41784.21  
Median TTFT (ms):                        45363.90  
P99 TTFT (ms):                           70828.74  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          2574.96   
Median TPOT (ms):                        1517.36   
P99 TPOT (ms):                           46448.45  
---------------Inter-token Latency----------------
Mean ITL (ms):                           1453.23   
Median ITL (ms):                         541.15    
P99 ITL (ms):                            1150.94   
==================================================
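
To put the two sections side by side, the sketch below averages a few of the per-run figures reported above and prints the NPU-to-GPU ratio for each. It is an illustrative Python sketch only: the `RESULTS` layout and the `ratio` helper are assumptions made for this example, while the numbers are copied from the results in this issue.

```python
from statistics import mean

# Per-run values copied from the results above (GPU: 5 runs, NPU: 4 runs).
RESULTS = {
    "Output token throughput (tok/s)": {
        "gpu": [110.32, 110.83, 110.72, 110.84, 110.44],
        "npu": [69.83, 72.96, 67.03, 68.33],
    },
    "Median TTFT (ms)": {
        "gpu": [297.93, 210.87, 217.72, 207.43, 210.40],
        "npu": [47906.20, 25442.71, 50867.66, 45363.90],
    },
    "Median TPOT (ms)": {
        "gpu": [20.30, 19.01, 19.08, 18.95, 18.91],
        "npu": [1393.64, 1510.12, 1542.69, 1517.36],
    },
}


def ratio(metric):
    """Return mean(NPU) / mean(GPU) for one metric."""
    values = RESULTS[metric]
    return mean(values["npu"]) / mean(values["gpu"])


for metric in RESULTS:
    print(f"{metric}: NPU/GPU = {ratio(metric):.2f}")
```

With `--request-rate 1` the client offers about 1 req/s. The A100 runs complete at 1.00 req/s, while the 910B3 runs complete at 0.60 to 0.66 req/s, so requests are probably queuing on the NPU server. That would account for a large part of the TTFT of tens of seconds, though it does not by itself explain the high TPOT.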
