Multimodal Model Benchmarks

### GPU vs. NPU comparison on a multimodal benchmark
Both setups serve the [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) model. The results below come from 200 prompts drawn from the [vision arena](https://huggingface.co/datasets/lmarena-ai/vision-arena-bench-v0.1) dataset.

#### GPU
- parameters
```
hardware:A100
env: vllm(latest)
```
- scripts
```shell
# serving 
vllm serve /root/autodl-tmp/hfmodels/Qwen/Qwen2.5-VL-7B-Instruct --trust-remote-code --max-model-len 8192
# client
python3 benchmarks/benchmark_serving.py --model /root/autodl-tmp/hfmodels/Qwen/Qwen2.5-VL-7B-Instruct --backend openai-chat --endpoint /v1/chat/completions --dataset-name hf --dataset-path lmarena-ai/vision-arena-bench-v0.1 --hf-split train --num-prompts 200 --request-rate 1
# run 1
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  199.16    
Total input tokens:                      20026     
Total generated tokens:                  21972     
Request throughput (req/s):              1.00      
Output token throughput (tok/s):         110.32    
Total Token throughput (tok/s):          210.87    
---------------Time to First Token----------------
Mean TTFT (ms):                          434.19    
Median TTFT (ms):                        297.93    
P99 TTFT (ms):                           2995.34   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          22.11     
Median TPOT (ms):                        20.30     
P99 TPOT (ms):                           50.20     
---------------Inter-token Latency----------------
Mean ITL (ms):                           22.11     
Median ITL (ms):                         16.04     
P99 ITL (ms):                            210.65    
==================================================

# run 2
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  199.08    
Total input tokens:                      20026     
Total generated tokens:                  22065     
Request throughput (req/s):              1.00      
Output token throughput (tok/s):         110.83    
Total Token throughput (tok/s):          211.42    
---------------Time to First Token----------------
Mean TTFT (ms):                          257.25    
Median TTFT (ms):                        210.87    
P99 TTFT (ms):                           1222.23   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          20.32     
Median TPOT (ms):                        19.01     
P99 TPOT (ms):                           33.34     
---------------Inter-token Latency----------------
Mean ITL (ms):                           19.78     
Median ITL (ms):                         15.95     
P99 ITL (ms):                            147.67    
==================================================

# run 3
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  199.09    
Total input tokens:                      20026     
Total generated tokens:                  22043     
Request throughput (req/s):              1.00      
Output token throughput (tok/s):         110.72    
Total Token throughput (tok/s):          211.30    
---------------Time to First Token----------------
Mean TTFT (ms):                          261.26    
Median TTFT (ms):                        217.72    
P99 TTFT (ms):                           1236.31   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          20.48     
Median TPOT (ms):                        19.08     
P99 TPOT (ms):                           35.79     
---------------Inter-token Latency----------------
Mean ITL (ms):                           19.90     
Median ITL (ms):                         15.95     
P99 ITL (ms):                            151.85    
==================================================

# run 4
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  199.08    
Total input tokens:                      20026     
Total generated tokens:                  22065     
Request throughput (req/s):              1.00      
Output token throughput (tok/s):         110.84    
Total Token throughput (tok/s):          211.43    
---------------Time to First Token----------------
Mean TTFT (ms):                          258.26    
Median TTFT (ms):                        207.43    
P99 TTFT (ms):                           1241.29   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          20.29     
Median TPOT (ms):                        18.95     
P99 TPOT (ms):                           35.63     
---------------Inter-token Latency----------------
Mean ITL (ms):                           19.78     
Median ITL (ms):                         15.93     
P99 ITL (ms):                            150.24    
==================================================

# run 5
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  199.07    
Total input tokens:                      20026     
Total generated tokens:                  21986     
Request throughput (req/s):              1.00      
Output token throughput (tok/s):         110.44    
Total Token throughput (tok/s):          211.04    
---------------Time to First Token----------------
Mean TTFT (ms):                          259.45    
Median TTFT (ms):                        210.40    
P99 TTFT (ms):                           1225.97   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          20.01     
Median TPOT (ms):                        18.91     
P99 TPOT (ms):                           33.79     
---------------Inter-token Latency----------------
Mean ITL (ms):                           19.82     
Median ITL (ms):                         15.96     
P99 ITL (ms):                            148.32    
==================================================

```
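Because each configuration is run several times, it can help to average the reported metrics instead of comparing individual runs by eye. The following is a minimal sketch, assuming the log text has been pasted into a string; `parse_runs` and `average_runs` are hypothetical helper names and are not part of vLLM or `benchmark_serving.py`.

```python
import re
from statistics import mean

# Matches metric lines such as "Mean TTFT (ms):      257.25" (after stripping).
METRIC_LINE = re.compile(r"^([^:]+):\s+([\d.]+)$")


def parse_runs(log_text):
    """Return one {metric name: value} dict per 'Serving Benchmark Result' block."""
    runs, current = [], None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if "Serving Benchmark Result" in line:
            # A new result block starts here.
            current = {}
            runs.append(current)
            continue
        match = METRIC_LINE.match(line)
        if match and current is not None:
            current[match.group(1).strip()] = float(match.group(2))
    return runs


def average_runs(runs):
    """Average every metric that is present in all runs."""
    if not runs:
        return {}
    shared = set.intersection(*(set(run) for run in runs))
    return {name: mean(run[name] for run in runs) for name in sorted(shared)}


# Two-line excerpt from the first two GPU runs, used only as sample input.
excerpt = """
============ Serving Benchmark Result ============
Mean TTFT (ms):                          434.19
Median TTFT (ms):                        297.93
==================================================
============ Serving Benchmark Result ============
Mean TTFT (ms):                          257.25
Median TTFT (ms):                        210.87
==================================================
"""

for name, value in average_runs(parse_runs(excerpt)).items():
    print(f"{name}: {value:.2f}")
```

The first GPU run has a noticeably higher TTFT than the later ones, so it may be worth reporting averages both with and without it.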

#### NPU
- parameters
```
hardware:910B3
env: vllm  vllm-ascend(latest)
```
- scripts
```shell
# serving
vllm serve /root/wl/cache/modelscope/models/Qwen/Qwen2.5-VL-7B-Instruct --trust-remote-code --max-model-len 8192
# client
python3 benchmarks/benchmark_serving.py --model /root/wl/cache/modelscope/models/Qwen/Qwen2.5-VL-7B-Instruct --backend openai-chat --endpoint /v1/chat/completions --dataset-name hf --dataset-path /root/wl/cache/huggingface/datasets/vision-arena-bench-v0.1 --hf-split train --num-prompts 200 --request-rate 1
# run 1
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  322.66    
Total input tokens:                      20026     
Total generated tokens:                  22531     
Request throughput (req/s):              0.62      
Output token throughput (tok/s):         69.83     
Total Token throughput (tok/s):          131.89    
---------------Time to First Token----------------
Mean TTFT (ms):                          43759.89  
Median TTFT (ms):                        47906.20  
P99 TTFT (ms):                           74071.36  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          2275.84   
Median TPOT (ms):                        1393.64   
P99 TPOT (ms):                           8850.74   
---------------Inter-token Latency----------------
Mean ITL (ms):                           1385.07   
Median ITL (ms):                         496.57    
P99 ITL (ms):                            1065.41   
==================================================

# run 2
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  304.05    
Total input tokens:                      20026     
Total generated tokens:                  22182     
Request throughput (req/s):              0.66      
Output token throughput (tok/s):         72.96     
Total Token throughput (tok/s):          138.82    
---------------Time to First Token----------------
Mean TTFT (ms):                          27512.14  
Median TTFT (ms):                        25442.71  
P99 TTFT (ms):                           56499.32  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          2627.30   
Median TPOT (ms):                        1510.12   
P99 TPOT (ms):                           25714.26  
---------------Inter-token Latency----------------
Mean ITL (ms):                           1370.18   
Median ITL (ms):                         482.34    
P99 ITL (ms):                            5503.84   
==================================================

# run 3
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  334.35    
Total input tokens:                      20026     
Total generated tokens:                  22410     
Request throughput (req/s):              0.60      
Output token throughput (tok/s):         67.03     
Total Token throughput (tok/s):          126.92    
---------------Time to First Token----------------
Mean TTFT (ms):                          47657.65  
Median TTFT (ms):                        50867.66  
P99 TTFT (ms):                           82413.10  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          2627.71   
Median TPOT (ms):                        1542.69   
P99 TPOT (ms):                           48765.68  
---------------Inter-token Latency----------------
Mean ITL (ms):                           1448.25   
Median ITL (ms):                         512.19    
P99 ITL (ms):                            1077.13   
==================================================

# run 4
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  325.67    
Total input tokens:                      20026     
Total generated tokens:                  22254     
Request throughput (req/s):              0.61      
Output token throughput (tok/s):         68.33     
Total Token throughput (tok/s):          129.82    
---------------Time to First Token----------------
Mean TTFT (ms):                          41784.21  
Median TTFT (ms):                        45363.90  
P99 TTFT (ms):                           70828.74  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          2574.96   
Median TPOT (ms):                        1517.36   
P99 TPOT (ms):                           46448.45  
---------------Inter-token Latency----------------
Mean ITL (ms):                           1453.23   
Median ITL (ms):                         541.15    
P99 ITL (ms):                            1150.94   
==================================================
```
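To put the two sets of results side by side, the sketch below compares the per-run output token throughput and median TTFT copied from the logs above. It is only a rough summary under the assumption that a ratio of per-run means is a fair comparison; `ratio_of_means` is a hypothetical helper name.

```python
from statistics import mean

# Output token throughput (tok/s) per run, copied from the results above.
gpu_output_tps = [110.32, 110.83, 110.72, 110.84, 110.44]
npu_output_tps = [69.83, 72.96, 67.03, 68.33]

# Median TTFT (ms) per run, copied from the results above.
gpu_median_ttft_ms = [297.93, 210.87, 217.72, 207.43, 210.40]
npu_median_ttft_ms = [47906.20, 25442.71, 50867.66, 45363.90]


def ratio_of_means(numerator, denominator):
    """Ratio of the mean of one series to the mean of another."""
    return mean(numerator) / mean(denominator)


throughput_ratio = ratio_of_means(npu_output_tps, gpu_output_tps)
ttft_ratio = ratio_of_means(npu_median_ttft_ms, gpu_median_ttft_ms)

print(f"NPU/GPU output token throughput: {throughput_ratio:.2f}")
print(f"NPU/GPU median TTFT: {ttft_ratio:.0f}x")
```

In every NPU run the request throughput (0.60 to 0.66 req/s) is below the offered rate of 1 req/s, and the benchmark takes over 300 s rather than about 200 s. One possible reading, which the logs alone do not confirm, is that the NPU server could not keep up at this rate and requests queued, so the NPU TTFT figures likely include queueing time and not just prefill latency.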
