
How to benchmark a Qwen3 model served by Xinference using llm-benchmark #3

@kanghongj

Description


My Qwen3 model is launched with Xinference, and its API endpoint is:
http://XX.XX.XX.XX:9997/qwen3/

After deploying llm-benchmark as documented, I ran:
python run_benchmarks.py \
--llm_url "http://0.0.0.0:9997/qwen3/" \
--model "qwen3" \
--use_long_context

It fails with:
2025-05-12 14:50:16,736 - INFO - Starting request 0
2025-05-12 14:50:16,842 - INFO - HTTP Request: POST http://0.0.0.0:9997/qwen3/chat/completions "HTTP/1.1 404 Not Found"
2025-05-12 14:50:16,842 - ERROR - Error during request: Error code: 404 - {'detail': 'Not Found'}
2025-05-12 14:50:16,842 - WARNING - Request 0 failed

As you can see, the OpenAI-style suffix /chat/completions is appended to the API URL, but my local model is served by Xinference, which does not expose that path under the model name.

How can I fix this?
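For context, a minimal sketch of what the benchmark's OpenAI-compatible client does with `--llm_url`: it treats the value as a base URL and appends `/chat/completions` to it. Xinference exposes its OpenAI-compatible API under a `/v1` prefix (an assumption based on Xinference's standard deployment, not confirmed in this thread), so passing the model path as the base URL produces the 404 seen in the log, while a `/v1` base would resolve:

```python
def chat_endpoint(base_url: str) -> str:
    """Mimic how an OpenAI-style client builds the request URL:
    the chat-completions path is appended to the configured base URL."""
    return base_url.rstrip("/") + "/chat/completions"

# The URL passed in the issue - hits a path Xinference does not serve (404):
print(chat_endpoint("http://0.0.0.0:9997/qwen3/"))
# → http://0.0.0.0:9997/qwen3/chat/completions

# Assumed-correct base URL for an Xinference deployment (the model is
# selected via the --model / "model" field, not the URL path):
print(chat_endpoint("http://0.0.0.0:9997/v1"))
# → http://0.0.0.0:9997/v1/chat/completions
```

Under that assumption, the fix would be `--llm_url "http://0.0.0.0:9997/v1"` with `--model "qwen3"` unchanged.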
