Description
Problem Description
I tried to run ATOM with amd/DeepSeek-R1-0528-MXFP4-Preview on MI355 with prefix caching enabled and got an error.
If I remove --enable_prefix_caching, the server runs.
Model: https://huggingface.co/amd/DeepSeek-R1-0528-MXFP4
Docker: rocm/atom-dev:nightly_202602250219
Server launch:
python -m atom.entrypoints.openai_server \
    --model /data/DeepSeek-R1-0528-MXFP4-Preview \
    -tp 8 \
    --kv_cache_dtype fp8 \
    --enable_prefix_caching \
    --cudagraph-capture-sizes "[1,2,4,8,16,32]" \
    --max-model-len 131072 \
    --max-num-batched-tokens 131072 \
    --max-num-seqs 32 \
    --gpu-memory-utilization 0.95
Client launch:
git clone https://github.com/kimbochen/bench_serving.git
python bench_serving/benchmark_serving.py \
    --backend vllm \
    --model /data/DeepSeek-R1-0528-MXFP4-Preview \
    --dataset-name random \
    --random-input-len 60000 \
    --random-output-len 1 \
    --random-prefix-len 57000 \
    --num-prompts 40 \
    --max-concurrency 8 \
    --request-rate inf
Operating System
Ubuntu 24.04.3 LTS
CPU
AMD EPYC 9575F 64-Core Processor
GPU
MI355
ROCm Version
rocm7.2
ROCm Component
No response
Steps to Reproduce
Model: https://huggingface.co/amd/DeepSeek-R1-0528-MXFP4
Docker: rocm/atom-dev:nightly_202602250219
Server launch:
python -m atom.entrypoints.openai_server \
    --model /data/DeepSeek-R1-0528-MXFP4-Preview \
    -tp 8 \
    --kv_cache_dtype fp8 \
    --enable_prefix_caching \
    --cudagraph-capture-sizes "[1,2,4,8,16,32]" \
    --max-model-len 131072 \
    --max-num-batched-tokens 131072 \
    --max-num-seqs 32 \
    --gpu-memory-utilization 0.95
Client launch:
git clone https://github.com/kimbochen/bench_serving.git
python bench_serving/benchmark_serving.py \
    --backend vllm \
    --model /data/DeepSeek-R1-0528-MXFP4-Preview \
    --dataset-name random \
    --random-input-len 60000 \
    --random-output-len 1 \
    --random-prefix-len 57000 \
    --num-prompts 40 \
    --max-concurrency 8 \
    --request-rate inf
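A smaller reproduction that does not depend on the benchmark script might help narrow this down. The sketch below builds a few completion requests whose prompts share a long common prefix, which is the pattern the --random-prefix-len option exercises. It is only an illustration: the helper names, the http://localhost:8000 address, the /v1/completions path, and the word counts are my assumptions and are not taken from ATOM or bench_serving.

```python
import json
import urllib.request


def build_shared_prefix_payloads(model, prefix_words=2000, suffix_words=100,
                                 num_requests=2):
    """Build completion payloads whose prompts start with the same long prefix.

    Hypothetical helper: word counts only approximate token counts.
    """
    # One prefix reused by every request, so a prefix cache can hit on it.
    prefix = " ".join(f"p{i}" for i in range(prefix_words))
    payloads = []
    for r in range(num_requests):
        # A per-request suffix keeps the prompts distinct after the prefix.
        suffix = " ".join(f"r{r}s{i}" for i in range(suffix_words))
        payloads.append({
            "model": model,
            "prompt": prefix + " " + suffix,
            "max_tokens": 1,   # mirrors --random-output-len 1
            "temperature": 0.0,
        })
    return payloads


def send(payload, base_url="http://localhost:8000"):
    """POST one payload to an OpenAI-style completions endpoint.

    The base URL and path are assumptions; adjust them to the running server.
    """
    req = urllib.request.Request(
        base_url + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Sending the payloads one after another with send(), for example with model set to /data/DeepSeek-R1-0528-MXFP4-Preview, should make the second request reuse the cached prefix if the server accepts this request shape. If the error already appears there, the benchmark's concurrency is probably not the trigger.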
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response