[Issue]: ValueError: could not broadcast input array from shape (58713,) into shape (117001,) with prefix cache enabled #238

@peizhang56


Problem Description

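I tried to run ATOM with amd/DeepSeek-R1-0528-MXFP4-Preview on MI355 with prefix caching enabled and got this error: ValueError: could not broadcast input array from shape (58713,) into shape (117001,).
If I remove --enable_prefix_caching, the server runs without the error.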
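
For readers unfamiliar with this message: its wording matches the ValueError that NumPy raises when a 1-D array is assigned into a slice of a different length. The sketch below reproduces only that class of error, using the two sizes from the issue title. It is not ATOM code; the function name `copy_into_buffer` and the constants are hypothetical, and it makes no claim about where in ATOM the mismatch actually occurs.

```python
import numpy as np

# Sizes copied from the error message in the issue title (illustrative only).
DEST_LEN = 117001  # length of the destination slice
SRC_LEN = 58713    # length of the array being written


def copy_into_buffer(dst_len: int, src_len: int) -> str:
    """Assign a 1-D array of src_len elements into a slice of dst_len elements.

    Returns "ok" on success, or the text of the ValueError on a length mismatch.
    """
    buffer = np.zeros(dst_len, dtype=np.int64)
    values = np.arange(src_len, dtype=np.int64)
    try:
        # NumPy cannot broadcast a longer or shorter 1-D array into this slice.
        buffer[:dst_len] = values
    except ValueError as exc:
        return str(exc)
    return "ok"


message = copy_into_buffer(DEST_LEN, SRC_LEN)
print(message)
# Number of elements the source array is short by.
print("missing elements:", DEST_LEN - SRC_LEN)
```

The gap between the two sizes is 58288 elements. Whether that gap corresponds to tokens served from the prefix cache is an open question that the attached server log would need to confirm.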

Model: https://huggingface.co/amd/DeepSeek-R1-0528-MXFP4
Docker: rocm/atom-dev:nightly_202602250219
Server launch:
python -m atom.entrypoints.openai_server \
    --model /data/DeepSeek-R1-0528-MXFP4-Preview \
    -tp 8 \
    --kv_cache_dtype fp8 \
    --enable_prefix_caching \
    --cudagraph-capture-sizes "[1,2,4,8,16,32]" \
    --max-model-len 131072 \
    --max-num-batched-tokens 131072 \
    --max-num-seqs 32 \
    --gpu-memory-utilization 0.95
Client launch:
git clone https://github.com/kimbochen/bench_serving.git
python bench_serving/benchmark_serving.py \
    --backend vllm \
    --model /data/DeepSeek-R1-0528-MXFP4-Preview \
    --dataset-name random \
    --random-input-len 60000 \
    --random-output-len 1 \
    --random-prefix-len 57000 \
    --num-prompts 40 \
    --max-concurrency 8 \
    --request-rate inf

Operating System

Ubuntu 24.04.3 LTS

CPU

AMD EPYC 9575F 64-Core Processor

GPU

MI355

ROCm Version

rocm7.2

ROCm Component

No response

Steps to Reproduce

Use the same model, Docker image, and server and client launch commands as listed under Problem Description. The server log from the failing run is attached below.

server_prefixcache_error.log

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
