[Issue]: ValueError: could not broadcast input array from shape (58713,) into shape (117001,) with prefix cache enabled #238

### Problem Description

I tried to run ATOM with amd/DeepSeek-R1-0528-MXFP4-Preview on MI355 with prefix caching enabled and got the `ValueError` shown in the title.
If I remove **--enable_prefix_caching**, the same setup runs.
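
For readers unfamiliar with this error class, here is a minimal sketch of how NumPy produces a message of this form. It only mirrors the two shapes from the title; the names `dst`, `src`, and `try_copy` are hypothetical and not taken from ATOM, and where the failing assignment lives in ATOM is not established here.

```python
import numpy as np

# Shapes copied from the error message in this issue; everything else is illustrative.
SRC_LEN = 58713
DST_LEN = 117001

dst = np.zeros(DST_LEN, dtype=np.int64)   # hypothetical preallocated buffer
src = np.arange(SRC_LEN, dtype=np.int64)  # hypothetical data that is shorter than expected


def try_copy(dst, src, n):
    """Assign src into the first n slots of dst; return the error text on failure."""
    try:
        dst[:n] = src
    except ValueError as exc:
        return str(exc)
    return None


# Slice length (117001) differs from the source length (58713): NumPy cannot broadcast.
message = try_copy(dst, src, DST_LEN)
print(message)

# Sizing the slice from the source itself succeeds.
ok = try_copy(dst, src, src.shape[0])
print(ok)
```

As a side observation, 117001 equals 57000 + 60000 + 1, the benchmark's `--random-prefix-len` plus `--random-input-len` plus one. Whether that reflects how the destination buffer is sized is an assumption, not something confirmed by the attached log.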

Model: https://huggingface.co/amd/DeepSeek-R1-0528-MXFP4
Docker: rocm/atom-dev:nightly_202602250219
Server launch:
```
python -m atom.entrypoints.openai_server \
  --model /data/DeepSeek-R1-0528-MXFP4-Preview \
  -tp 8 \
  --kv_cache_dtype fp8 \
  --enable_prefix_caching \
  --cudagraph-capture-sizes "[1,2,4,8,16,32]" \
  --max-model-len 131072 \
  --max-num-batched-tokens 131072 \
  --max-num-seqs 32 \
  --gpu-memory-utilization 0.95
```
Client launch:
```
git clone https://github.com/kimbochen/bench_serving.git
python bench_serving/benchmark_serving.py \
  --backend vllm \
  --model /data/DeepSeek-R1-0528-MXFP4-Preview \
  --dataset-name random \
  --random-input-len 60000 \
  --random-output-len 1 \
  --random-prefix-len 57000 \
  --num-prompts 40 \
  --max-concurrency 8 \
  --request-rate inf
```


### Operating System

Ubuntu 24.04.3 LTS

### CPU

AMD EPYC 9575F 64-Core Processor

### GPU

MI355

### ROCm Version

rocm7.2

### ROCm Component

_No response_

### Steps to Reproduce

Model: https://huggingface.co/amd/DeepSeek-R1-0528-MXFP4
Docker: rocm/atom-dev:nightly_202602250219
Server launch:
```
python -m atom.entrypoints.openai_server \
  --model /data/DeepSeek-R1-0528-MXFP4-Preview \
  -tp 8 \
  --kv_cache_dtype fp8 \
  --enable_prefix_caching \
  --cudagraph-capture-sizes "[1,2,4,8,16,32]" \
  --max-model-len 131072 \
  --max-num-batched-tokens 131072 \
  --max-num-seqs 32 \
  --gpu-memory-utilization 0.95
```
Client launch:
```
git clone https://github.com/kimbochen/bench_serving.git
python bench_serving/benchmark_serving.py \
  --backend vllm \
  --model /data/DeepSeek-R1-0528-MXFP4-Preview \
  --dataset-name random \
  --random-input-len 60000 \
  --random-output-len 1 \
  --random-prefix-len 57000 \
  --num-prompts 40 \
  --max-concurrency 8 \
  --request-rate inf
```

[server_prefixcache_error.log](https://github.com/user-attachments/files/25558053/server_prefixcache_error.log)

### (Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

_No response_

### Additional Information

_No response_
