
[Bugfix] Add an aiter-enabled check before appending an unimported class #862

Closed
JadenMathias wants to merge 1 commit into releases/rocm/v0.16.0 from jmathias/fix

Conversation

@JadenMathias

@JadenMathias JadenMathias commented Feb 19, 2026

Purpose

A quick fix for a NameError raised when running the serve command with aiter disabled:

MODEL=amd/Llama-3.3-70B-Instruct-FP8-KV
MAX_MODEL_LEN=10240
TP=1
CONC=32
INPUT_LEN=1024
OUTPUT_LEN=1024
ATTN_BACKEND="--attention-backend ROCM_ATTN"
FUSE_ROPE_KVCACHE="-cc.pass_config.fuse_rope_kvcache=True -cc.use_inductor_graph_partition=True"
FUSE_OTHER="-cc.pass_config.fuse_norm_quant=True -cc.pass_config.fuse_act_quant=True -cc.pass_config.fuse_attn_quant=True"
# export VLLM_ROCM_USE_AITER_TRITON_FUSED_ROPE_ZEROS_KV_CACHE=1

export AMDGCN_USE_BUFFER_OPS=1
export VLLM_ROCM_USE_AITER=0 VLLM_ROCM_USE_AITER_MHA=0
export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT4
export VLLM_DISABLE_COMPILE_CACHE=1
# export VLLM_COMPILE_CACHE_SAVE_FORMAT=unpacked

rm -rf ~/.cache/vllm /tmp/torchinductor*
vllm serve $MODEL \
    --tensor-parallel-size=$TP --gpu-memory-utilization=0.94 \
    --dtype=auto --kv-cache-dtype=fp8 \
    --max-num-batched-tokens=131072 \
    $ATTN_BACKEND \
    $FUSE_ROPE_KVCACHE \
    $FUSE_OTHER \
    --compilation-config '{"use_inductor_graph_partition": true}' \
    --no-enable-prefix-caching \
    --disable-log-requests \
    --disable-uvicorn-access-log

The following error is observed:

[screenshot: NameError traceback]

This fix simply adds a check for whether aiter is enabled before appending the class.
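The pattern behind the fix can be sketched as follows. This is a minimal illustration, not the actual vLLM code: the class and function names here are hypothetical. The underlying issue is that a class defined only under a conditional (aiter-enabled) import path was referenced unconditionally, which raises NameError when the import never ran.

```python
# Illustrative sketch of the bug and the fix (names are hypothetical,
# not the actual vLLM symbols).

aiter_enabled = False  # stands in for the VLLM_ROCM_USE_AITER check

if aiter_enabled:
    # In the real code, the aiter-backed class is only imported/defined
    # on this path; with aiter disabled, the name never exists.
    class AiterAttnBackend:
        pass

def collect_backends(enabled: bool) -> list[str]:
    backends = ["RocmAttnBackend"]
    # The fix: guard the append with the same enabled check, so the
    # possibly-undefined name is only referenced when it exists.
    if enabled:
        backends.append(AiterAttnBackend.__name__)
    return backends
```

Without the guard, `collect_backends` would touch `AiterAttnBackend` even when `aiter_enabled` is false and fail with `NameError: name 'AiterAttnBackend' is not defined`.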

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀
