27 changes: 21 additions & 6 deletions Qwen/Qwen3Guard-Gen.md
This guide describes how to run **Qwen3Guard-Gen** on GPU using vLLM.

Qwen3Guard-Gen is a lightweight, text-only guardrail model that classifies prompts and model responses for safety.

## Installing vLLM

### CUDA

```bash
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
```

### ROCm
> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
```bash
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
```

## Running Qwen3Guard-Gen on a Single GPU

### CUDA
```bash
# Start server on a single GPU
vllm serve Qwen/Qwen3Guard-Gen-0.6B \
--host 0.0.0.0 \
--max-model-len 32768
```
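Once the server is up, you can query it through vLLM's OpenAI-compatible API. The sketch below uses only the standard library; the `/v1/chat/completions` path is vLLM's standard endpoint, while `build_moderation_request` and `moderate` are hypothetical helper names introduced here for illustration, not part of vLLM.

```python
import json
import urllib.request

# Hypothetical helpers for querying the server started above.
def build_moderation_request(user_text: str) -> dict:
    """Assemble an OpenAI-compatible chat-completions payload."""
    return {
        "model": "Qwen/Qwen3Guard-Gen-0.6B",
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": 128,
        "temperature": 0.0,  # deterministic safety labels
    }

def moderate(user_text: str, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to vLLM's OpenAI-compatible endpoint and
    return the guard model's verdict text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_moderation_request(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Adjust `base_url` if you changed the serve port; the request body is the standard OpenAI chat-completions schema, so any OpenAI-compatible client works here too.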

### ROCm
```bash
export VLLM_ROCM_USE_AITER=1
vllm serve Qwen/Qwen3Guard-Gen-0.6B \
--host 0.0.0.0 \
--max-model-len 32768
```
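Qwen3Guard-Gen returns its verdict as plain text rather than structured JSON. Assuming output lines such as `Safety: Unsafe` and `Categories: Violent, Illegal` (the exact label set and format are defined by the model card; the shape used here is an assumption), a minimal parser might look like:

```python
import re

def parse_verdict(text: str) -> dict:
    """Extract the safety label and category list from a guard-model reply.
    The line format assumed here ("Safety: ...", "Categories: ...") is a
    sketch -- check the model card for the authoritative output format."""
    safety = re.search(r"Safety:\s*(\w+)", text)
    cats = re.search(r"Categories:\s*(.+)", text)
    return {
        "safety": safety.group(1) if safety else None,
        "categories": [c.strip() for c in cats.group(1).split(",")] if cats else [],
    }

example = "Safety: Unsafe\nCategories: Violent, Illegal"
print(parse_verdict(example))  # → {'safety': 'Unsafe', 'categories': ['Violent', 'Illegal']}
```

Keeping the parser tolerant (returning `None`/`[]` on a miss) avoids crashes if the model occasionally deviates from the expected format.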

## Benchmarking
```bash
vllm bench serve \
--model Qwen/Qwen3Guard-Gen-0.6B \