From 881d07d654f5700cb1dfe9bfc6f219802bceb8dd Mon Sep 17 00:00:00 2001
From: amd-asalykov
Date: Wed, 28 Jan 2026 08:10:18 -0600
Subject: [PATCH 1/5] update qwen3guard for AMD GPU

---
 Qwen/Qwen3Guard-Gen.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/Qwen/Qwen3Guard-Gen.md b/Qwen/Qwen3Guard-Gen.md
index 1abcff64..d44d2d1a 100644
--- a/Qwen/Qwen3Guard-Gen.md
+++ b/Qwen/Qwen3Guard-Gen.md
@@ -14,9 +14,24 @@ source .venv/bin/activate
 uv pip install -U vllm --torch-backend auto
 ```
 
+### Installing vLLM (AMD ROCm Backend: MI300X, MI325X, MI355X)
+> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
+```bash
+uv venv
+source .venv/bin/activate
+uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
+```
+
 ### Running Qwen3Guard-Gen on a Single GPU
 ```bash
-# Start server on a single GPU
+vllm serve Qwen/Qwen3Guard-Gen-0.6B \
+  --host 0.0.0.0 \
+  --max-model-len 32768
+```
+
+### Running Qwen3Guard-Gen with AMD ROCm Backend
+```bash
+export VLLM_ROCM_USE_AITER=1
 vllm serve Qwen/Qwen3Guard-Gen-0.6B \
   --host 0.0.0.0 \
   --max-model-len 32768

From c94b0410a8c2ce2678f99dfb9516db8b0c7d2264 Mon Sep 17 00:00:00 2001
From: amd-asalykov
Date: Wed, 28 Jan 2026 09:24:59 -0600
Subject: [PATCH 2/5] update

---
 Qwen/Qwen3Guard-Gen.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Qwen/Qwen3Guard-Gen.md b/Qwen/Qwen3Guard-Gen.md
index d44d2d1a..0e0a3860 100644
--- a/Qwen/Qwen3Guard-Gen.md
+++ b/Qwen/Qwen3Guard-Gen.md
@@ -19,7 +19,7 @@ uv pip install -U vllm --torch-backend auto
 ```bash
 uv venv
 source .venv/bin/activate
-uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
+uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm
 ```
 
 ### Running Qwen3Guard-Gen on a Single GPU

From 1d6477edf8cdc43670822fde320a518a300d579a Mon Sep 17 00:00:00 2001
From: amd-asalykov
Date: Fri, 30 Jan 2026 14:28:39 +0000
Subject: [PATCH 3/5] update

---
 Qwen/Qwen3Guard-Gen.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Qwen/Qwen3Guard-Gen.md b/Qwen/Qwen3Guard-Gen.md
index 0e0a3860..d44d2d1a 100644
--- a/Qwen/Qwen3Guard-Gen.md
+++ b/Qwen/Qwen3Guard-Gen.md
@@ -19,7 +19,7 @@ uv pip install -U vllm --torch-backend auto
 ```bash
 uv venv
 source .venv/bin/activate
-uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm
+uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
 ```
 
 ### Running Qwen3Guard-Gen on a Single GPU

From 12458e7032ac6f2f8e879b260d150799460a7528 Mon Sep 17 00:00:00 2001
From: amd-asalykov
Date: Thu, 5 Feb 2026 22:25:41 +0000
Subject: [PATCH 4/5] use vllm latest stable

---
 Qwen/Qwen3Guard-Gen.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Qwen/Qwen3Guard-Gen.md b/Qwen/Qwen3Guard-Gen.md
index d44d2d1a..b1e074fa 100644
--- a/Qwen/Qwen3Guard-Gen.md
+++ b/Qwen/Qwen3Guard-Gen.md
@@ -19,7 +19,7 @@ uv pip install -U vllm --torch-backend auto
 ```bash
 uv venv
 source .venv/bin/activate
-uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
+uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
 ```
 
 ### Running Qwen3Guard-Gen on a Single GPU

From dd3b67166ad494a25be716f6ecded800657ad457 Mon Sep 17 00:00:00 2001
From: amd-asalykov
Date: Mon, 16 Feb 2026 04:01:01 -0600
Subject: [PATCH 5/5] update

---
 Qwen/Qwen3Guard-Gen.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/Qwen/Qwen3Guard-Gen.md b/Qwen/Qwen3Guard-Gen.md
index b1e074fa..b9cac80b 100644
--- a/Qwen/Qwen3Guard-Gen.md
+++ b/Qwen/Qwen3Guard-Gen.md
@@ -4,9 +4,9 @@
 
 This guide describes how to run **Qwen3Guard-Gen** on GPU using vLLM. Qwen3Guard-Gen is a lightweight text-only guardrail model.
 
-## GPU Deployment
+## Installing vLLM
 
-### Installing vLLM
+### CUDA
 
 ```bash
 uv venv
@@ -14,7 +14,7 @@ source .venv/bin/activate
 uv pip install -U vllm --torch-backend auto
 ```
 
-### Installing vLLM (AMD ROCm Backend: MI300X, MI325X, MI355X)
+### ROCm
 > Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
 ```bash
 uv venv
@@ -22,14 +22,16 @@ source .venv/bin/activate
 uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
 ```
 
-### Running Qwen3Guard-Gen on a Single GPU
+## Running Qwen3Guard-Gen on a Single GPU
+
+### CUDA
 ```bash
 vllm serve Qwen/Qwen3Guard-Gen-0.6B \
   --host 0.0.0.0 \
   --max-model-len 32768
 ```
 
-### Running Qwen3Guard-Gen with AMD ROCm Backend
+### ROCm
 ```bash
 export VLLM_ROCM_USE_AITER=1
 vllm serve Qwen/Qwen3Guard-Gen-0.6B \
@@ -36,9 +38,7 @@ vllm serve Qwen/Qwen3Guard-Gen-0.6B \
   --host 0.0.0.0 \
   --max-model-len 32768
 ```
-## Performance Metrics
-
-### Benchmarking
+## Benchmarking
 ```bash
 vllm bench serve \
   --model Qwen/Qwen3Guard-Gen-0.6B \
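Either `vllm serve` invocation in the patched guide exposes vLLM's OpenAI-compatible API (on port 8000 by default), so a client submits the text to moderate as a chat request and reads the verdict back as plain text. The sketch below shows client-side parsing of such a reply; it is not part of the patched guide, and `parse_guard_verdict`, the sample reply, and the `Safety:`/`Categories:` label format are illustrative assumptions based on the Qwen3Guard model card rather than a verified output contract.

```python
import re

def parse_guard_verdict(text: str) -> dict:
    """Parse a Qwen3Guard-Gen style reply into a dict.

    Assumed reply format (hypothetical, check your deployment's actual
    output): lines such as "Safety: Unsafe" and "Categories: Violent".
    """
    verdict = {}
    # Extract the one-word safety label, e.g. Safe / Unsafe / Controversial.
    m = re.search(r"Safety:\s*(\w+)", text)
    if m:
        verdict["safety"] = m.group(1)
    # Extract the comma-separated category list, if present.
    m = re.search(r"Categories:\s*(.+)", text)
    if m:
        verdict["categories"] = [c.strip() for c in m.group(1).split(",")]
    return verdict

# In a live deployment this string would come from the server started by
# the guide, e.g. a chat completion fetched from http://localhost:8000/v1.
sample_reply = "Safety: Unsafe\nCategories: Violent"
print(parse_guard_verdict(sample_reply))
```

Keeping the parsing tolerant (missing keys simply stay absent from the dict) makes the same helper usable for both prompt-only and prompt-plus-response moderation calls.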