27 changes: 21 additions & 6 deletions Qwen/Qwen3Guard-Gen.md
This guide describes how to run **Qwen3Guard-Gen** on GPU using vLLM.

Qwen3Guard-Gen is a lightweight, text-only guardrail model that classifies prompts and model responses for safety.

## Installing vLLM

### CUDA

```bash
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
```

### ROCm
> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
```bash
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
```

## Running Qwen3Guard-Gen on a Single GPU

### CUDA
```bash
# Start server on a single GPU
vllm serve Qwen/Qwen3Guard-Gen-0.6B \
--host 0.0.0.0 \
--max-model-len 32768
```
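Once the server is up, you can query it through vLLM's OpenAI-compatible API. The sketch below uses only the standard library; the `/v1/chat/completions` path is vLLM's standard endpoint, while `build_moderation_request` and `moderate` are hypothetical helper names introduced here for illustration, not part of vLLM.

```python
import json
import urllib.request

# Hypothetical helpers for querying the server started above.
def build_moderation_request(user_text: str) -> dict:
    """Assemble an OpenAI-compatible chat-completions payload."""
    return {
        "model": "Qwen/Qwen3Guard-Gen-0.6B",
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": 128,
        "temperature": 0.0,  # deterministic safety labels
    }

def moderate(user_text: str, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to vLLM's OpenAI-compatible endpoint and
    return the guard model's verdict text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_moderation_request(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Adjust `base_url` if you changed the serve port; the request body is the standard OpenAI chat-completions schema, so any OpenAI-compatible client works here too.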

### ROCm
```bash
export VLLM_ROCM_USE_AITER=1
vllm serve Qwen/Qwen3Guard-Gen-0.6B \
--host 0.0.0.0 \
--max-model-len 32768
```
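Qwen3Guard-Gen returns its verdict as plain text rather than structured JSON. Assuming output lines such as `Safety: Unsafe` and `Categories: Violent, Illegal` (the exact label set and format are defined by the model card; the shape used here is an assumption), a minimal parser might look like:

```python
import re

def parse_verdict(text: str) -> dict:
    """Extract the safety label and category list from a guard-model reply.
    The line format assumed here ("Safety: ...", "Categories: ...") is a
    sketch -- check the model card for the authoritative output format."""
    safety = re.search(r"Safety:\s*(\w+)", text)
    cats = re.search(r"Categories:\s*(.+)", text)
    return {
        "safety": safety.group(1) if safety else None,
        "categories": [c.strip() for c in cats.group(1).split(",")] if cats else [],
    }

example = "Safety: Unsafe\nCategories: Violent, Illegal"
print(parse_verdict(example))  # → {'safety': 'Unsafe', 'categories': ['Violent', 'Illegal']}
```

Keeping the parser tolerant (returning `None`/`[]` on a miss) avoids crashes if the model occasionally deviates from the expected format.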

## Benchmarking
```bash
vllm bench serve \
--model Qwen/Qwen3Guard-Gen-0.6B \