# Ministral 3 14B Instruct on vLLM - AMD Hardware

## Introduction

This quick start recipe explains how to run the Ministral 3 14B Instruct model on AMD Instinct MI300X, MI325X, and MI355X GPUs using vLLM.

## Key benefits of AMD GPUs for large models and developers

AMD Instinct GPU accelerators are purpose-built to handle the demands of next-generation models like Ministral:
- Large HBM memory enables longer contexts and higher concurrency.
- Optimized Triton and AITER kernels provide best-in-class performance and total cost of ownership (TCO) for production deployments.
- Strong single-node performance reduces infrastructure complexity for serving.

## Access & Licensing

### License and model parameters

Confirm that you have access to the following model:
- [Ministral 3 14B Instruct](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512)
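
If the model repository is gated, you will likely need to authenticate with Hugging Face before the weights can be downloaded. A minimal sketch (the token value is a placeholder):

```bash
# Create a token at https://huggingface.co/settings/tokens, then either:
export HF_TOKEN=<your_token>
# or log in interactively:
huggingface-cli login
```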

## Prerequisites

- OS: Linux
- Drivers: ROCm 7.0 or above
- GPU: AMD MI300X, MI325X, or MI355X
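
As a quick sanity check, you can confirm that the driver stack sees your GPUs (a minimal sketch; exact output varies by installation):

```bash
# List the detected AMD GPUs
rocm-smi --showproductname
# Show the installed ROCm release
cat /opt/rocm/.info/version
```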

## Deployment Steps

### 1. Using the vLLM Docker image (for AMD users)

```bash
docker run -it \
--network=host \
--device=/dev/kfd \
--device=/dev/dri \
--group-add=video \
--ipc=host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--shm-size 32G \
-v /data:/data \
-v $HOME:/myhome \
-w /myhome \
--entrypoint /bin/bash \
vllm/vllm-openai-rocm:latest
```
Alternatively, you can use a uv environment.
> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
```bash
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
```
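
Either way, you can quickly verify that vLLM is importable before starting the server:

```bash
python -c "import vllm; print(vllm.__version__)"
```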
### 2. Start the vLLM online server (run in background)

`VLLM_ROCM_USE_AITER=1` enables the optimized AITER kernels mentioned above, and `TP` sets the tensor-parallel degree. Mistral-format checkpoints are served with `--tokenizer-mode`, `--config-format`, and `--load-format` set to `mistral`.

```bash
export TP=1
export VLLM_ROCM_USE_AITER=1
export MODEL="mistralai/Ministral-3-14B-Instruct-2512"
vllm serve $MODEL \
    --disable-log-requests \
    -tp $TP \
    --tokenizer-mode mistral \
    --config-format mistral \
    --load-format mistral \
    --enable-auto-tool-choice \
    --tool-call-parser mistral &
```
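
Once the server is up (it listens on port 8000 by default), you can smoke-test the OpenAI-compatible endpoint. A minimal sketch:

```bash
# Block until the health endpoint responds
until curl -sf http://localhost:8000/health > /dev/null; do sleep 5; done

# Send a test chat completion
curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "mistralai/Ministral-3-14B-Instruct-2512",
          "messages": [{"role": "user", "content": "Give a one-sentence summary of vLLM."}],
          "max_tokens": 64
        }'
```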

### 3. Performance benchmark

The benchmark below sends `REQ` random prompts of `ISL` input tokens each, generates `OSL` output tokens per request, and caps concurrency at `CONC`.

```bash
export MODEL="mistralai/Ministral-3-14B-Instruct-2512"
export ISL=1024
export OSL=1024
export REQ=10
export CONC=10
vllm bench serve \
--backend vllm \
--model $MODEL \
--dataset-name random \
--random-input-len $ISL \
--random-output-len $OSL \
--num-prompts $REQ \
--ignore-eos \
--max-concurrency $CONC \
--percentile-metrics ttft,tpot,itl,e2el
```
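
The reported percentiles cover time to first token (`ttft`), time per output token (`tpot`), inter-token latency (`itl`), and end-to-end latency (`e2el`). To see how latency and throughput trade off, you can sweep the concurrency level; a minimal sketch reusing the variables above:

```bash
# Sweep a few concurrency levels, scaling request count with concurrency
for CONC in 1 4 16 64; do
    vllm bench serve \
        --backend vllm \
        --model $MODEL \
        --dataset-name random \
        --random-input-len $ISL \
        --random-output-len $OSL \
        --num-prompts $((CONC * 10)) \
        --ignore-eos \
        --max-concurrency $CONC \
        --percentile-metrics ttft,tpot,itl,e2el
done
```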