- Unlike multi-turn chat use cases, we do not expect OCR tasks to benefit significantly from prefix caching or image reuse; it is therefore recommended to turn off these features to avoid unnecessary hashing and caching.
- Depending on your hardware capability, adjust `max_num_batched_tokens` for better throughput.
- Check out the official [PaddleOCR-VL documentation](https://github.com/PaddlePaddle/PaddleOCR) for more details and examples of using the model for various document parsing tasks.


## AMD GPU Support
Recommended approaches by hardware type are:

**MI300X/MI325X/MI355X**

Follow the steps below to install and run PaddleOCR-VL on AMD MI300X/MI325X/MI355X GPUs.

### Step 1: Installing vLLM (AMD ROCm Backend: MI300X, MI325X, MI355X)
> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
```bash
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
```
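After installing, a quick sanity check can confirm that vLLM is importable from the new environment. This is a minimal verification sketch (not from the official docs); it assumes the virtual environment created above is active:

```shell
# Sanity-check the install (assumes the .venv created above is active).
if python3 -c "import vllm" >/dev/null 2>&1; then
  VLLM_OK="yes"
  python3 -c "import vllm; print('vLLM', vllm.__version__)"
else
  VLLM_OK="no"
  echo "vllm not importable -- is the virtual environment active?"
fi
```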


### Step 2: Start the vLLM server

Run vLLM online serving with the following sample command:
```shell
SAFETENSORS_FAST_GPU=1 \
VLLM_USE_V1=1 \
VLLM_USE_TRITON_FLASH_ATTN=0 vllm serve PaddlePaddle/PaddleOCR-VL \
  --max-num-batched-tokens 16384 \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0 \
  --trust-remote-code
```
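Once the server is up, you can exercise it through vLLM's OpenAI-compatible API. A minimal sketch, assuming the server is listening on the default `localhost:8000`; the image URL and the `OCR:` prompt text are placeholders to replace with your own document image and task prompt:

```shell
# Build a request for the OpenAI-compatible endpoint exposed by `vllm serve`.
# The image URL is a placeholder -- replace it with your own document image.
cat > request.json <<'EOF'
{
  "model": "PaddlePaddle/PaddleOCR-VL",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/page.png"}},
        {"type": "text", "text": "OCR:"}
      ]
    }
  ]
}
EOF

# Send it to the server started above (adjust the address if needed).
curl -s --max-time 10 http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json || echo "request failed -- is the server from Step 2 running?"
```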
### Step 3: Run Benchmark
Open a new terminal, activate the same environment (or attach to the container, if you used the Docker-based setup), and run the following command to benchmark the server.
```shell
vllm bench serve \
--model "PaddlePaddle/PaddleOCR-VL" \
--dataset-name random \
--random-input-len 8192 \
--random-output-len 1024 \
--request-rate 10000 \
--num-prompts 16 \
--ignore-eos \
--trust-remote-code
```
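To see how prefill size affects throughput on your hardware, the same benchmark can be swept over several input lengths. A hypothetical sketch using the same flags as above; the specific lengths are arbitrary, and the server from Step 2 must still be running:

```shell
# Sweep input lengths with the same benchmark flags as above.
# Assumes `vllm` is on PATH and the server from Step 2 is running.
LENS="2048 4096 8192"
for LEN in $LENS; do
  echo "=== random-input-len=$LEN ==="
  if command -v vllm >/dev/null 2>&1; then
    vllm bench serve \
      --model "PaddlePaddle/PaddleOCR-VL" \
      --dataset-name random \
      --random-input-len "$LEN" \
      --random-output-len 1024 \
      --request-rate 10000 \
      --num-prompts 16 \
      --ignore-eos \
      --trust-remote-code
  else
    echo "vllm not found -- skipping"
  fi
done
```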