- Unlike multi-turn chat use cases, we do not expect OCR tasks to benefit significantly from prefix caching or image reuse; it is therefore recommended to turn off these features to avoid unnecessary hashing and caching.
- Depending on your hardware capability, adjust `max_num_batched_tokens` for better throughput.
- Check out the official [PaddleOCR-VL documentation](https://github.com/PaddlePaddle/PaddleOCR) for more details and examples of using the model for various document parsing tasks.


## AMD GPU Support
Recommended approaches by hardware type are:

**MI300X/MI325X/MI355X**

Follow the steps below to install and run PaddleOCR-VL on AMD MI300X/MI325X/MI355X GPUs.

### Step 1: Installing vLLM (AMD ROCm Backend: MI300X, MI325X, MI355X)
> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
```bash
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
```
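After installing, a quick sanity check can confirm that vLLM is importable from the new environment. This is a minimal verification sketch (not from the official docs); it assumes the virtual environment created above is active:

```shell
# Sanity-check the install (assumes the .venv created above is active).
if python3 -c "import vllm" >/dev/null 2>&1; then
  VLLM_OK="yes"
  python3 -c "import vllm; print('vLLM', vllm.__version__)"
else
  VLLM_OK="no"
  echo "vllm not importable -- is the virtual environment active?"
fi
```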


### Step 2: Start the vLLM server

Run vLLM online serving with the following sample command:
```shell
SAFETENSORS_FAST_GPU=1 \
VLLM_USE_V1=1 \
VLLM_USE_TRITON_FLASH_ATTN=0 vllm serve PaddlePaddle/PaddleOCR-VL \
  --max-num-batched-tokens 16384 \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0 \
  --trust-remote-code
```
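Once the server is up, you can exercise it through vLLM's OpenAI-compatible API. A minimal sketch, assuming the server is listening on the default `localhost:8000`; the image URL and the `OCR:` prompt text are placeholders to replace with your own document image and task prompt:

```shell
# Build a request for the OpenAI-compatible endpoint exposed by `vllm serve`.
# The image URL is a placeholder -- replace it with your own document image.
cat > request.json <<'EOF'
{
  "model": "PaddlePaddle/PaddleOCR-VL",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/page.png"}},
        {"type": "text", "text": "OCR:"}
      ]
    }
  ]
}
EOF

# Send it to the server started above (adjust the address if needed).
curl -s --max-time 10 http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json || echo "request failed -- is the server from Step 2 running?"
```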
### Step 3: Run Benchmark
Open a new terminal, activate the same environment (or attach to the container, if you used the Docker-based setup), and run the following command to benchmark the server.
```shell
vllm bench serve \
--model "PaddlePaddle/PaddleOCR-VL" \
--dataset-name random \
--random-input-len 8192 \
--random-output-len 1024 \
--request-rate 10000 \
--num-prompts 16 \
--ignore-eos \
--trust-remote-code
```
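To see how prefill size affects throughput on your hardware, the same benchmark can be swept over several input lengths. A hypothetical sketch using the same flags as above; the specific lengths are arbitrary, and the server from Step 2 must still be running:

```shell
# Sweep input lengths with the same benchmark flags as above.
# Assumes `vllm` is on PATH and the server from Step 2 is running.
LENS="2048 4096 8192"
for LEN in $LENS; do
  echo "=== random-input-len=$LEN ==="
  if command -v vllm >/dev/null 2>&1; then
    vllm bench serve \
      --model "PaddlePaddle/PaddleOCR-VL" \
      --dataset-name random \
      --random-input-len "$LEN" \
      --random-output-len 1024 \
      --request-rate 10000 \
      --num-prompts 16 \
      --ignore-eos \
      --trust-remote-code
  else
    echo "vllm not found -- skipping"
  fi
done
```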