Add the support of AMD MI300X/MI325X/MI355X of Ernie 4.5 recipe #227
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,15 +4,21 @@ This guide describes how to run [ERNIE-4.5-21B-A3B-PT](https://huggingface.co/ba | |
|
|
||
| ## Installing vLLM | ||
| Note: requires transformers >= 4.54.0 and vllm >= 0.10.1. | ||
|
|
||
| ### CUDA | ||
| ```bash | ||
| uv venv --python 3.12 --seed | ||
| source .venv/bin/activate | ||
| uv pip install vllm --torch-backend=auto | ||
| ``` | ||
|
|
||
| ## Running Ernie4.5 | ||
| ### AMD ROCm: MI300X/MI325X/MI355X | ||
| ```bash | ||
| uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.16.0/rocm700 | ||
| ``` | ||
| ⚠️ The vLLM wheel for ROCm is compatible with Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment is incompatible, use the Docker-based flow described at [vLLM](https://vllm.ai/) instead. | ||
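The compatibility note above can be checked before installing. The following is a minimal sketch, not part of the recipe: the helper names `version_ge` and `check_rocm_wheel_env` are hypothetical, it relies on GNU `sort -V` and on `getconf GNU_LIBC_VERSION` (available on glibc-based systems), and it does not verify the ROCm version.

```bash
# version_ge A B: succeeds when version A >= version B (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: prints a warning for each requirement that is not met.
# Only Python and glibc are checked; the ROCm version is not inspected here.
check_rocm_wheel_env() {
  py="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)"
  glibc="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"
  [ "$py" = "3.12" ] || echo "Python is '${py:-not found}', the wheel expects 3.12"
  version_ge "${glibc:-0}" 2.35 || echo "glibc is '${glibc:-unknown}', the wheel expects >= 2.35"
}
```

Running `check_rocm_wheel_env` inside the activated virtual environment prints nothing when both checks pass.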
|
|
||
| ## Running Ernie4.5 | ||
| ### Serving Ernie4.5 Model on H100 GPUs | ||
| ```bash | ||
| # 21B model 80G*1 GPU | ||
| vllm serve baidu/ERNIE-4.5-21B-A3B-PT | ||
|
|
@@ -33,8 +39,19 @@ vllm serve baidu/ERNIE-4.5-300B-A47B-PT \ | |
| --tensor-parallel-size 16 | ||
| ``` | ||
|
|
||
| ## Running Ernie4.5 MTP | ||
| ### Serving Ernie4.5 Model on MI300X/MI325X/MI355X GPUs | ||
| Start vLLM online serving on AMD GPUs with the command below: | ||
| ```bash | ||
| VLLM_ROCM_USE_AITER=1 \ | ||
| SAFETENSORS_FAST_GPU=1 \ | ||
| vllm serve baidu/ERNIE-4.5-21B-A3B-PT \ | ||
| --tensor-parallel-size 4 \ | ||
| --gpu-memory-utilization 0.9 \ | ||
| --disable-log-requests \ | ||
| --trust-remote-code | ||
| ``` | ||
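Once the server is up, a quick smoke test can confirm that it answers requests. This is a sketch under assumptions: it presumes the server listens on vLLM's default port 8000 and uses the OpenAI-compatible `/v1/completions` route; `completion_payload`, `BASE_URL`, and `RUN_REQUEST` are hypothetical names introduced for illustration.

```bash
# Assumption: the server started above listens on vLLM's default port 8000.
BASE_URL="${BASE_URL:-http://localhost:8000}"
MODEL="baidu/ERNIE-4.5-21B-A3B-PT"

# Builds a minimal completion request body.
# $1 = prompt (must not contain characters that need JSON escaping)
# $2 = max_tokens (defaults to 32)
completion_payload() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %d}\n' "$MODEL" "$1" "${2:-32}"
}

# The request is only sent when RUN_REQUEST=1, so sourcing this file is side-effect free.
if [ "${RUN_REQUEST:-0}" = "1" ]; then
  curl -s "${BASE_URL}/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$(completion_payload "Hello, my name is")"
fi
```

Saving this as a script and running it with `RUN_REQUEST=1` set should return a JSON response from the server if the model loaded successfully.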
|
|
||
| ## Running Ernie4.5 MTP | ||
| ```bash | ||
| # 21B MTP model 80G*1 GPU | ||
| vllm serve baidu/ERNIE-4.5-21B-A3B-PT \ | ||
|
|
@@ -58,12 +75,8 @@ vllm serve baidu/ERNIE-4.5-300B-A47B-PT \ | |
| --speculative-config '{"method": "ernie_mtp","model": "baidu/ERNIE-4.5-300B-A47B-PT","num_speculative_tokens": 1}' | ||
| ``` | ||
|
|
||
|
|
||
| ## Benchmarking | ||
|
|
||
| For benchmarking, use only the first `vllm bench serve` run after service startup, so that the result is not affected by the prefix cache. | ||
|
|
||
|
|
||
|
Review comment: can you remove this unnecessary line change? |
||
| ```bash | ||
| # Prompt-heavy benchmark (8k/1k) | ||
| vllm bench serve \ | ||
|
|
@@ -79,18 +92,14 @@ vllm bench serve \ | |
| ``` | ||
|
|
||
| ### Benchmark Configurations | ||
|
|
||
| Test different workloads by adjusting input/output lengths: | ||
|
|
||
| - **Prompt-heavy**: 8000 input / 1000 output | ||
| - **Decode-heavy**: 1000 input / 8000 output | ||
| - **Balanced**: 1000 input / 1000 output | ||
|
|
||
| Test different batch sizes by changing `--num-prompts`, e.g., 1, 16, 32, 64, 128, 256, 512 | ||
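The workload and batch-size combinations above can be enumerated with a small dry-run script. This is a sketch, not the recipe's own command: it assumes the `random` dataset with the `--random-input-len` and `--random-output-len` options of `vllm bench serve`, the function name `print_bench_sweep` is hypothetical, and any other flags used by the recipe's benchmark command are omitted.

```bash
# Dry run: prints one `vllm bench serve` command per workload and batch size
# instead of executing it. The flag set is an assumption, not the recipe's exact command.
print_bench_sweep() {
  model="${1:-baidu/ERNIE-4.5-21B-A3B-PT}"
  # Each entry is name:input_len:output_len, taken from the workload list.
  for spec in prompt-heavy:8000:1000 decode-heavy:1000:8000 balanced:1000:1000; do
    name="${spec%%:*}"
    lens="${spec#*:}"
    input_len="${lens%%:*}"
    output_len="${lens#*:}"
    for num_prompts in 1 16 32 64 128 256 512; do
      echo "vllm bench serve --model ${model} --dataset-name random --random-input-len ${input_len} --random-output-len ${output_len} --num-prompts ${num_prompts} # ${name}"
    done
  done
}

print_bench_sweep
```

The script only prints commands because the guide measures the first run after service startup; running the whole sweep back to back against one server instance would not follow that advice.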
|
|
||
| ### Expected Output | ||
|
|
||
|
|
||
|
Review comment: can you remove this unnecessary line change? |
||
| ```shell | ||
| ============ Serving Benchmark Result ============ | ||
| Successful requests: 16 | ||
|
|
@@ -114,5 +123,4 @@ Mean ITL (ms): 16.84 | |
| Median ITL (ms): 15.49 | ||
| P99 ITL (ms): 20.69 | ||
| ================================================== | ||
| ``` | ||
|
|
||
|
Review comment: can you remove this unnecessary line change? |
||
| ``` | ||
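When comparing runs, individual metrics can be pulled out of the result block shown above. The following is a minimal sketch that assumes each metric is printed as `Label: value` on its own line, as in the expected output; `bench_metric` is a hypothetical helper name.

```bash
# Reads `vllm bench serve` output on stdin and prints the value of one metric.
# $1 = metric label exactly as printed before the colon, e.g. "Mean ITL (ms)"
bench_metric() {
  awk -F': *' -v label="$1" '$1 == label { print $2 }'
}
```

For example, if the benchmark output was saved to a file (here a hypothetical `bench.log`), `bench_metric "P99 ITL (ms)" < bench.log` prints only the P99 inter-token latency.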