From 98c37022797ee910ddae2dec821c43436bf9f476 Mon Sep 17 00:00:00 2001 From: haic0 <149741444+haic0@users.noreply.github.com> Date: Thu, 11 Dec 2025 10:33:09 +0800 Subject: [PATCH 1/8] Update Seed-OSS-36B.md Signed-off-by: haic0 <149741444+haic0@users.noreply.github.com> --- Seed/Seed-OSS-36B.md | 55 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 54 insertions(+), 1 deletion(-) diff --git a/Seed/Seed-OSS-36B.md b/Seed/Seed-OSS-36B.md index 5a165d7a..4ad948eb 100644 --- a/Seed/Seed-OSS-36B.md +++ b/Seed/Seed-OSS-36B.md @@ -196,4 +196,57 @@ Mean ITL (ms): 44.39 Median ITL (ms): 46.18 P99 ITL (ms): 64.52 ================================================== -``` \ No newline at end of file +``` + + +## AMD GPU Support +Please follow the steps here to install and run Seed-OSS-36B-Instruct models on AMD MI300X GPU. +### Step 1: Prepare Docker Environment +Pull the latest vllm docker: +```shell +docker pull rocm/vllm-dev:nightly +``` +Launch the ROCm vLLM docker: +```shell +docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Seed-OSS-36B-Instruct rocm/vllm-dev:nightly +``` +### Step 2: Log in to Hugging Face +Huggingface login +```shell +huggingface-cli login +``` + +### Step 3: Start the vLLM server + +Run the vllm online serving +Sample Command +```shell + +SAFETENSORS_FAST_GPU=1 \ +VLLM_USE_V1=1 \ +VLLM_USE_TRITON_FLASH_ATTN=0 vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct \ + --tensor-parallel-size 8 \ + --enable-auto-tool-choice \ + --tool-call-parser seed_oss \ + --no-enable-prefix-caching \ + --trust-remote-code + +``` + + +### Step 4: Run Benchmark +Open a new terminal and run the following command to execute the benchmark script inside the container. 
+```shell +docker exec -it Seed-OSS-36B-Instruct vllm bench serve \ + --model "ByteDance-Seed/Seed-OSS-36B-Instruct" \ + --dataset-name random \ + --random-input-len 8192 \ + --random-output-len 1024 \ + --request-rate 10000 \ + --num-prompts 16 \ + --ignore-eos \ + --trust-remote-code +``` + + + From 72a990b503ef1b5b8770c3842656870987fd3880 Mon Sep 17 00:00:00 2001 From: haic0 <149741444+haic0@users.noreply.github.com> Date: Fri, 12 Dec 2025 12:25:12 +0800 Subject: [PATCH 2/8] Update Seed/Seed-OSS-36B.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: haic0 <149741444+haic0@users.noreply.github.com> --- Seed/Seed-OSS-36B.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Seed/Seed-OSS-36B.md b/Seed/Seed-OSS-36B.md index 4ad948eb..9e543f92 100644 --- a/Seed/Seed-OSS-36B.md +++ b/Seed/Seed-OSS-36B.md @@ -208,7 +208,7 @@ docker pull rocm/vllm-dev:nightly ``` Launch the ROCm vLLM docker: ```shell -docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Seed-OSS-36B-Instruct rocm/vllm-dev:nightly +docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Seed-OSS-36B-Instruct rocm/vllm-dev:nightly ``` ### Step 2: Log in to Hugging Face Huggingface login From 0e2d07a63f5414a95b1d073473e306ed7b989d7f Mon Sep 17 00:00:00 2001 From: haic0 <149741444+haic0@users.noreply.github.com> Date: Fri, 12 Dec 2025 12:25:25 +0800 Subject: [PATCH 3/8] Update Seed/Seed-OSS-36B.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: haic0 
<149741444+haic0@users.noreply.github.com> --- Seed/Seed-OSS-36B.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/Seed/Seed-OSS-36B.md b/Seed/Seed-OSS-36B.md index 9e543f92..0e37a1b6 100644 --- a/Seed/Seed-OSS-36B.md +++ b/Seed/Seed-OSS-36B.md @@ -221,7 +221,6 @@ huggingface-cli login Run the vllm online serving Sample Command ```shell - SAFETENSORS_FAST_GPU=1 \ VLLM_USE_V1=1 \ VLLM_USE_TRITON_FLASH_ATTN=0 vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct \ @@ -230,7 +229,6 @@ VLLM_USE_TRITON_FLASH_ATTN=0 vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct \ --tool-call-parser seed_oss \ --no-enable-prefix-caching \ --trust-remote-code - ``` From 9c6225f99dbd062aa2a2bf66dae622e34f20235d Mon Sep 17 00:00:00 2001 From: amd-asalykov Date: Wed, 28 Jan 2026 08:36:37 -0600 Subject: [PATCH 4/8] update --- Seed/Seed-OSS-36B.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/Seed/Seed-OSS-36B.md b/Seed/Seed-OSS-36B.md index 0e37a1b6..82238e98 100644 --- a/Seed/Seed-OSS-36B.md +++ b/Seed/Seed-OSS-36B.md @@ -200,16 +200,15 @@ P99 ITL (ms): 64.52 ## AMD GPU Support -Please follow the steps here to install and run Seed-OSS-36B-Instruct models on AMD MI300X GPU. -### Step 1: Prepare Docker Environment -Pull the latest vllm docker: -```shell -docker pull rocm/vllm-dev:nightly -``` -Launch the ROCm vLLM docker: -```shell -docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Seed-OSS-36B-Instruct rocm/vllm-dev:nightly +Please follow the steps here to install and run Seed-OSS-36B-Instruct models on AMD MI300X/MI325X/MI355X +### Step 1: Install vLLM +> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. 
If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images). +```bash +uv venv +source .venv/bin/activate +uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700 ``` + ### Step 2: Log in to Hugging Face Huggingface login ```shell @@ -218,12 +217,13 @@ huggingface-cli login ### Step 3: Start the vLLM server -Run the vllm online serving -Sample Command +Run the vllm online serving: ```shell -SAFETENSORS_FAST_GPU=1 \ -VLLM_USE_V1=1 \ -VLLM_USE_TRITON_FLASH_ATTN=0 vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct \ +export SAFETENSORS_FAST_GPU=1 +export VLLM_USE_V1=1 +export VLLM_USE_TRITON_FLASH_ATTN=0 +export VLLM_ROCM_USE_AITER=1 +vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct \ --tensor-parallel-size 8 \ --enable-auto-tool-choice \ --tool-call-parser seed_oss \ @@ -233,9 +233,9 @@ VLLM_USE_TRITON_FLASH_ATTN=0 vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct \ ### Step 4: Run Benchmark -Open a new terminal and run the following command to execute the benchmark script inside the container. 
+Open a new terminal and run the following command to execute the benchmark script: ```shell -docker exec -it Seed-OSS-36B-Instruct vllm bench serve \ +vllm bench serve \ --model "ByteDance-Seed/Seed-OSS-36B-Instruct" \ --dataset-name random \ --random-input-len 8192 \ From a205e6f9290f39b2e08826e7be8236e25f05ed5e Mon Sep 17 00:00:00 2001 From: amd-asalykov Date: Wed, 28 Jan 2026 09:18:24 -0600 Subject: [PATCH 5/8] update --- Seed/Seed-OSS-36B.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Seed/Seed-OSS-36B.md b/Seed/Seed-OSS-36B.md index 82238e98..0b0554c2 100644 --- a/Seed/Seed-OSS-36B.md +++ b/Seed/Seed-OSS-36B.md @@ -206,7 +206,7 @@ Please follow the steps here to install and run Seed-OSS-36B-Instruct models on ```bash uv venv source .venv/bin/activate -uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700 +uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm ``` ### Step 2: Log in to Hugging Face From 2b000d8fc69acde3af86f5d69cb0162ac78b11fa Mon Sep 17 00:00:00 2001 From: amd-asalykov Date: Fri, 30 Jan 2026 14:33:34 +0000 Subject: [PATCH 6/8] update --- Seed/Seed-OSS-36B.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Seed/Seed-OSS-36B.md b/Seed/Seed-OSS-36B.md index 0b0554c2..82238e98 100644 --- a/Seed/Seed-OSS-36B.md +++ b/Seed/Seed-OSS-36B.md @@ -206,7 +206,7 @@ Please follow the steps here to install and run Seed-OSS-36B-Instruct models on ```bash uv venv source .venv/bin/activate -uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm +uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700 ``` ### Step 2: Log in to Hugging Face From 7c9fc022cef98a9ff0554f2ebe1c40c14bf7baab Mon Sep 17 00:00:00 2001 From: amd-asalykov Date: Thu, 5 Feb 2026 22:27:20 +0000 Subject: [PATCH 7/8] use vllm latest stable release --- Seed/Seed-OSS-36B.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Seed/Seed-OSS-36B.md b/Seed/Seed-OSS-36B.md 
index 82238e98..6c6307f0 100644
--- a/Seed/Seed-OSS-36B.md
+++ b/Seed/Seed-OSS-36B.md
@@ -206,7 +206,7 @@ Please follow the steps here to install and run Seed-OSS-36B-Instruct models on
 ```bash
 uv venv
 source .venv/bin/activate
-uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
+uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
 ```
 
 ### Step 2: Log in to Hugging Face

From 043955186e048f65fb6aec1cdfbfff7b35714c09 Mon Sep 17 00:00:00 2001
From: amd-asalykov
Date: Mon, 16 Feb 2026 06:50:39 -0600
Subject: [PATCH 8/8] update

---
 Seed/Seed-OSS-36B.md | 78 +++++++++++++++-----------------------------
 1 file changed, 26 insertions(+), 52 deletions(-)

diff --git a/Seed/Seed-OSS-36B.md b/Seed/Seed-OSS-36B.md
index 6c6307f0..3d46331d 100644
--- a/Seed/Seed-OSS-36B.md
+++ b/Seed/Seed-OSS-36B.md
@@ -4,6 +4,7 @@ This guide describes how to run Seed-OSS-36B models with vLLM and native BF16 pr
 ## Installing vLLM
 
+### CUDA
 Seed-OSS support was recently added to the vLLM main branch and is not yet available in any official release:
 
 ```bash
@@ -20,8 +21,17 @@ You may need to download the latest version of the transformer for compatibility
 uv pip install git+https://github.com/huggingface/transformers.git@56d68c6706ee052b445e1e476056ed92ac5eb383
 ```
 
+### ROCm
+> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images). Tested hardware: MI300X, MI325X, MI355X.
+```bash
+uv venv
+source .venv/bin/activate
+uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
+```
+
 ## Running Seed-OSS-36B with BF16
 
+### CUDA
 There are two ways to parallelize the model over multiple GPUs: (1) Tensor-parallel or (2) Data-parallel.
 Each has its own advantages: tensor parallelism is usually more beneficial in low-latency, low-load scenarios, while data parallelism works better under heavy load with many concurrent requests.
 
 Run tensor-parallel like this:
@@ -40,6 +50,21 @@ vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct \
 * vLLM conservatively uses 90% of GPU memory; you can set `--gpu-memory-utilization=0.95` to maximize the KV cache.
 * Make sure to follow the command-line instructions to ensure the tool-calling functionality is properly enabled.
 
+### ROCm
+
+```shell
+export SAFETENSORS_FAST_GPU=1
+export VLLM_USE_V1=1
+export VLLM_USE_TRITON_FLASH_ATTN=0
+export VLLM_ROCM_USE_AITER=1
+vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct \
+  --tensor-parallel-size 8 \
+  --enable-auto-tool-choice \
+  --tool-call-parser seed_oss \
+  --no-enable-prefix-caching \
+  --trust-remote-code
+```
+
 ## Thinking Budget Feature
 Users can flexibly specify the model's thinking budget. For simpler tasks (such as IFEval), the model's chain of thought (CoT) is shorter, and the score fluctuates as the thinking budget increases. For more challenging tasks (such as AIME and LiveCodeBench), the model's CoT is longer, and the score improves with an increase in the thinking budget.
@@ -196,55 +221,4 @@ Mean ITL (ms): 44.39
 Median ITL (ms): 46.18
 P99 ITL (ms): 64.52
 ==================================================
-```
-
-
-## AMD GPU Support
-Please follow the steps here to install and run Seed-OSS-36B-Instruct models on AMD MI300X/MI325X/MI355X
-### Step 1: Install vLLM
-> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
-```bash -uv venv -source .venv/bin/activate -uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/ -``` - -### Step 2: Log in to Hugging Face -Huggingface login -```shell -huggingface-cli login -``` - -### Step 3: Start the vLLM server - -Run the vllm online serving: -```shell -export SAFETENSORS_FAST_GPU=1 -export VLLM_USE_V1=1 -export VLLM_USE_TRITON_FLASH_ATTN=0 -export VLLM_ROCM_USE_AITER=1 -vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct \ - --tensor-parallel-size 8 \ - --enable-auto-tool-choice \ - --tool-call-parser seed_oss \ - --no-enable-prefix-caching \ - --trust-remote-code -``` - - -### Step 4: Run Benchmark -Open a new terminal and run the following command to execute the benchmark script: -```shell -vllm bench serve \ - --model "ByteDance-Seed/Seed-OSS-36B-Instruct" \ - --dataset-name random \ - --random-input-len 8192 \ - --random-output-len 1024 \ - --request-rate 10000 \ - --num-prompts 16 \ - --ignore-eos \ - --trust-remote-code -``` - - - +``` \ No newline at end of file
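As a quick sanity check on benchmark figures like the `vllm bench serve` output quoted in the patches above, the mean inter-token latency (ITL) implies a per-request decode speed of roughly 1000 / ITL tokens per second. A minimal sketch in plain Python, using the Mean ITL value reported in the doc (the numeric value is taken from the benchmark output; the formula is a rough steady-state approximation, not something `vllm bench serve` itself prints):

```python
# Per-request decode throughput implied by mean inter-token latency (ITL).
# ITL is the average gap between consecutive output tokens, so during steady
# decode one request emits roughly 1000 / ITL_ms tokens per second.
mean_itl_ms = 44.39  # "Mean ITL (ms)" from the benchmark output above

tokens_per_second = 1000.0 / mean_itl_ms
print(f"{tokens_per_second:.1f} tokens/s per request")  # -> 22.5 tokens/s per request
```

This kind of cross-check is useful when comparing runs: if reported output throughput divided by concurrent requests diverges sharply from 1000 / mean ITL, the run was likely dominated by prefill or queueing rather than decode.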