
[NVIDIA] Update NVIDIA GPT-OSS vLLM image from v0.15.1 to v0.16.0#904

Open
cquil11 wants to merge 3 commits into main from re-add-nvidia-gptoss-vllm-v0.16.0

Conversation

@cquil11 (Collaborator) commented Mar 10, 2026

Bump vllm/vllm-openai image tag for all 3 NVIDIA GPT-OSS configs (B200, H100, H200). All existing BKC flags preserved — no config changes beyond the image tag.

v0.16.0 notable changes for GPT-OSS/MXFP4:

  • Async scheduling + pipeline parallelism (30.8% throughput improvement)
  • New MXFP4 backends: SM90 FlashInfer BF16, SM100 CUTLASS
  • MoE cold start optimization
  • Triton backend now default non-FlashInfer fallback on SM90/SM100

Closes #798
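
The change itself is mechanical; for context, a sketch of what one of the three entries in .github/configs/nvidia-master.yaml might look like after the bump. The key names and nesting here are assumptions, not copied from the repo; only the image name and tags come from this PR:

```yaml
# Sketch only: key names and nesting are guesses, not the actual file layout.
gptoss-fp4-h200-vllm:
  image: vllm/vllm-openai:v0.16.0-cu130   # bumped from v0.15.1; all BKC flags untouched
```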

…1 to v0.16.0 (#800)" (#903) [skip-sweep]"

This reverts commit cad1169.
@github-actions (Contributor)

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first; only then can we merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work! Thank you

@cquil11 (Collaborator, Author) commented Mar 10, 2026

@ankursingh-nv re-trigger

cquil11 and others added 2 commits March 10, 2026 11:59
The H200 cluster nodes register GPUs as gpu:nvidia_h200, not as the generic
gpu type, so salloc requires the full type in --gres.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
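
For context, a sketch of the two salloc forms this commit distinguishes. The node and GPU counts are illustrative assumptions; only the GRES type names come from the commit message:

```shell
# Illustrative only — counts are assumptions, not from the PR.
salloc --nodes=1 --gres=gpu:8               # rejected on these nodes: nothing registered under the generic "gpu" type
salloc --nodes=1 --gres=gpu:nvidia_h200:8   # accepted: matches the registered GRES type gpu:nvidia_h200
```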
@claude (bot) left a comment

LGTM — straightforward image tag bump from v0.15.1 to v0.16.0-cu130 across three NVIDIA GPT-OSS vLLM configs, with a minor nit on the changelog pr-link.

Extended reasoning...

Overview

This PR modifies two files: .github/configs/nvidia-master.yaml (updating the vllm/vllm-openai image tag from v0.15.1 to v0.16.0-cu130 in three places — B200, H100, and H200 GPT-OSS configs) and perf-changelog.yaml (adding a corresponding changelog entry). No config flags, parameters, or logic were changed beyond the image tag.

Security risks

None. This is a Docker image tag update in benchmark configuration files. No authentication, authorization, or data-handling code is involved.

Level of scrutiny

Low scrutiny is appropriate. This is a mechanical, pattern-following config change — identical in structure to dozens of prior changelog entries and image bumps visible in the same files. The PR re-lands a previously merged-then-reverted change (PR #800 → reverted by #903 → re-landed here as #904) with a slightly different image tag (v0.16.0-cu130 instead of v0.16.0).

Other factors

The one nit found (pr-link referencing #800 instead of #904) is valid but cosmetic — it affects changelog traceability, not functionality. The inline comment already flags this. The PR description is clear, the scope is minimal, and the change follows the established pattern throughout the repository.

```yaml
    - gptoss-fp4-h200-vllm
  description:
    - "Update vLLM image from v0.15.1 to v0.16.0-cu130 for NVIDIA GPT-OSS configs"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/800
```
@claude (bot) commented on perf-changelog.yaml:

🟡 The pr-link for the new changelog entry points to PR #800, which was already merged and then reverted (PR #903). Since this PR #904 is the one actually re-landing the change, the link should be #904 for proper traceability.

Extended reasoning...

What the bug is

The new perf-changelog.yaml entry added at the bottom of the file has its pr-link set to #800. However, PR #800 was previously merged (commit da55158), then reverted by PR #903 (commit cad1169). This PR #904 is the one that actually re-lands the vLLM image update (now with the -cu130 suffix), so the changelog entry should reference #904.

Step-by-step proof

  1. PR #800 was merged as commit da55158, updating the NVIDIA GPT-OSS vLLM image from v0.15.1 to v0.16.0.
  2. PR #903 reverted that change (commit cad1169).
  3. The revert was then reverted (commit c28ce20), and this PR #904 re-lands the change with a slightly different image tag (v0.16.0-cu130 instead of v0.16.0).
  4. The diff shows line 941, where pr-link references #800, the reverted PR.

Why existing code does not prevent it

There is no automated validation that pr-link values reference the current PR. The author likely copied the link from the original PR #800 submission without updating it for the re-landing PR.
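
Such a check would be easy to automate. A minimal sketch of a hypothetical CI step (not part of this repo) that flags added pr-link lines not pointing at the PR under review; the function name and the plain-text line scan are assumptions, only the `pr-link:` field name comes from perf-changelog.yaml:

```python
# Hypothetical CI check: flag changelog pr-link lines that do not
# reference the PR currently being merged. Everything here except the
# "pr-link" field name is an assumption for illustration.
import re

PR_LINK_RE = re.compile(r"pr-link:\s*https://github\.com/[\w.-]+/[\w.-]+/pull/(\d+)")

def stale_pr_links(added_lines: list[str], current_pr: int) -> list[str]:
    """Return the added pr-link lines whose PR number differs from current_pr."""
    stale = []
    for line in added_lines:
        m = PR_LINK_RE.search(line)
        if m and int(m.group(1)) != current_pr:
            stale.append(line.strip())
    return stale
```

Run against this PR's diff with `current_pr=904`, it would have flagged the `…/pull/800` entry.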

Impact

Anyone following the changelog link to understand this change would land on PR #800, which is marked as reverted. This is confusing for traceability, though it does not affect any functional behavior. The convention throughout perf-changelog.yaml is that pr-link references the PR that actually lands the change.

Fix

Change line 941 from:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/800

to:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/904



Development

Successfully merging this pull request may close these issues.

[NVIDIA] update H100, H200, B200 GPT OSS vLLM image to latest 0.16.0
