
[NVIDIA] Update NVIDIA GPT-OSS vLLM image from v0.15.1 to v0.16.0#904

Open
cquil11 wants to merge 3 commits into main from re-add-nvidia-gptoss-vllm-v0.16.0

Conversation

@cquil11 (Collaborator) commented Mar 10, 2026

Bump vllm/vllm-openai image tag for all 3 NVIDIA GPT-OSS configs (B200, H100, H200). All existing BKC flags preserved — no config changes beyond the image tag.

v0.16.0 notable changes for GPT-OSS/MXFP4:

  • Async scheduling + pipeline parallelism (30.8% throughput improvement)
  • New MXFP4 backends: SM90 FlashInfer BF16, SM100 CUTLASS
  • MoE cold start optimization
  • Triton backend now default non-FlashInfer fallback on SM90/SM100

Closes #798
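
The change itself is mechanical; for context, a sketch of what one of the three entries in .github/configs/nvidia-master.yaml might look like after the bump. The key names and nesting here are assumptions, not copied from the repo; only the image name and tags come from this PR:

```yaml
# Sketch only: key names and nesting are guesses, not the actual file layout.
gptoss-fp4-h200-vllm:
  image: vllm/vllm-openai:v0.16.0-cu130   # bumped from v0.15.1; all BKC flags untouched
```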

…1 to v0.16.0 (#800)" (#903) [skip-sweep]"

This reverts commit cad1169.
@github-actions (Contributor)

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first; only then can we merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work! Thank you

@cquil11 (Collaborator, Author) commented Mar 10, 2026

@ankursingh-nv re-trigger

cquil11 and others added 2 commits March 10, 2026 11:59
The H200 cluster nodes register GPUs as gpu:nvidia_h200, not as the generic
gpu type, so salloc requires the full type in --gres.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
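
For context, a sketch of the two salloc forms this commit distinguishes. The node and GPU counts are illustrative assumptions; only the GRES type names come from the commit message:

```shell
# Illustrative only — counts are assumptions, not from the PR.
salloc --nodes=1 --gres=gpu:8               # rejected on these nodes: nothing registered under the generic "gpu" type
salloc --nodes=1 --gres=gpu:nvidia_h200:8   # accepted: matches the registered GRES type gpu:nvidia_h200
```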
@claude (bot) left a comment

LGTM — straightforward image tag bump from v0.15.1 to v0.16.0-cu130 across three NVIDIA GPT-OSS vLLM configs, with a minor nit on the changelog pr-link.

Extended reasoning...

Overview

This PR modifies two files: .github/configs/nvidia-master.yaml (updating the vllm/vllm-openai image tag from v0.15.1 to v0.16.0-cu130 in three places — B200, H100, and H200 GPT-OSS configs) and perf-changelog.yaml (adding a corresponding changelog entry). No config flags, parameters, or logic were changed beyond the image tag.

Security risks

None. This is a Docker image tag update in benchmark configuration files. No authentication, authorization, or data-handling code is involved.

Level of scrutiny

Low scrutiny is appropriate. This is a mechanical, pattern-following config change — identical in structure to dozens of prior changelog entries and image bumps visible in the same files. The PR re-lands a previously merged-then-reverted change (PR #800 → reverted by #903 → re-landed here as #904) with a slightly different image tag (v0.16.0-cu130 instead of v0.16.0).

Other factors

The one nit found (pr-link referencing #800 instead of #904) is valid but cosmetic — it affects changelog traceability, not functionality. The inline comment already flags this. The PR description is clear, the scope is minimal, and the change follows the established pattern throughout the repository.

```yaml
    - gptoss-fp4-h200-vllm
  description:
    - "Update vLLM image from v0.15.1 to v0.16.0-cu130 for NVIDIA GPT-OSS configs"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/800
```
@claude (bot) commented on perf-changelog.yaml:

🟡 The pr-link for the new changelog entry points to PR #800, which was already merged and then reverted (PR #903). Since this PR #904 is the one actually re-landing the change, the link should be #904 for proper traceability.

Extended reasoning...

What the bug is

The new perf-changelog.yaml entry added at the bottom of the file has its pr-link set to #800. However, PR #800 was previously merged (commit da55158), then reverted by PR #903 (commit cad1169). This PR #904 is the one that actually re-lands the vLLM image update (now with the -cu130 suffix), so the changelog entry should reference #904.

Step-by-step proof

  1. PR #800 was merged as commit da55158, updating the NVIDIA GPT-OSS vLLM image from v0.15.1 to v0.16.0.
  2. PR #903 reverted that change (commit cad1169).
  3. The revert was then reverted (commit c28ce20), and this PR #904 re-lands the change with a slightly different image tag (v0.16.0-cu130 instead of v0.16.0).
  4. The diff shows line 941, where pr-link references #800, the reverted PR.

Why existing code does not prevent it

There is no automated validation that pr-link values reference the current PR. The author likely copied the link from the original PR #800 submission without updating it for the re-landing PR.
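
Such a check would be easy to automate. A minimal sketch of a hypothetical CI step (not part of this repo) that flags added pr-link lines not pointing at the PR under review; the function name and the plain-text line scan are assumptions, only the `pr-link:` field name comes from perf-changelog.yaml:

```python
# Hypothetical CI check: flag changelog pr-link lines that do not
# reference the PR currently being merged. Everything here except the
# "pr-link" field name is an assumption for illustration.
import re

PR_LINK_RE = re.compile(r"pr-link:\s*https://github\.com/[\w.-]+/[\w.-]+/pull/(\d+)")

def stale_pr_links(added_lines: list[str], current_pr: int) -> list[str]:
    """Return the added pr-link lines whose PR number differs from current_pr."""
    stale = []
    for line in added_lines:
        m = PR_LINK_RE.search(line)
        if m and int(m.group(1)) != current_pr:
            stale.append(line.strip())
    return stale
```

Run against this PR's diff with `current_pr=904`, it would have flagged the `…/pull/800` entry.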

Impact

Anyone following the changelog link to understand this change would land on PR #800, which is marked as reverted. This is confusing for traceability, though it does not affect any functional behavior. The convention throughout perf-changelog.yaml is that pr-link references the PR that actually lands the change.

Fix

Change line 941 from:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/800

to:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/904



Development

Successfully merging this pull request may close these issues.

[NVIDIA] update H100, H200, B200 GPT OSS vLLM image to latest 0.16.0
