
[WIP] [NV] Kimi fp4 configs#862

Open
ankursingh-nv wants to merge 5 commits into main from kimik2.5-fp4-b200-vllm

Conversation

@ankursingh-nv
Collaborator

@ankursingh-nv ankursingh-nv commented Mar 3, 2026

Summary

Add Kimi K2.5 FP4 benchmark configuration for B200 using vLLM.

Changes

  • New config kimik2.5-fp4-b200-vllm in nvidia-master.yaml
    • Model: nvidia/Kimi-K2.5-NVFP4
    • Image: vllm/vllm-openai:v0.16.0
    • Parallelism: TP=8/EP=1 (concurrency 4-128) and TP=4/EP=4 (concurrency 4-64)
    • Sequence lengths: 1k1k, 1k8k, 8k1k
  • New benchmark script benchmarks/single_node/kimik2.5_fp4_b200.sh
    • Uses --reasoning-parser kimi_k2 and --tool-call-parser kimi_k2
    • Enables --compilation_config.pass_config.fuse_allreduce_rms true
    • Sets TORCH_CUDA_ARCH_LIST="10.0" for B200
    • GPU memory utilization: 0.90
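A minimal sketch of how the flags listed above might combine into a launch command. This is an illustration only: the real script is `benchmarks/single_node/kimik2.5_fp4_b200.sh` in this PR, and its exact invocation, concurrency sweep, and EP handling may differ.

```shell
#!/usr/bin/env sh
# Sketch only: assumes the flags from the PR summary; the surrounding
# benchmark harness (concurrency sweep, TP=4/EP=4 variant) is omitted.
export TORCH_CUDA_ARCH_LIST="10.0"   # B200 (Blackwell)

MODEL="nvidia/Kimi-K2.5-NVFP4"
TP=8    # the PR also sweeps a TP=4/EP=4 configuration

# Build the serve command from the flags named in the PR summary.
CMD="vllm serve $MODEL \
  --tensor-parallel-size $TP \
  --gpu-memory-utilization 0.90 \
  --reasoning-parser kimi_k2 \
  --tool-call-parser kimi_k2 \
  --compilation_config.pass_config.fuse_allreduce_rms true"

echo "$CMD"
```

The parser flags matter here because Kimi K2.5 is a reasoning model with tool-calling support; without them vLLM would return raw, unparsed output.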

@ankursingh-nv
Collaborator Author

@claude update PR description and also add an entry to perf-changelog file.

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 3, 2026

Claude finished @ankursingh-nv's task in 1m 27s


Completed

  • Gather context on PR changes
  • Update PR description
  • Add entry to perf-changelog.yaml
  • Commit and push changes

Done:

  • Updated PR description with a summary of config and script changes (model, image, parallelism, sequence lengths, key flags)
  • Added kimik2.5-fp4-b200-vllm entry to perf-changelog.yaml referencing this PR
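For reference, a hypothetical shape for that changelog entry. The real perf-changelog.yaml schema is repo-specific and not shown in this PR, so every field name below is a guess; only the config name and PR number come from this conversation.

```yaml
# Hypothetical entry shape; field names are illustrative guesses.
- name: kimik2.5-fp4-b200-vllm
  pr: 862
  change: Add Kimi K2.5 FP4 benchmark config for B200 (vLLM)
```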

@functionstackx
Contributor

@claude update the PR description to match the actual vLLM image being used

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 3, 2026

Claude finished @functionstackx's task in 44s


Completed

  • Gather context on actual vLLM image used
  • Update PR description to match

Done: Updated the PR description to reflect the actual vLLM image vllm/vllm-openai:v0.16.0 (was incorrectly listed as v0.15.1).

@ankursingh-nv ankursingh-nv force-pushed the kimik2.5-fp4-b200-vllm branch from 2101d26 to 065f123 on March 4, 2026 at 19:03

```yaml
kimik2.5-fp4-b200-vllm:
  image: vllm/vllm-openai:v0.16.0
  model: nvidia/Kimi-K2.5-NVFP4
```
Contributor


thanks for the PR! overview LGTM!

@ankursingh-nv @kedarpotdar-nv one small thing: it would be great if you could add documentation about the NVFP4 version of Kimi K2.5 (nvidia/Kimi-K2.5-NVFP4) to the vLLM recipes at https://github.com/vllm-project/recipes/blob/main/moonshotai/Kimi-K2.5.md. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work!

+viz @faradawn

Collaborator


started PR here vllm-project/recipes#267

@ankursingh-nv ankursingh-nv changed the title [WIP] [NV] add kimi fp4 configs [WIP] [NV] Kimi fp4 configs Mar 5, 2026
@cquil11
Copy link
Collaborator

cquil11 commented Mar 9, 2026

@ankursingh-nv what is the holdup here? We should probably update to vLLM 0.17.0 before merging anyway, since it's out now.



5 participants