
[WIP] [NV] Kimi fp4 configs#862

Open
ankursingh-nv wants to merge 5 commits into main from kimik2.5-fp4-b200-vllm

Conversation

@ankursingh-nv
Collaborator

@ankursingh-nv ankursingh-nv commented Mar 3, 2026

Summary

Add Kimi K2.5 FP4 benchmark configuration for B200 using vLLM.

Changes

  • New config kimik2.5-fp4-b200-vllm in nvidia-master.yaml
    • Model: nvidia/Kimi-K2.5-NVFP4
    • Image: vllm/vllm-openai:v0.16.0
    • Parallelism: TP=8/EP=1 (concurrency 4-128) and TP=4/EP=4 (concurrency 4-64)
    • Sequence lengths: 1k1k, 1k8k, 8k1k
  • New benchmark script benchmarks/single_node/kimik2.5_fp4_b200.sh
    • Uses --reasoning-parser kimi_k2 and --tool-call-parser kimi_k2
    • Enables --compilation_config.pass_config.fuse_allreduce_rms true
    • Sets TORCH_CUDA_ARCH_LIST="10.0" for B200
    • GPU memory utilization: 0.90
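A minimal sketch of how the flags listed above might combine into a launch command. This is an illustration only: the real script is `benchmarks/single_node/kimik2.5_fp4_b200.sh` in this PR, and its exact invocation, concurrency sweep, and EP handling may differ.

```shell
#!/usr/bin/env sh
# Sketch only: assumes the flags from the PR summary; the surrounding
# benchmark harness (concurrency sweep, TP=4/EP=4 variant) is omitted.
export TORCH_CUDA_ARCH_LIST="10.0"   # B200 (Blackwell)

MODEL="nvidia/Kimi-K2.5-NVFP4"
TP=8    # the PR also sweeps a TP=4/EP=4 configuration

# Build the serve command from the flags named in the PR summary.
CMD="vllm serve $MODEL \
  --tensor-parallel-size $TP \
  --gpu-memory-utilization 0.90 \
  --reasoning-parser kimi_k2 \
  --tool-call-parser kimi_k2 \
  --compilation_config.pass_config.fuse_allreduce_rms true"

echo "$CMD"
```

The parser flags matter here because Kimi K2.5 is a reasoning model with tool-calling support; without them vLLM would return raw, unparsed output.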

@ankursingh-nv
Collaborator Author

@claude update PR description and also add an entry to perf-changelog file.

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 3, 2026

Claude finished @ankursingh-nv's task in 1m 27s


Completed

  • Gather context on PR changes
  • Update PR description
  • Add entry to perf-changelog.yaml
  • Commit and push changes

Done:

  • Updated PR description with a summary of config and script changes (model, image, parallelism, sequence lengths, key flags)
  • Added kimik2.5-fp4-b200-vllm entry to perf-changelog.yaml referencing this PR
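For reference, a hypothetical shape for that changelog entry. The real perf-changelog.yaml schema is repo-specific and not shown in this PR, so every field name below is a guess; only the config name and PR number come from this conversation.

```yaml
# Hypothetical entry shape; field names are illustrative guesses.
- name: kimik2.5-fp4-b200-vllm
  pr: 862
  change: Add Kimi K2.5 FP4 benchmark config for B200 (vLLM)
```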

@functionstackx
Contributor

@claude update the PR description to match the actual vLLM image being used

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 3, 2026

Claude finished @functionstackx's task in 44s


Completed

  • Gather context on actual vLLM image used
  • Update PR description to match

Done: Updated the PR description to reflect the actual vLLM image vllm/vllm-openai:v0.16.0 (was incorrectly listed as v0.15.1).

@ankursingh-nv ankursingh-nv force-pushed the kimik2.5-fp4-b200-vllm branch from 2101d26 to 065f123 on March 4, 2026 at 19:03

```yaml
kimik2.5-fp4-b200-vllm:
  image: vllm/vllm-openai:v0.16.0
  model: nvidia/Kimi-K2.5-NVFP4
```
Contributor


thanks for the PR! overview LGTM!

@ankursingh-nv @kedarpotdar-nv one small thing: it would be great if you could add documentation about the NVFP4 version of Kimi K2.5 (nvidia/Kimi-K2.5-NVFP4) to the vLLM recipes at https://github.com/vllm-project/recipes/blob/main/moonshotai/Kimi-K2.5.md. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work!

+viz @faradawn

Collaborator


started PR here vllm-project/recipes#267

@ankursingh-nv ankursingh-nv changed the title [WIP] [NV] add kimi fp4 configs [WIP] [NV] Kimi fp4 configs Mar 5, 2026
@cquil11
Copy link
Collaborator

cquil11 commented Mar 9, 2026

@ankursingh-nv what is the holdup here? We should probably update to vLLM 0.17.0 before merging anyway, since it's out now.



5 participants