
Feature/kimi k25 pp support #2

Open
chun-wan wants to merge 5 commits into MHYangAMD:main from chun-wan:feature/kimi-k25-pp-support

Conversation

@chun-wan

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

chun-wan and others added 5 commits on February 7, 2026 at 21:23
- Add INT4 W4A16 MoE config for E=384,N=128 (Kimi K2.5)
- Add FP8 W8A8 configs for AMD Instinct MI300X
- Update fused_moe_triton layer for W4A16 support
- Update compressed_tensors_moe for INT4 quantization
- Add HIP kernels for ROCm support
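
For reference, the config added in the first commit follows the standard fused-MoE Triton tuning-file layout used by SGLang: a JSON map from token batch size to Triton launch parameters, with the expert count (E=384) and intermediate size (N=128) encoded in the filename. The sketch below shows that layout as a Python dict with placeholder values; the actual tuned numbers live in this PR's JSON files.

```python
# Illustrative shape of a fused-MoE Triton tuning config (e.g. a file named
# E=384,N=128,device_name=AMD_Instinct_MI300X,dtype=int4_w4a16.json).
# The numbers below are placeholders, NOT the tuned values from this PR.
EXAMPLE_MOE_CONFIG = {
    # key: token batch size; value: Triton kernel launch parameters
    "1": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 32,
        "BLOCK_SIZE_K": 64,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 2,
    },
    "64": {
        "BLOCK_SIZE_M": 32,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 8,
        "num_warps": 8,
        "num_stages": 2,
    },
}
```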
Feature: Support torch compile for Kimi-K2.5
Changes:
- kimi_k25.py: Add pp_proxy_tensors parameter to forward method and pass it to general_mm_embed_routine
- deepseek_v2.py: Fix device acquisition for non-first PP rank when input_embeds and input_ids are both None

Tested with a TP=4, PP=2 configuration:
- Prefill test: 10 requests, 100% success
- Decode test: 20 requests, 100% success
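
The two changes above are small plumbing fixes. Below is a minimal, hypothetical sketch of the deepseek_v2.py device fix; the names and signatures are assumptions based on the commit message, not the actual SGLang code. The kimi_k25.py change is analogous plumbing: the new pp_proxy_tensors parameter is simply forwarded to general_mm_embed_routine.

```python
# Hypothetical sketch (assumed names/signatures, not the actual SGLang code)
# of the device-acquisition fix for non-first pipeline-parallel (PP) ranks.
from typing import Dict, Optional

import torch


def acquire_device(
    input_ids: Optional[torch.Tensor],
    input_embeds: Optional[torch.Tensor],
    pp_proxy_tensors: Optional[Dict[str, torch.Tensor]] = None,
) -> torch.device:
    # The first PP rank can read the device off the token ids or embeddings,
    # but later ranks receive neither; only the hidden states forwarded from
    # the previous pipeline stage are available.
    if input_embeds is not None:
        return input_embeds.device
    if input_ids is not None:
        return input_ids.device
    # Non-first PP rank: both inputs are None, so fall back to the tensors
    # handed over by the previous stage.
    assert pp_proxy_tensors is not None, "non-first PP rank needs proxy tensors"
    return pp_proxy_tensors["hidden_states"].device


# Usage: on a non-first PP rank, only the proxy tensors carry a device.
proxy = {"hidden_states": torch.zeros(4, 8)}
print(acquire_device(input_ids=None, input_embeds=None, pp_proxy_tensors=proxy))
```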