
feat: support mla model in vllm #261

Open
ztang2370 wants to merge 3 commits into ovg-project:main from ztang2370:feat/support-mla-model-in-vllm

Conversation

@ztang2370
Contributor

ztang2370 commented Feb 28, 2026

Tested the MLA model deepseek-ai/DeepSeek-V2-Lite on vllm 0.14.0–0.16.0.

@cui36
Collaborator

cui36 commented Feb 28, 2026

Thanks @ztang2370!

@ivanium
Collaborator

ivanium commented Mar 8, 2026

Nice contribution! At a high level, I feel alloc_mla_kv_caches should be merged into alloc_kv_caches, since their logic is largely similar and we already have an attention_type field for this. wdyt?
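[Editor's note] The merge proposed above could look roughly like the sketch below: one allocation entry point that branches on the attention type instead of a separate MLA path. All names here (AttentionType, the alloc_kv_caches signature, the buffer layout) are illustrative assumptions, not the actual kvcached API; MLA is modeled as a single combined latent KV buffer per layer, versus separate K and V buffers for standard attention, as described in the review below.

```python
# Hypothetical sketch: merging alloc_mla_kv_caches into alloc_kv_caches by
# branching on an attention_type field. Not the real kvcached interface.
from enum import Enum
from typing import Dict, List


class AttentionType(Enum):
    MHA = "mha"  # standard attention: separate K and V buffers per layer
    MLA = "mla"  # multi-head latent attention: one combined latent KV buffer


def alloc_kv_caches(num_layers: int, num_blocks: int, block_size: int,
                    head_dim: int,
                    attention_type: AttentionType) -> List[Dict[str, list]]:
    """Allocate per-layer KV cache buffers, branching on attention type."""
    elems = num_blocks * block_size * head_dim
    caches: List[Dict[str, list]] = []
    for layer in range(num_layers):
        if attention_type is AttentionType.MLA:
            # MLA stores one compressed latent buffer holding both K and V.
            caches.append({"kv": [0.0] * elems})
        else:
            # MHA keeps distinct key and value buffers.
            caches.append({"k": [0.0] * elems, "v": [0.0] * elems})
    return caches
```

With this shape, callers pass the model's attention type once and the MLA-specific allocation path disappears as a separate function, which is the deduplication the comment is asking for.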

@ztang2370
Contributor Author

Nice contribution! At a high level, I feel alloc_mla_kv_caches should be merged into alloc_kv_caches, since their logic is largely similar and we already have an attention_type field for this. wdyt?

Agreed. Updated with alloc_mla_kv_cache merged into alloc_kv_cache and tested. I think the same refactor should be applied on the sglang side, cc @cui36. I'll do the sglang part later in another PR.

@ztang2370
Contributor Author

Hi @cui36, I ran into problems setting up sglang with kvcached from source on the DGX Spark device, so I don't have a device to test the refactor on the sglang side for now (the smallest MLA model, DeepSeek-V2-Lite, takes more than 32 GB). I'll leave it to you if you don't mind.
On DGX Spark we can currently set up sglang only with Docker.

@jiarong0907
Collaborator

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for MLA models in vllm. The changes are mostly in kvcached/integration/vllm/interfaces.py and kvcached/integration/vllm/patches.py to handle the specifics of MLA, such as the combined KV buffer. The implementation is sound, but I've identified a couple of areas with code duplication that could be refactored to improve maintainability. My review includes suggestions for these refactorings.

ztang2370 force-pushed the feat/support-mla-model-in-vllm branch from 3554ffc to 5e2bd4c on March 17, 2026 16:18
