Conversation
Thanks @ztang2370!

Nice contribution! At a high level, I feel …

Agreed. Updated with …

Hi @cui36, I ran into a problem setting up sglang with kvcached from source on the DGX Spark device, so I don't have a device to test the sglang part of the refactor for now (the smallest MLA model, DeepSeek-V2-Lite, takes more than 32 GB). I'll leave it to you if you don't mind.
|
/gemini review |
Code Review
This pull request adds support for MLA models in vLLM. The changes are mostly in kvcached/integration/vllm/interfaces.py and kvcached/integration/vllm/patches.py to handle the specifics of MLA, such as the combined KV buffer. The implementation is sound, but I've identified a couple of areas with code duplication that could be refactored to improve maintainability. My review includes suggestions for these refactorings.
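
To illustrate the MLA-specific handling the review refers to: MLA models keep a single combined KV buffer (compressed latent plus rope dims) per layer rather than separate K and V caches, so the per-token cache-size logic differs from standard attention. The sketch below shows one way duplicated size logic could be factored into a shared helper; all names, parameters, and the helper itself are hypothetical, not the actual kvcached/vLLM API.

```python
# Hypothetical helper sketch, not the actual kvcached/vLLM code.
# MLA stores one combined KV buffer per layer (latent + rope dims),
# while standard attention needs separate K and V buffers; a shared
# helper like this could remove duplicated size math between
# interfaces.py and patches.py.

def kv_cache_bytes_per_token(num_layers: int,
                             head_size: int,
                             num_kv_heads: int,
                             dtype_size: int,
                             use_mla: bool = False) -> int:
    """Bytes of KV cache needed per token across all layers."""
    if use_mla:
        # MLA: a single combined buffer per layer; the per-head
        # dimension already covers the compressed KV representation.
        per_layer = head_size * dtype_size
    else:
        # Standard attention: separate K and V buffers per layer.
        per_layer = 2 * num_kv_heads * head_size * dtype_size
    return num_layers * per_layer


# Illustrative numbers only (assuming an MLA head size of 576 =
# 512 latent + 64 rope dims, fp16, 27 layers):
print(kv_cache_bytes_per_token(27, 576, 1, 2, use_mla=True))   # combined buffer
print(kv_cache_bytes_per_token(27, 576, 1, 2, use_mla=False))  # split K/V, 2x
```

The design point is simply that the MLA and non-MLA paths share everything except the per-layer term, which is why the reviewer's suggestion to deduplicate is attractive.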
Force-pushed from 3554ffc to 5e2bd4c
Tested MLA model deepseek-ai/DeepSeek-V2-Lite on vLLM 0.14.0 - 0.16.0.