feat: support KV cache CPU offloading in vLLM by qinganrice · Pull Request #269 · ovg-project/kvcached

qinganrice · 2026-03-08T02:50:10Z

Summary:
This PR adds support for using vLLM CPU offloading with KVCached. KV cache blocks are moved between GPU and CPU so that repeated prompts can be served from CPU cache instead of recomputing.

Changes:
KVConnectorMixinPatch

Patches KVConnectorModelRunnerMixin.use_uniform_kv_cache() so it returns False when kvcached is enabled.
Reason: With OffloadingConnector, vLLM’s prefer_cross_layer_blocks=True makes use_uniform_kv_cache() return True, so vLLM uses allocate_uniform_kv_caches() and a single cross-layer torch.zeros tensor. That skips kvcached’s _allocate_kv_cache_tensors path and the VMM elastic pool.
By forcing use_uniform_kv_cache() to False, allocation goes through _allocate_kv_cache_tensors, so kvcached’s VMM-backed tensors are used and CPU offloading works.
Applied only for vLLM >=0.12.0 via @version_range(VLLM_V12_RANGE).
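
A minimal sketch of the patching idea described above, for illustration only. The class below is a stand-in for vLLM's KVConnectorModelRunnerMixin, not the real one; its method signature, the apply_patch helper, and the kvcached_enabled flag are assumptions, and the version gating via @version_range is omitted.

```python
# Stand-in for vLLM's KVConnectorModelRunnerMixin (hypothetical, simplified).
# The real class lives in vLLM and its method signature may differ.
class KVConnectorModelRunnerMixin:
    @staticmethod
    def use_uniform_kv_cache(prefer_cross_layer_blocks: bool) -> bool:
        # Simplified rule: pick the uniform cross-layer allocation
        # whenever the connector prefers cross-layer blocks.
        return prefer_cross_layer_blocks


def apply_patch(target_cls, kvcached_enabled: bool) -> None:
    """Wrap use_uniform_kv_cache so it returns False while kvcached is enabled.

    apply_patch and kvcached_enabled are hypothetical names for this sketch.
    """
    original = target_cls.use_uniform_kv_cache

    def patched(*args, **kwargs) -> bool:
        if kvcached_enabled:
            # Force the per-layer allocation path so that kvcached's
            # VMM-backed tensors are used instead of one torch.zeros tensor.
            return False
        return original(*args, **kwargs)

    target_cls.use_uniform_kv_cache = staticmethod(patched)


apply_patch(KVConnectorModelRunnerMixin, kvcached_enabled=True)
# Even though the connector prefers cross-layer blocks, the patched method
# now steers allocation away from the uniform path.
print(KVConnectorModelRunnerMixin.use_uniform_kv_cache(True))
```

With kvcached disabled, the wrapper falls through to the original method, so stock vLLM behavior is unchanged.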

Test:

Runs successfully with vLLM 0.16.0.
GPU→CPU offload, CPU→GPU fetch (cache hit) and CPU eviction (LRU & ARC) are tested.
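
The three behaviors listed above can be pictured with a toy model. This is not vLLM's OffloadingConnector or the PR's test code: ToyCpuBlockCache, its methods, and the block hashes are hypothetical, and only LRU (not ARC) eviction is sketched.

```python
from collections import OrderedDict


class ToyCpuBlockCache:
    """Toy CPU-side block store keyed by block hash, with LRU eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_hash -> payload, oldest first

    def offload(self, block_hash: str, data: bytes) -> list:
        """GPU -> CPU: store a block and return the hashes evicted to make room."""
        if block_hash in self.blocks:
            self.blocks.move_to_end(block_hash)
        self.blocks[block_hash] = data
        evicted = []
        while len(self.blocks) > self.capacity:
            old_hash, _ = self.blocks.popitem(last=False)  # drop least recently used
            evicted.append(old_hash)
        return evicted

    def fetch(self, block_hash: str):
        """CPU -> GPU: return the block on a hit (and mark it recent), else None."""
        data = self.blocks.get(block_hash)
        if data is not None:
            self.blocks.move_to_end(block_hash)
        return data


cache = ToyCpuBlockCache(capacity=2)
cache.offload("a", b"kv-a")
cache.offload("b", b"kv-b")
hit = cache.fetch("a")                  # cache hit, "a" becomes most recent
evicted = cache.offload("c", b"kv-c")   # over capacity, so "b" is evicted
```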

qinganrice · 2026-03-08T03:34:32Z

Regarding SGLang support: KVCached requires --disable-radix-cache, but HiCache (the extended RadixCache used for CPU offloading) needs the radix cache to track GPU↔CPU loads and stores per prefix-tree node, and SGLang forbids using both flags together. Therefore, we cannot support SGLang CPU offloading for now.

jiarong0907 · 2026-03-08T21:36:05Z

> For SGLang's support, KVCached requires --disable-radix-cache but HiCache (extended RadixCache for CPU offloading) requires radix cache to track GPU↔CPU load/store per prefix tree node and SGLang forbids using both flags together. Therefore, we cannot support SGLang CPU offloading so far.

Once kvcached supports prefix caching, it should no longer require --disable-radix-cache.

feat: support KV cache CPU offloading in vLLM

4654169

qinganrice wants to merge 1 commit into ovg-project:main from qinganrice:feat/cpu_offloading
