feat: support KV cache CPU offloading in vLLM #269

Open
qinganrice wants to merge 1 commit into ovg-project:main from qinganrice:feat/cpu_offloading

Conversation

@qinganrice
Contributor

Summary:
This PR adds support for using vLLM's CPU offloading together with kvcached. KV cache blocks are moved between GPU and CPU so that repeated prompts can be served from the CPU cache instead of being recomputed.

Changes:
KVConnectorMixinPatch

  • Patches KVConnectorModelRunnerMixin.use_uniform_kv_cache() so it returns False when kvcached is enabled.
  • Reason: With OffloadingConnector, vLLM’s prefer_cross_layer_blocks=True makes use_uniform_kv_cache() return True, so vLLM uses allocate_uniform_kv_caches() and a single cross-layer torch.zeros tensor. That skips kvcached’s _allocate_kv_cache_tensors path and the VMM elastic pool.
  • By forcing use_uniform_kv_cache() to False, allocation goes through _allocate_kv_cache_tensors, so kvcached’s VMM-backed tensors are used and CPU offloading works.
  • Applied only for vLLM >=0.12.0 via @version_range(VLLM_V12_RANGE).
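
A minimal sketch of the patching idea described above, not the code in this PR. `FakeKVConnectorMixin` is a hypothetical stand-in for vLLM's `KVConnectorModelRunnerMixin` (its real signature is not reproduced here), and `patch_use_uniform_kv_cache` and `kvcached_enabled` are illustrative names. The version gating done by `@version_range` is omitted.

```python
import functools


class FakeKVConnectorMixin:
    """Hypothetical stand-in for vLLM's KVConnectorModelRunnerMixin."""

    @staticmethod
    def use_uniform_kv_cache(prefer_cross_layer_blocks: bool) -> bool:
        # Simplified assumption: uniform (cross-layer) allocation is chosen
        # whenever the connector prefers cross-layer blocks.
        return prefer_cross_layer_blocks


def patch_use_uniform_kv_cache(mixin_cls, kvcached_enabled):
    """Force use_uniform_kv_cache() to return False while kvcached is enabled.

    Returns the original function so the patch can be undone.
    """
    original = mixin_cls.use_uniform_kv_cache

    @functools.wraps(original)
    def patched(*args, **kwargs):
        if kvcached_enabled():
            # Steer allocation away from the single cross-layer tensor so the
            # per-layer (VMM-backed) allocation path is taken instead.
            return False
        return original(*args, **kwargs)

    mixin_cls.use_uniform_kv_cache = staticmethod(patched)
    return original
```

With the patch applied, the stand-in still answers as before when kvcached is off and always answers `False` when it is on, which is the behavior the bullets above rely on.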

Test:

  • Runs successfully with vLLM 0.16.0.
  • GPU→CPU offload, CPU→GPU fetch (cache hit) and CPU eviction (LRU & ARC) are tested.
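
The three behaviors listed above (offload, fetch on cache hit, eviction) can be illustrated with a toy model. This is only a sketch under assumptions: `ToyCpuTier` is a hypothetical class, blocks are keyed by an arbitrary hash, and only LRU is modelled (ARC is not). It does not reflect vLLM's or kvcached's actual data structures.

```python
from collections import OrderedDict


class ToyCpuTier:
    """Toy CPU-side store for offloaded KV blocks with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block hash -> payload, least recent first

    def offload(self, block_hash, payload):
        """GPU -> CPU: store a block and return the hashes evicted to make room."""
        evicted = []
        self.blocks[block_hash] = payload
        self.blocks.move_to_end(block_hash)
        while len(self.blocks) > self.capacity:
            old_hash, _ = self.blocks.popitem(last=False)
            evicted.append(old_hash)
        return evicted

    def fetch(self, block_hash):
        """CPU -> GPU: return the payload on a hit (refreshing recency), else None."""
        if block_hash not in self.blocks:
            return None
        self.blocks.move_to_end(block_hash)
        return self.blocks[block_hash]
```

A fetch that returns `None` corresponds to a miss, where the block would have to be recomputed on the GPU.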

@qinganrice
Contributor Author

Regarding SGLang support: kvcached requires --disable-radix-cache, but HiCache (an extended RadixCache for CPU offloading) requires the radix cache in order to track GPU↔CPU load/store per prefix-tree node, and SGLang forbids combining the two. Therefore, we cannot support SGLang CPU offloading for now.

@jiarong0907
Collaborator

> For SGLang's support, KVCached requires --disable-radix-cache but HiCache (extended RadixCache for CPU offloading) requires radix cache to track GPU↔CPU load/store per prefix tree node and SGLang forbids using both flags together. Therefore, we cannot support SGLang CPU offloading so far.

Once kvcached supports prefix caching, it should no longer require --disable-radix-cache.

