Conversation
/gemini review
Code Review
This pull request introduces support for multiple KV cache groups in the vLLM integration, which is a key step for supporting hybrid models like gpt-oss. The changes correctly remove the single-group limitation and add validation for geometric compatibility across groups. Support for SlidingWindowSpec is also added. My review flags two critical spots where a potential IndexError could occur when handling the list of KV cache groups, and includes a suggestion to improve code consistency.
Force-pushed from 067fe0a to 2e748d7
/gemini review
Code Review
This pull request adds support for gpt-oss models in vLLM by enabling support for multiple KV cache groups. The changes correctly generalize the initialization, allocation, and management of KV caches to work with multiple groups: the number of layers is now computed across groups, all cache groups are validated for compatible geometry, and all groups and layers are iterated over when setting up caches. I've found a couple of places where removing the restriction to a single KV cache group has also removed an implicit check for an empty list of groups, which could lead to an IndexError if the configuration provides an empty list of kv_cache_groups. I've added comments suggesting explicit checks for this case to provide better error handling.
```python
for grp in kv_groups[1:]:
    grp_spec = grp.kv_cache_spec
    grp_block_size = grp_spec.block_size
    grp_cell_size = grp_spec.page_size_bytes // grp_block_size // 2
    if grp_block_size != block_size or grp_cell_size != cell_size:
        raise ValueError(
            "kvcached requires all KV cache groups to have the "
            f"same block geometry. Group 0: block_size={block_size},"
            f" cell_size={cell_size}; another group: "
            f"block_size={grp_block_size}, cell_size={grp_cell_size}"
        )
```
While this new validation for multiple groups is great, a potential issue has been introduced by removing the if len(kv_groups) != 1: check. There is no longer a guard against kv_groups being empty. This can lead to an IndexError on line 251 (kv_cache_group = kv_groups[0]).
Please add a check for empty kv_groups after line 249, for example:

```python
if not kv_groups:
    raise ValueError(
        "kv_cache_config.kv_cache_groups cannot be empty when "
        "kvcached is enabled."
    )
```
Pushing back as overly defensive. If kv_cache_groups were somehow empty, that would indicate a fundamental bug in vLLM's model initialization, and should surface loudly as an IndexError with a full traceback.
```python
for grp in kv_cache_config.kv_cache_groups:
    grp_spec = grp.kv_cache_spec
    if not isinstance(grp_spec, (FullAttentionSpec, SlidingWindowSpec)):
        raise ValueError(
            f"kvcached only supports FullAttentionSpec and SlidingWindowSpec layers, "
            f"got {type(grp_spec).__name__}"
        )
```
Similar to another part of this PR, removing the check for multiple kv_cache_groups also removes the implicit check for an empty list. If kv_cache_config.kv_cache_groups is empty, the access on line 605 (first_kv_cache_group = kv_cache_config.kv_cache_groups[0]) will raise an IndexError. It would be safer to add a check for an empty list before proceeding, for example by raising a ValueError.
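The suggested guard at this second site would mirror the first one. A minimal sketch, assuming a hypothetical helper name (`check_groups_nonempty` is illustrative, not part of the PR):

```python
def check_groups_nonempty(kv_cache_groups):
    """Fail fast with a clear message instead of an IndexError when the
    config carries no KV cache groups (hypothetical guard sketch)."""
    if not kv_cache_groups:
        raise ValueError(
            "kv_cache_config.kv_cache_groups cannot be empty when "
            "kvcached is enabled."
        )
    # Safe once the guard has passed; mirrors the [0] access on line 605.
    return kv_cache_groups[0]
```

The trade-off is the same as in the thread above: an explicit ValueError names the misconfiguration directly, while a bare IndexError surfaces the same bug with a full traceback but a less descriptive message.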