
feat: support gpt-oss in vllm #259

Open
ztang2370 wants to merge 5 commits into ovg-project:main from ztang2370:feat/support-gpt-oss-in-vllm

Conversation

@ztang2370
Contributor

No description provided.

@jiarong0907
Collaborator

/gemini review

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for multiple KV cache groups in the vLLM integration, which is a key step for supporting hybrid models like gpt-oss. The changes correctly remove the single-group limitation and add validation for geometric compatibility across groups. Support for SlidingWindowSpec is also added. My review includes a couple of critical fixes to prevent potential IndexError exceptions when handling the list of KV cache groups, and a suggestion to improve code consistency.

@ztang2370 force-pushed the feat/support-gpt-oss-in-vllm branch from 067fe0a to 2e748d7 on March 17, 2026 at 15:10
@ztang2370
Contributor Author

/gemini review

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request adds support for gpt-oss models in vLLM by enabling handling of multiple KV cache groups. The changes correctly generalize the logic for initialization, allocation, and management of KV caches to work with multiple groups. This includes updating how the number of layers is calculated, validating that all cache groups have compatible geometry, and correctly iterating over all groups and layers when setting up caches. I've found a couple of places where removing the restriction to a single KV cache group has also removed an implicit check for an empty list of groups. This could lead to an IndexError if the configuration provides an empty list of kv_cache_groups. I've added comments with suggestions to add explicit checks for this case to provide better error handling.

Comment on lines +258 to +268
for grp in kv_groups[1:]:
    grp_spec = grp.kv_cache_spec
    grp_block_size = grp_spec.block_size
    grp_cell_size = grp_spec.page_size_bytes // grp_block_size // 2
    if grp_block_size != block_size or grp_cell_size != cell_size:
        raise ValueError(
            "kvcached requires all KV cache groups to have the "
            f"same block geometry. Group 0: block_size={block_size},"
            f" cell_size={cell_size}; another group: "
            f"block_size={grp_block_size}, cell_size={grp_cell_size}"
        )
Contributor

high

While this new validation for multiple groups is great, a potential issue has been introduced by removing the if len(kv_groups) != 1: check. There is no longer a guard against kv_groups being empty. This can lead to an IndexError on line 251 (kv_cache_group = kv_groups[0]).

Please add a check for empty kv_groups after line 249, for example:

if not kv_groups:
    raise ValueError("kv_cache_config.kv_cache_groups cannot be empty when kvcached is enabled.")
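The check suggested above can be exercised in isolation. The sketch below combines the reviewer's empty-list guard with the geometry loop quoted from the PR; the KVCacheSpec and KVCacheGroup dataclasses are minimal hypothetical stand-ins for vLLM's real spec classes, not their actual definitions:

```python
from dataclasses import dataclass


@dataclass
class KVCacheSpec:  # hypothetical stand-in for vLLM's spec classes
    block_size: int
    page_size_bytes: int


@dataclass
class KVCacheGroup:  # hypothetical stand-in for a KV cache group
    kv_cache_spec: KVCacheSpec


def validate_group_geometry(kv_groups) -> None:
    # Reviewer's suggestion: fail with a descriptive ValueError instead
    # of an IndexError on kv_groups[0] when the list is empty.
    if not kv_groups:
        raise ValueError(
            "kv_cache_config.kv_cache_groups cannot be empty "
            "when kvcached is enabled.")
    first = kv_groups[0].kv_cache_spec
    block_size = first.block_size
    cell_size = first.page_size_bytes // block_size // 2
    # Geometry loop as written in the PR: every group must match group 0.
    for grp in kv_groups[1:]:
        grp_spec = grp.kv_cache_spec
        grp_block_size = grp_spec.block_size
        grp_cell_size = grp_spec.page_size_bytes // grp_block_size // 2
        if grp_block_size != block_size or grp_cell_size != cell_size:
            raise ValueError(
                "kvcached requires all KV cache groups to have the "
                f"same block geometry. Group 0: block_size={block_size},"
                f" cell_size={cell_size}; another group: "
                f"block_size={grp_block_size}, cell_size={grp_cell_size}")


# Two groups with identical geometry pass; a mismatch raises ValueError.
ok = [KVCacheGroup(KVCacheSpec(16, 4096)), KVCacheGroup(KVCacheSpec(16, 4096))]
validate_group_geometry(ok)
```

A full-attention group and a sliding-window group can share geometry (same block_size and per-cell byte count) even though their eviction behavior differs, which is why the check compares sizes rather than spec types.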

Contributor Author

Pushing back: this seems overly defensive. If kv_cache_groups were somehow empty, that would indicate a fundamental bug in vLLM's model initialization, and it should surface loudly as an IndexError with a full traceback.

Comment on lines +597 to +603
for grp in kv_cache_config.kv_cache_groups:
    grp_spec = grp.kv_cache_spec
    if not isinstance(grp_spec, (FullAttentionSpec, SlidingWindowSpec)):
        raise ValueError(
            f"kvcached only supports FullAttentionSpec and SlidingWindowSpec layers, "
            f"got {type(grp_spec).__name__}"
        )
Contributor

high

Similar to another part of this PR, removing the check for multiple kv_cache_groups also removes the implicit check for an empty list. If kv_cache_config.kv_cache_groups is empty, the access on line 605 (first_kv_cache_group = kv_cache_config.kv_cache_groups[0]) will raise an IndexError. It would be safer to add a check for an empty list before proceeding, for example by raising a ValueError.

Contributor Author

Same as above.
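The spec-type check quoted in this thread can also be tried standalone. In this sketch FullAttentionSpec and SlidingWindowSpec are empty placeholder classes (not vLLM's real implementations), and SimpleNamespace stands in for a KV cache group object:

```python
from types import SimpleNamespace


class FullAttentionSpec:  # placeholder for vLLM's FullAttentionSpec
    pass


class SlidingWindowSpec:  # placeholder for vLLM's SlidingWindowSpec
    pass


def check_supported_specs(kv_cache_groups) -> None:
    # Mirror of the PR's loop: every group's spec must be one of the
    # two supported attention layouts, otherwise reject loudly.
    for grp in kv_cache_groups:
        grp_spec = grp.kv_cache_spec
        if not isinstance(grp_spec, (FullAttentionSpec, SlidingWindowSpec)):
            raise ValueError(
                f"kvcached only supports FullAttentionSpec and "
                f"SlidingWindowSpec layers, got {type(grp_spec).__name__}")


# A mix of full-attention and sliding-window groups, as in gpt-oss-style
# hybrid models, passes the check.
groups = [SimpleNamespace(kv_cache_spec=FullAttentionSpec()),
          SimpleNamespace(kv_cache_spec=SlidingWindowSpec())]
check_supported_specs(groups)
```

Note that an empty list passes this loop silently; the IndexError the reviewer worries about would come only from the later kv_cache_groups[0] access, which is the author's point about letting the traceback surface the bug.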

