[Bugfix][MultiConnector] Fix MultiConnector for SupportsHMA sub-connectors#36549
ZhanqiuHu wants to merge 2 commits into vllm-project:main
Conversation
`MultiConnector.request_finished` calls `c.request_finished()` on all sub-connectors, but `NixlConnector` (SupportsHMA) only overrides `request_finished_all_groups`. The call falls through to the base-class no-op, so `kv_transfer_params` is never generated and NIXL transfers silently fail.

Fix: check whether a sub-connector implements `SupportsHMA` and dispatch to `request_finished_all_groups` accordingly.

Fixes vllm-project#36547
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Code Review
This pull request fixes a bug where MultiConnector failed to correctly handle SupportsHMA sub-connectors like NixlConnector, causing silent failures. The change correctly dispatches to request_finished_all_groups for HMA-aware connectors. However, this fix is incomplete for models with multiple KV cache groups, as MultiConnector does not implement the SupportsHMA interface and can only handle a single block group. I've left a critical comment with a suggestion for a more robust implementation to prevent potential data loss or resource leaks in multi-group scenarios.
```python
if isinstance(c, SupportsHMA):
    async_save, txfer_params = c.request_finished_all_groups(
        request, (blocks,)
    )
else:
    async_save, txfer_params = c.request_finished(request, blocks)
```
This change correctly dispatches to request_finished_all_groups for SupportsHMA sub-connectors, but it only handles a single group of KV cache blocks. The request_finished method's signature (blocks: list[int]) means it can only receive one group.
For models with multiple KV cache groups (e.g., using sliding window attention), this implementation will only process one group. The KV cache for other groups will not be saved or handled by HMA-aware connectors, which can lead to silent data loss or resource leaks.
A more robust solution is for MultiConnector to implement the SupportsHMA interface. This would allow it to receive all block groups from the scheduler and delegate them correctly. This would involve:
- Adding `SupportsHMA` to `MultiConnector`'s base classes.
- Implementing `request_finished_all_groups` in `MultiConnector` to handle `tuple[list[int], ...]`, delegating to sub-connectors appropriately.
- Updating `request_finished` to delegate to the new method for backward compatibility.
Here is an example of a more robust implementation:
```python
# In vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py

# Add SupportsHMA to the class definition
class MultiConnector(KVConnectorBase_V1, SupportsHMA):
    # ... (rest of the class)

    # Keep the existing request_finished for backward compatibility
    def request_finished(
        self,
        request: "Request",
        blocks: list[int],
    ) -> tuple[bool, dict[str, Any] | None]:
        return self.request_finished_all_groups(request, (blocks,))

    # Implement request_finished_all_groups
    def request_finished_all_groups(
        self,
        request: "Request",
        block_ids: tuple[list[int], ...],
    ) -> tuple[bool, dict[str, Any] | None]:
        async_saves = 0
        kv_txfer_params = None
        for c in self._connectors:
            if isinstance(c, SupportsHMA):
                async_save, txfer_params = c.request_finished_all_groups(
                    request, block_ids
                )
            else:
                # For non-HMA connectors, pass only the first group of blocks.
                async_save, txfer_params = c.request_finished(
                    request, block_ids[0] if block_ids else []
                )
            if async_save:
                async_saves += 1
            if txfer_params is not None:
                if kv_txfer_params is not None:
                    raise RuntimeError(
                        "Only one connector can produce KV transfer params"
                    )
                kv_txfer_params = txfer_params
        if async_saves > 1:
            self._extra_async_saves[request.request_id] = async_saves - 1
        # Clean up other state for this request.
        self._requests_to_connector.pop(request.request_id, None)
        return async_saves > 0, kv_txfer_params
```

Add a `request_finished` shim to `NixlConnector` that wraps `block_ids` and delegates to `request_finished_all_groups`. This ensures `MultiConnector` (and any other caller) can use the standard `request_finished` interface. Reverts the `MultiConnector` `isinstance` check in favor of this approach.

Fixes vllm-project#36547
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Hi @NickLucche, I just updated the fix. Added a `request_finished` shim to `NixlConnector`.
Summary
Fixes #36547
`MultiConnector.request_finished` calls `c.request_finished()` on all sub-connectors, but `NixlConnector` (SupportsHMA) only overrides `request_finished_all_groups`. The call falls through to the base-class no-op, so `kv_transfer_params` is never generated and NIXL transfers silently fail.

Fix: check whether a sub-connector implements `SupportsHMA` and dispatch to `request_finished_all_groups` accordingly.

Test plan
See #36547 for reproduction steps, test script, and detailed output. Validated with a 2-GPU prefill/decode setup using `Qwen/Qwen3-0.6B`.

Test result (Are CI test cases needed for this?)

Before fix:
- `external_kv_transfer = 0` after proxy request (NIXL never fires)
- `local_compute`

After fix:
- `external_kv_transfer = 38` (all prompt tokens transferred via NIXL)
- `local_compute = 1` (only the recomputed last token)