[Fix] Fix SGLang tensor parallel rank bug and improve code quality #226
Open
yurekami wants to merge 1 commit into ovg-project:main from
Conversation
This PR fixes a critical bug in SGLang's tensor parallel support and improves overall code quality:

## Bug Fix

- Fix `start_worker_listener_thread()` call to use `tp_rank` instead of `torch.cuda.current_device()` in the SGLang integration. The socket path is based on rank, not device ID, so using the wrong value causes IPC communication failures in multi-GPU setups.
- Add `is_worker` parameter to SGLang's `init_kvcached()` to match vLLM's behavior - only workers should start the listener thread.
- Add `_tp_size` global state to the SGLang interfaces and pass it to `KVCacheManager` for proper tensor parallel coordination.
- Update the SGLang patches to detect `tp_rank` and `tp_size` from SGLang's distributed state via `get_tp_group()`.

## Code Quality Improvements

- Replace bare `except` clauses in `autopatch.py` with specific exception handling (`ImportError` vs other exceptions) and add debug/warning logging.
- Add comprehensive docstrings to all public API functions in both the vLLM and SGLang integration modules.
- Add type hints to global module state variables.
- Add input validation for the `limit-percent` CLI command (0-100 range).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
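A minimal sketch of why the rank matters for the IPC socket path. The helper name and path format below are illustrative assumptions, not kvcached's actual API; the sketch only shows the failure mode the fix addresses:

```python
# Illustrative sketch only: socket_path_for_rank and its path format are
# assumptions, not kvcached's real API. The point is that the IPC socket
# is keyed by tensor-parallel rank, so the listener must be started with
# tp_rank rather than torch.cuda.current_device().

def socket_path_for_rank(rank: int) -> str:
    # Both sides of the IPC channel derive the path from the TP rank,
    # so they agree on one Unix socket per rank.
    return f"/tmp/kvcached_worker_{rank}.sock"

# When each worker process is pinned to one GPU (e.g. via
# CUDA_VISIBLE_DEVICES), torch.cuda.current_device() returns 0 in every
# process. Passing it as the "rank" collapses all workers onto the
# rank-0 socket path, so ranks 1..N-1 never receive their messages:
tp_rank = 3
current_device = 0  # what torch.cuda.current_device() reports per process
assert socket_path_for_rank(tp_rank) != socket_path_for_rank(current_device)
```

Under this assumption, starting the listener with `tp_rank` gives each worker a distinct, agreed-upon socket path in multi-GPU setups.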
Summary
This PR fixes a critical bug in SGLang's tensor parallel support and improves overall code quality.
Bug Fix
- Fix `start_worker_listener_thread()` call to use `tp_rank` instead of `torch.cuda.current_device()` in the SGLang integration. The socket path is based on rank, not device ID, so using the wrong value causes IPC communication failures in multi-GPU tensor parallel setups.
- Add `is_worker` parameter to SGLang's `init_kvcached()` to match vLLM's behavior - only workers should start the listener thread.
- Add `_tp_size` global state to the SGLang interfaces and pass it to `KVCacheManager` for proper tensor parallel coordination.
- Detect `tp_rank` and `tp_size` from SGLang's distributed state via `get_tp_group()`.

Code Quality Improvements
- Replace bare `except` clauses in `autopatch.py` with specific exception handling (`ImportError` vs other exceptions) and add debug/warning logging.
- Add input validation for the `limit-percent` CLI command (0-100 range).

Test plan
- Verified files compile (`py_compile`)

🤖 Generated with Claude Code
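The 0-100 check for `limit-percent` could look roughly like the sketch below. The argparse wiring, option name, and error message are assumptions about the CLI's shape, not kvcached's actual code:

```python
# Sketch of a 0-100 range check for a percentage CLI option; the argparse
# wiring here is an assumption, not kvcached's actual implementation.
import argparse

def percent(value: str) -> float:
    # Reject values outside [0, 100] with a proper CLI error instead of
    # letting a bad percentage propagate further into the program.
    pct = float(value)
    if not 0.0 <= pct <= 100.0:
        raise argparse.ArgumentTypeError(
            f"limit-percent must be between 0 and 100, got {value}"
        )
    return pct

parser = argparse.ArgumentParser(prog="kvcached")
parser.add_argument("--limit-percent", type=percent, default=100.0)

args = parser.parse_args(["--limit-percent", "75"])
assert args.limit_percent == 75.0
```

Using a custom `type` callable keeps the validation at parse time, so an out-of-range value fails with a usage error before any work starts.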