
[Fix] Fix SGLang tensor parallel rank bug and improve code quality #226

Open
yurekami wants to merge 1 commit into ovg-project:main from yurekami:fix/sglang-tp-rank-and-improvements

Conversation

@yurekami

Summary

This PR fixes a critical bug in SGLang's tensor parallel support and improves overall code quality.

Bug Fix

  • Critical: Fix start_worker_listener_thread() call to use tp_rank instead of torch.cuda.current_device() in SGLang integration. The socket path is based on rank, not device ID, so using the wrong value causes IPC communication failures in multi-GPU tensor parallel setups.
  • Add is_worker parameter to SGLang's init_kvcached() to match vLLM's behavior: only workers should start the listener thread.
  • Add _tp_size global state to SGLang interfaces and pass it to KVCacheManager for proper tensor parallel coordination.
  • Update SGLang patches to detect tp_rank and tp_size from SGLang's distributed state via get_tp_group().

Code Quality Improvements

  • Replace bare except clauses in autopatch.py with specific exception handling (ImportError vs other exceptions) and add debug/warning logging.
  • Add comprehensive docstrings to all public API functions in both vLLM and SGLang integration modules.
  • Add type hints to global module state variables.
  • Add input validation for limit-percent CLI command (0-100 range).
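The exception-handling and validation changes follow a common pattern; a minimal sketch, assuming hypothetical function names (`try_autopatch`, `validate_limit_percent`) rather than kvcached's actual API:

```python
import importlib
import logging

logger = logging.getLogger(__name__)

def try_autopatch(module_name: str) -> bool:
    """Attempt to patch an optional backend, distinguishing exception
    types instead of swallowing everything with a bare `except`."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Backend simply isn't installed: expected, log at debug level.
        logger.debug("%s not available, skipping patch", module_name)
        return False
    except Exception:
        # Anything else is a real problem worth surfacing.
        logger.warning("failed to patch %s", module_name, exc_info=True)
        return False
    return True

def validate_limit_percent(value: float) -> float:
    """Reject out-of-range values for the limit-percent command."""
    if not 0 <= value <= 100:
        raise ValueError(f"limit-percent must be in [0, 100], got {value}")
    return value
```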

Test plan

  • Verify Python syntax is valid (checked locally with py_compile)
  • Test with SGLang single-GPU inference
  • Test with SGLang tensor parallel (multi-GPU) inference
  • Test with vLLM to ensure no regressions

🤖 Generated with Claude Code


Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>