[Issue #245] Add pipeline parallelism support #246

Open
iMmAseu wants to merge 11 commits into ovg-project:main from iMmAseu:feat-pp-support

iMmAseu commented Feb 21, 2026

Summary

This PR fixes a TimeoutError during vLLM initialization when Pipeline Parallelism is enabled.

Fixes #243


Root Cause

KVCacheManager counted processes using tensor_parallel_size alone, which caused incorrect process detection in PP setups, where the total number of workers also depends on the pipeline-parallel size.


Changes

  • Fix IPC synchronization by using the global world_size
  • Compute a consistent global_rank
  • Rename tp_size to world_size
  • Update listener rank detection
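
For illustration, one common way to derive a consistent global_rank when TP and PP are both enabled is sketched below. It assumes TP ranks are laid out innermost within each pipeline stage; that layout, and the helper names compute_world_size and compute_global_rank, are assumptions made for this sketch and are not taken from the patch itself.

```python
def compute_world_size(tp_size: int, pp_size: int) -> int:
    """Total number of worker processes across all TP and PP ranks."""
    if tp_size < 1 or pp_size < 1:
        raise ValueError("tp_size and pp_size must both be >= 1")
    return tp_size * pp_size


def compute_global_rank(tp_rank: int, pp_rank: int, tp_size: int) -> int:
    """Map (tp_rank, pp_rank) to a unique rank in [0, world_size).

    Assumption: TP ranks are contiguous within a pipeline stage,
    so each stage occupies a block of tp_size consecutive ranks.
    """
    if not 0 <= tp_rank < tp_size:
        raise ValueError("tp_rank must be in [0, tp_size)")
    if pp_rank < 0:
        raise ValueError("pp_rank must be >= 0")
    return pp_rank * tp_size + tp_rank
```

Under this assumed layout, TP=2 and PP=2 gives a world_size of 4, and the four (tp_rank, pp_rank) pairs (0, 0), (1, 0), (0, 1), (1, 1) map to global ranks 0, 1, 2, 3, so no two processes share a rank.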

Test Plan

  • Syntax check
  • Multi-GPU inference (on Qwen3 8B)
  • PP inference
  • SGLang, and support for other vLLM versions
  • TP=2 and PP=2

ivanium (Collaborator) commented Feb 21, 2026

Just curious: does this PR support enabling both TP and PP, say TP=2 and PP=2? In that case, how do we get a consistent global_rank here?

cui36 self-requested a review February 22, 2026 00:37
iMmAseu (Author) commented Feb 22, 2026

@ivanium Thanks for pointing this out; your concern is absolutely valid. I’ve updated the implementation to handle the combined TP and PP setup correctly and to compute a consistent global_rank across all processes. I’ve also run tests with configurations such as TP=2 and PP=2, and initialization now works as expected without synchronization issues.

cui36 (Collaborator) commented Feb 22, 2026

Hi @iMmAseu, thanks for the update! Let's split it into two parts, and I will go through the vllm patch.

cui36 (Collaborator) commented Feb 24, 2026

/gemini review

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces support for pipeline parallelism by refactoring the handling of distributed process ranks. The changes primarily involve renaming tp_size to world_size and adding pp_rank to correctly manage inter-process communication for KV cache synchronization across different pipeline stages. The implementation correctly namespaces IPC sockets by pp_rank to prevent conflicts.
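
As a rough illustration of what namespacing IPC sockets by pp_rank can look like, the sketch below builds a socket path that includes both the pipeline stage and the rank within that stage. The function name ipc_socket_path, the kvcache_ prefix, and the path layout are hypothetical and are not taken from the patch.

```python
import os


def ipc_socket_path(base_dir: str, pp_rank: int, local_rank: int) -> str:
    """Build a per-process socket path (hypothetical naming scheme).

    Including pp_rank in the file name keeps workers that share the
    same rank in different pipeline stages from colliding on one socket.
    """
    if pp_rank < 0 or local_rank < 0:
        raise ValueError("pp_rank and local_rank must be >= 0")
    return os.path.join(base_dir, f"kvcache_pp{pp_rank}_rank{local_rank}.sock")
```

Without the pp_rank component, rank 0 of stage 0 and rank 0 of stage 1 would resolve to the same path.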

My review has identified a few areas for improvement:

  • There is some duplicated code for resolving distributed ranks in the SGLang integration patches.
  • The use of broad except Exception clauses could mask unexpected errors.
  • The docstrings for the world_size parameter are inconsistent with its actual usage, which could lead to confusion.

Overall, the changes are well-structured and address the issue described. Addressing the feedback will improve the code's maintainability and clarity.
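
To make the first two points concrete, here is a minimal sketch of a single shared helper that resolves a rank and catches only the failures it expects, instead of a broad except Exception. The helper name resolve_rank, the getter argument, and the particular exception types chosen are assumptions for illustration; the real integration may need a different set.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def resolve_rank(get_rank: Callable[[], int], default: int = 0) -> int:
    """Return get_rank(), falling back to default on expected failures.

    Only errors that plausibly mean "the parallel group is not set up"
    are caught (an assumed set); anything else propagates to the caller.
    """
    try:
        return get_rank()
    except (ImportError, AttributeError, AssertionError) as exc:
        logger.debug("Rank lookup failed (%s); using default %d", exc, default)
        return default
```

Both integration patches could call one helper like this, which removes the duplication and lets an unexpected error such as a ValueError surface instead of being silently swallowed.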
