Speed up model parallel initialization#1662

Draft
alexqdh wants to merge 2 commits into NVIDIA:main from alexqdh:speed_up_mp_initialization

Conversation


@alexqdh alexqdh commented Jul 2, 2025

Currently, process groups are created on all ranks, regardless of whether the current rank is actually a member of the group. This is extremely time-consuming, especially during large-scale distributed training, because by default torch requires groups to be created in the same order on every process.

This PR adds the parameter use_local_synchronization=True to all new_group() calls, which enables local synchronization and can further speed up group creation. After this change, a process group is only created when the current rank is actually a member of it (if rank in ranks:).


Phlip79 commented Mar 4, 2026

We are changing our review process and marking all open, unlabeled PRs as draft. This change will go into effect once #3659 is merged.

Moving forward, all PRs will be required to start as draft PRs. If you wish to get your PR merged, mark your PR as “Ready for review”. Read more about the new process at submit.md.

@Phlip79 Phlip79 marked this pull request as draft March 4, 2026 23:00
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Mar 4, 2026
@Phlip79 Phlip79 removed the needs-follow-up Issue needs follow-up label Mar 4, 2026
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Mar 5, 2026
5 participants