Speed up model parallel initialization#1662

Draft
alexqdh wants to merge 2 commits into NVIDIA:main from alexqdh:speed_up_mp_initialization

Conversation


@alexqdh alexqdh commented Jul 2, 2025

Currently, process groups are created on all ranks, regardless of whether the current rank is actually a member of the group. This is extremely time-consuming, especially during large-scale distributed training, because by default torch requires groups to be created in the same order on every process.

This PR adds the parameter use_local_synchronization=True to all new_group() calls, which enables local synchronization and can further speed up group creation. After this change, a process group is only created when the current rank is actually a member of it (if rank in ranks:).


Phlip79 commented Mar 4, 2026

We are changing our review process and marking all open, unlabeled PRs as draft. This change will go into effect once #3659 is merged.

Moving forward, all PRs will be required to start as draft PRs. If you wish to get your PR merged, mark your PR as “Ready for review”. Read more about the new process at submit.md.

@Phlip79 Phlip79 marked this pull request as draft March 4, 2026 23:00
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Mar 4, 2026
@Phlip79 Phlip79 removed the needs-follow-up Issue needs follow-up label Mar 4, 2026
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Mar 5, 2026
5 participants