
Description
seed = args.seed + utils.get_rank()
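For context, here is a minimal sketch of how I understand this per-rank seeding is meant to work (the helper name `seed_everything` is mine, not from the repo):

```python
import numpy as np
import torch
import torch.distributed as dist

def seed_everything(base_seed: int) -> None:
    # Offset the base seed by the process rank so each GPU draws
    # different random numbers (and hence different augmentations),
    # mirroring the quoted line above.
    rank = dist.get_rank() if dist.is_initialized() else 0
    seed = base_seed + rank
    torch.manual_seed(seed)
    np.random.seed(seed)
```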
In the issue "Why should we set a different seed per GPU with DDP?", the explanation is that different seeds produce different data augmentations on different GPUs. However, I have another question: different seeds on different GPUs also lead to different model weight initializations, and I don't see any synchronization code such as torch.distributed.broadcast(). Is this different initialization helpful in the distributed training process? Or could you provide the synchronization code for model initialization?
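By "synchronization code" I mean something like the following rough sketch (the helper name `sync_model_params` is my own, just to illustrate the idea):

```python
import torch
import torch.distributed as dist

def sync_model_params(model: torch.nn.Module, src_rank: int = 0) -> None:
    # Broadcast every parameter and buffer from src_rank so that all
    # ranks start from identical weights, even though each rank built
    # the model under a different seed.
    for tensor in list(model.parameters()) + list(model.buffers()):
        dist.broadcast(tensor.data, src=src_rank)
```

If I understand correctly, torch.nn.parallel.DistributedDataParallel already broadcasts the wrapped module's parameters and buffers from rank 0 when it is constructed, so the per-rank initializations would be overwritten at that point anyway. Is that the intended mechanism here, or is an explicit broadcast like the one above expected?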