
Description
seed = args.seed + utils.get_rank()
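For context, here is a minimal sketch of how I understand this per-rank seeding is meant to work (the helper name `seed_everything` is mine, not from the repo):

```python
import numpy as np
import torch
import torch.distributed as dist

def seed_everything(base_seed: int) -> None:
    # Offset the base seed by the process rank so each GPU draws
    # different random numbers (and hence different augmentations),
    # mirroring the quoted line above.
    rank = dist.get_rank() if dist.is_initialized() else 0
    seed = base_seed + rank
    torch.manual_seed(seed)
    np.random.seed(seed)
```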
In the issue "Why should we set a different seed per GPU with DDP?", the explanation is that different seeds produce different data augmentations on different GPUs. However, I have another question: different seeds on different GPUs also lead to different model weight initializations, and I don't see any synchronization code such as torch.distributed.broadcast(). Is this different initialization helpful in the distributed training process? Or could you provide the synchronization code for model initialization?
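By "synchronization code" I mean something like the following rough sketch (the helper name `sync_model_params` is my own, just to illustrate the idea):

```python
import torch
import torch.distributed as dist

def sync_model_params(model: torch.nn.Module, src_rank: int = 0) -> None:
    # Broadcast every parameter and buffer from src_rank so that all
    # ranks start from identical weights, even though each rank built
    # the model under a different seed.
    for tensor in list(model.parameters()) + list(model.buffers()):
        dist.broadcast(tensor.data, src=src_rank)
```

If I understand correctly, torch.nn.parallel.DistributedDataParallel already broadcasts the wrapped module's parameters and buffers from rank 0 when it is constructed, so the per-rank initializations would be overwritten at that point anyway. Is that the intended mechanism here, or is an explicit broadcast like the one above expected?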