Calculate batch norm statistic loss on parallel training #16

@dohe0342

Description

Hello, I have one question about batch norm statistic loss.

Consider data-parallel training: I have 8 GPUs, and each GPU can hold a batch size of 128.

As you know, the batch norm statistic loss is computed independently on each GPU, and the GPUs only share gradients, not the statistics of the whole batch (1024). I think this can cause image quality degradation.

So here is my question: how can I calculate the batch norm statistic loss during parallel training as if it were computed over the whole batch of 1024, rather than over each per-GPU mini-batch? A sketch of what I have in mind is below.
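One approach I can imagine is to all-reduce the per-GPU channel sums before forming the mean and variance, using the autograd-aware collectives in `torch.distributed.nn.functional` so the gradient of the loss flows back to every GPU's inputs (plain `dist.all_reduce` is not tracked by autograd). Here is a minimal sketch, assuming `torch.distributed` is already initialized; `global_bn_statistic_loss` is a hypothetical helper that would replace the per-GPU statistic loss inside the BN forward hook:

```python
import torch
from torch.distributed import get_world_size
from torch.distributed.nn.functional import all_reduce  # autograd-aware collective


def global_bn_statistic_loss(feat, running_mean, running_var):
    """BN statistic loss over the global batch across all GPUs (hypothetical helper).

    feat: activations entering a BatchNorm2d layer on this GPU, shape (N, C, H, W).
    running_mean / running_var: the BN layer's stored statistics, shape (C,).
    """
    # Per-GPU sums per channel; unlike means/variances, sums aggregate exactly.
    n_local = feat.numel() / feat.size(1)          # elements per channel on this GPU
    local_sum = feat.sum(dim=(0, 2, 3))
    local_sq_sum = (feat * feat).sum(dim=(0, 2, 3))

    # All-reduce through the autograd-aware collective so gradients flow
    # back to every GPU's mini-batch, not just the local one.
    global_sum = all_reduce(local_sum)
    global_sq_sum = all_reduce(local_sq_sum)
    n_global = n_local * get_world_size()

    mean = global_sum / n_global
    var = global_sq_sum / n_global - mean * mean   # E[x^2] - E[x]^2 over the whole batch

    # Match the global batch statistics to the BN layer's running statistics.
    return torch.norm(mean - running_mean, 2) + torch.norm(var - running_var, 2)
```

With the sums aggregated this way, the resulting mean and variance are exactly those of the full 1024-sample batch, so the loss should match the single-machine computation.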
