Hey everyone,
I wanted to retrain the model on multiple GPUs and was wondering whether it would make sense to increase the batch size. Since the "max_num_res_squared" parameter already assumes 48 GB GPUs, I could not increase it, as I am using A100s with only 40 GB. However, the reported batch size seems quite small: wandb reports "train/batch_size 12.0". I was wondering whether this affects the scaling behaviour, because at the moment an epoch is not much faster on 4 GPUs than on 1, and I am not sure what I did wrong. So I wanted to ask whether it would make sense to adjust the batch size and, if so, what the best way to do this would be, since I did not see an explicit parameter for it.
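For context, here is a rough sketch of how I understand the scaling should behave, assuming the training loop uses standard data parallelism (e.g. PyTorch DDP with a DistributedSampler); the function and numbers below are hypothetical, just to illustrate my expectation:

```python
import math

def steps_per_epoch(dataset_size: int, per_gpu_batch: int, num_gpus: int) -> int:
    # With a DistributedSampler each rank sees ~1/num_gpus of the data,
    # so one epoch should need proportionally fewer optimizer steps.
    return math.ceil(dataset_size / (per_gpu_batch * num_gpus))

# Hypothetical dataset of 24000 examples with the reported batch size of 12:
print(steps_per_epoch(24000, 12, 1))  # 2000 steps on 1 GPU
print(steps_per_epoch(24000, 12, 4))  # 500 steps on 4 GPUs
```

If "train/batch_size 12.0" is the per-GPU batch size, I would expect roughly a 4x reduction in steps (and wall time) per epoch on 4 GPUs, which is not what I am seeing.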
Thank you for your time and help!