Question about adjusting Batch size and correct scaling of training #29

@JannikSchneider12

Description

Hey everyone,

I want to retrain the model on multiple GPUs and was wondering whether it would make sense to increase the batch size. Since the `max_num_res_squared` parameter already assumes 48 GB GPUs, I cannot increase it, as I am using A100s with only 40 GB. However, the effective batch size seems quite small: wandb reports `train/batch_size 12.0`. I suspect this affects scaling behaviour, because right now an epoch is not much faster on 4 GPUs than on 1, and I am wondering what I did wrong. Would it make sense to adjust the batch size, and if so, what is the best way to do it? I did not see an explicit parameter for it.
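For context, my understanding (this is an assumption about how the repo works, not confirmed from its code) is that the per-GPU batch size is derived from `max_num_res_squared` rather than set directly: examples are packed into a batch until `batch_size * num_res**2` would exceed that budget. A minimal sketch of that logic, with an illustrative function name and numbers:

```python
def dynamic_batch_size(max_num_res_squared: int, num_res: int) -> int:
    """Largest batch size such that batch_size * num_res**2 <= max_num_res_squared.

    Hypothetical helper for illustration; the repo's actual implementation
    and parameter values may differ.
    """
    return max(1, max_num_res_squared // (num_res ** 2))

# Example: with a budget of 500_000 residues-squared and 200-residue
# proteins, each GPU would get 500_000 // 200**2 = 12 examples per batch.
per_gpu = dynamic_batch_size(500_000, 200)

# Under DDP with 4 GPUs, the effective batch size would then be 4x the
# per-GPU value, even though wandb may only log the per-GPU number.
effective = per_gpu * 4
```

If that is roughly how it works here, lowering `max_num_res_squared` for 40 GB cards would shrink the per-GPU batch, and the 4-GPU effective batch would still be 4x whatever wandb reports per process.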

Thank you for your time and help
