Hey everyone,
I wanted to retrain the model on multiple GPUs and was wondering whether it would make sense to increase the batch size. Since the "max_num_res_squared" parameter already assumes 48 GB GPUs, I could not increase it, as I am using A100s with only 40 GB. However, the reported batch size seems quite small: wandb reports "train/batch_size 12.0". I was wondering whether this affects the scaling behaviour, because at the moment an epoch is not much faster on 4 GPUs than on 1, and I am not sure what I did wrong. So I wanted to ask whether it would make sense to adjust the batch size and, if so, what the best way to do this would be, since I did not see an explicit parameter for it.
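For context, here is a rough sketch of how I understand the scaling should behave, assuming the training loop uses standard data parallelism (e.g. PyTorch DDP with a DistributedSampler); the function and numbers below are hypothetical, just to illustrate my expectation:

```python
import math

def steps_per_epoch(dataset_size: int, per_gpu_batch: int, num_gpus: int) -> int:
    # With a DistributedSampler each rank sees ~1/num_gpus of the data,
    # so one epoch should need proportionally fewer optimizer steps.
    return math.ceil(dataset_size / (per_gpu_batch * num_gpus))

# Hypothetical dataset of 24000 examples with the reported batch size of 12:
print(steps_per_epoch(24000, 12, 1))  # 2000 steps on 1 GPU
print(steps_per_epoch(24000, 12, 4))  # 500 steps on 4 GPUs
```

If "train/batch_size 12.0" is the per-GPU batch size, I would expect roughly a 4x reduction in steps (and wall time) per epoch on 4 GPUs, which is not what I am seeing.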
Thank you for your time and help!