Could you give some guidance on how to train the model on multiple GPUs? I have a very large dataset and I run out of memory (32 GB total).
Should I use PyTorch's DistributedDataParallel, or just assign different parts of the model to different GPUs (model parallelism)?
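In case it helps clarify what I'm considering, here is the rough DDP setup I had in mind (a minimal sketch based on the torch.distributed docs; the `Linear` model and random tensors are just placeholders for my actual model and dataset):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, etc. in the environment
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and data -- I would swap in my real ones here
    model = torch.nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(10_000, 128),
                            torch.randint(0, 10, (10_000,)))
    sampler = DistributedSampler(dataset)  # each rank gets a disjoint shard
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the sharding each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradients are all-reduced across ranks here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=4 train_ddp.py
```

Note that DDP replicates the full model on every GPU, so I'm not sure it would actually reduce per-GPU memory in my case, which is part of why I'm asking.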
Thanks.