CUDA_ERROR_OUT_OF_MEMORY: out of memory / on MSN-Hard training #3

Hello, thank you for reproducing the work of the paper.

I tried to train the model on the MSN-Hard dataset, but training fails right at the start with a CUDA_ERROR_OUT_OF_MEMORY error.

The training runs on a cluster, on a node with 8 V100 GPUs (32 GB each).
I further reduced the batch size to 32 and the number of sampled points to 2300, but this does not seem to be enough to avoid the error.
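
To put those two numbers in perspective, here is a rough sketch of the per-step workload they imply on each GPU. It assumes the batch size in the config is either a global value split evenly across the 8 processes or a per-process value; which of the two applies has not been verified against the code, and `rays_per_gpu` is only an illustrative helper name, not something from the repository.

```python
def rays_per_gpu(batch_size: int, points_per_scene: int, num_gpus: int,
                 batch_is_global: bool = True) -> int:
    """Number of sampled points each GPU handles per training step."""
    # Assumption: a global batch is divided evenly across the processes;
    # otherwise every process loads the full batch_size on its own GPU.
    scenes = batch_size // num_gpus if batch_is_global else batch_size
    return scenes * points_per_scene


print(rays_per_gpu(32, 2300, 8))         # global batch: 4 scenes per GPU
print(rays_per_gpu(32, 2300, 8, False))  # per-process batch: 32 scenes per GPU
```

The two cases differ by a factor of 8, so knowing how the config value is interpreted matters when deciding how far to reduce it.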

Here is the command I use to launch the code (inside a .slurm file containing the necessary parameters):
torchrun --standalone --nnodes 1 --nproc_per_node 8 train.py runs/msn/osrt/config.yaml
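
Since `torchrun` starts one process per GPU and exports `LOCAL_RANK`, `RANK` and `WORLD_SIZE` to each of them, one thing worth checking is that every process ends up on its own device instead of several landing on the same GPU. Below is a minimal diagnostic sketch, not code from the repository: `describe_rank` is a hypothetical helper, and the fallback values for a non-distributed run are an assumption.

```python
import os


def describe_rank(env=None) -> dict:
    """Summarise the rank variables that torchrun exports to each worker."""
    env = os.environ if env is None else env
    # Fallbacks (assumed) cover the case where the script runs without torchrun.
    local_rank = int(env.get("LOCAL_RANK", 0))
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": local_rank,
        "world_size": int(env.get("WORLD_SIZE", 1)),
        # Device this worker would be expected to use on a single node.
        "device": f"cuda:{local_rank}",
    }


if __name__ == "__main__":
    print(describe_rank())
```

Called at the top of the training script, this should print a different device for each of the 8 workers; identical devices would suggest the processes are competing for one GPU's memory.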

Note that I was able to train the model on the CLEVR3D dataset successfully following the same procedure.

To load the dataset, I downloaded the following: https://console.cloud.google.com/storage/browser/kubric-public/tfds/kubric_frames/multi_shapenet_conditional and put the 2.8.0/ folder inside the data/osrt/multi_shapenet_frames/ folder of the project.

Have you ever encountered this issue?

Thank you

