Description
Dear Authors,
Thank you for sharing the repository. I’ve been attempting to replicate the iSogCLR and TempNet training pipeline, but I’ve encountered instability during training. Specifically, the issue arises after the first epoch: both the tempnet_loss and clip_loss spike and later become NaN.
I’m using the same training parameters as those provided in the repository. The only notable differences are that I’m using WebDataset to load the CC3M dataset and that I’m training on 4 GPUs instead of 8. However, I doubt that either of these is the cause of the issue.
I’ve attached a graph illustrating the problem. Despite efforts to debug the implementation, I haven’t yet been able to identify the root cause.
I’ve also tried running SogCLR on its own and encountered similar training instability.
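To narrow down where the instability described above begins, one option is to record both losses per step and look for the first non-finite value. The following is only a minimal sketch, assuming the losses are available as plain Python floats (for example via `.item()` on the loss tensors). The function name `first_nonfinite_step` and the sample values are hypothetical and are not taken from the repository or from my actual logs.

```python
import math


def first_nonfinite_step(loss_history):
    """Return (step, loss_name) for the first NaN/inf loss, or None if all are finite.

    loss_history: list of dicts mapping a loss name to a float, one dict per step.
    """
    for step, losses in enumerate(loss_history):
        for name, value in losses.items():
            if not math.isfinite(value):
                return step, name
    return None


# Hypothetical values, for illustration only.
history = [
    {"clip_loss": 2.31, "tempnet_loss": 0.84},
    {"clip_loss": 57.9, "tempnet_loss": 12.4},
    {"clip_loss": float("nan"), "tempnet_loss": float("inf")},
]
print(first_nonfinite_step(history))
```

Knowing the first affected step and which loss fails first may help tell a data-dependent problem (a particular batch) apart from a gradual divergence.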
Any insights or suggestions would be greatly appreciated.
Best regards,
Dhimitrios Duka




