
Issue running iSogCLR with TempNet #3

@dhimitriosduka1

Dear Authors,

Thank you for sharing the repository. I’ve been attempting to replicate the iSogCLR and TempNet training pipeline, but I’ve encountered instability during training. Specifically, the issue arises after the first epoch: both the tempnet_loss and clip_loss spike and later become NaN.

I’m using exactly the same training parameters as provided in the repository. The only notable differences are that I’m loading the CC3M dataset with WebDataset and training on 4 GPUs instead of 8. However, I doubt that either of these is the cause of the issue.

I’ve attached a graph illustrating the problem. Despite efforts to debug the implementation, I haven’t yet been able to identify the root cause.
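
For reference, a check along the following lines can pinpoint the first step at which a logged loss stops being finite. This is only a minimal sketch and not code from the repository: the helper name `first_nonfinite_step` and the sample values are hypothetical, chosen to mimic a spike followed by NaN.

```python
import math
from typing import Iterable, Optional


def first_nonfinite_step(losses: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


# Hypothetical logged values (not real measurements): a spike, then NaN.
logged = [2.31, 2.05, 1.98, 57.4, float("nan"), float("nan")]
print(first_nonfinite_step(logged))
```

Applied to tempnet_loss and clip_loss separately, a check like this would show which of the two becomes non-finite first.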

I’ve also tried running SogCLR on its own, but encountered similar training instability.

Any insights or suggestions would be greatly appreciated.

Best regards,
Dhimitrios Duka

[Attached plots: temp_loss; clip_loss; iSogCLR TempNet r_mean; SogCLR loss; SogCLR COCO r_mean]
