🐛 NaN Loss with DINOv3 Backbone During Training #103

@ecchi-cmd

Description

Hello,

First of all, thank you for this great contribution, the work is really impressive.

While training the model with the DINOv3 backbone, the loss becomes NaN, apparently due to exploding gradients. The divergence consistently appears after around 5 epochs.

I’ve already tried several stabilization techniques, including (but not limited to):

- Residual connections
- Gradient clipping and gradient penalties
- Data normalization
- Feature-map normalization
- Batch / layer normalization

Despite these adjustments, the loss still diverges to NaN.
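
For context, here is a minimal sketch of the NaN guard and gradient-clipping step I am currently using; `model`, `criterion`, `loader`, and the `max_grad_norm` value are placeholders for my actual setup, not a recommendation:

```python
import torch

def train_one_epoch(model, criterion, optimizer, loader, device, max_grad_norm=1.0):
    """One epoch of the training loop with a NaN guard and gradient clipping."""
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)

        optimizer.zero_grad(set_to_none=True)
        loss = criterion(model(images), targets)

        # Skip the batch entirely if the loss has already diverged.
        if not torch.isfinite(loss):
            print("Non-finite loss detected, skipping batch")
            continue

        loss.backward()
        # Clip the global gradient norm before the optimizer step.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
```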

Could you please advise whether there are specific training instructions or hyperparameter settings (especially for the loss function) that you recommend when using DINOv3?
Any insights on tuning or architecture-specific adjustments would be greatly appreciated.

Thank you again for your excellent work!
