🐛 NaN Loss with DINOv3 Backbone During Training #103

@ecchi-cmd

Description

Hello,

First of all, thank you for this great contribution, the work is really impressive.

While training the model with the DINOv3 backbone, the loss becomes NaN, apparently due to exploding gradients. The divergence consistently appears after around 5 epochs.

I’ve already tried several stabilization techniques, including (but not limited to):

- Residual connections
- Gradient clipping and gradient penalties
- Data normalization
- Feature-map normalization
- Batch / layer normalization

Despite these adjustments, the loss still diverges to NaN.
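
For context, here is a minimal sketch of the NaN guard and gradient-clipping step I am currently using; `model`, `criterion`, `loader`, and the `max_grad_norm` value are placeholders for my actual setup, not a recommendation:

```python
import torch

def train_one_epoch(model, criterion, optimizer, loader, device, max_grad_norm=1.0):
    """One epoch of the training loop with a NaN guard and gradient clipping."""
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)

        optimizer.zero_grad(set_to_none=True)
        loss = criterion(model(images), targets)

        # Skip the batch entirely if the loss has already diverged.
        if not torch.isfinite(loss):
            print("Non-finite loss detected, skipping batch")
            continue

        loss.backward()
        # Clip the global gradient norm before the optimizer step.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
```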

Could you please advise whether there are specific training instructions or hyperparameter settings (especially for the loss function) that you recommend when using DINOv3?
Any insights on tuning or architecture-specific adjustments would be greatly appreciated.

Thank you again for your excellent work!
