Description
Hello,
First of all, thank you for this great contribution, the work is really impressive.
While training the model with the DINOv3 backbone, the loss becomes NaN, seemingly due to exploding gradients. The issue consistently appears after around 5 epochs.
I’ve already tried several stabilization techniques, including (but not limited to):
Adding residual connections
Gradient clipping and gradient penalties
Data normalization
Feature map normalization
Batch / Layer normalization
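For reference, here is a minimal sketch of how I combine gradient clipping with a NaN guard in the training step (the model, optimizer, and data are placeholders, not the actual DINOv3 pipeline):

```python
import torch
import torch.nn as nn

# Placeholder model/optimizer standing in for the real training setup.
model = nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(batch, target, max_norm=1.0):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(batch), target)
    # Skip the update entirely if the loss is already non-finite,
    # so a single bad batch does not corrupt the weights.
    if not torch.isfinite(loss):
        return None
    loss.backward()
    # Clip the global gradient norm to tame exploding gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()

loss = train_step(torch.randn(4, 8), torch.randn(4, 1))
```

Even with this guard and clipping in place, the divergence still occurs.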
Despite these adjustments, the loss still diverges to NaN.
Could you please advise whether there are specific training instructions or hyperparameter settings (especially for the loss function) that are recommended when using DINOv3?
Any insights on tuning or architecture-specific adjustments would be greatly appreciated.
Thank you again for your excellent work!