Hi,
I am trying to reproduce the model on CIFAR-10, training from scratch. However, I am running into a problem with gradient skipping. I am setting the skip threshold to 400 and the grad-norm clip value to 200 (which appear to be the defaults in hps.py).
After a few iterations, it typically happens that the grad norm is around 450, which triggers the update skip. At the next iteration, the grad norm still has a similar value (e.g. 420), so the update is skipped again. The problem is that this repeats for every subsequent batch, and the model stops updating entirely!
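To make the failure mode concrete, here is a minimal toy sketch (hypothetical names, not the repo's actual code) of skip-update logic under the assumed defaults above. Because a skipped step leaves the parameters unchanged, the next batch tends to produce a similar grad norm, so every subsequent step can be skipped too:

```python
def training_step(params, grad_norm, skip_threshold=400.0):
    """Apply a toy update, or skip it entirely if grad_norm exceeds the threshold.

    Returns (new_params, skipped).
    """
    if grad_norm >= skip_threshold:
        return params, True  # skipped: parameters are left unchanged
    return params - 0.01 * grad_norm, False  # placeholder for the real optimizer step

# Simulate the stall: grad norms hover just above the threshold,
# so no step is ever applied and the parameters never move.
params, skips = 1.0, 0
for grad_norm in [450.0, 420.0, 435.0, 410.0]:
    params, skipped = training_step(params, grad_norm)
    skips += skipped
# skips == 4 and params == 1.0: the model never updates.
```

This is why a skip threshold set too close to the typical grad norm can deadlock training: clipping reduces the applied gradient, but skipping removes the feedback loop that would bring the norm back down.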
Did you encounter this behavior in your training? Do you have any ideas on how to solve it?
Thank you