Hi,
I am trying to reproduce the model on CIFAR-10, training from scratch. However, I am running into a problem with gradient skipping. I am setting the skip threshold to 400 and the grad-norm clip value to 200 (which appear to be the defaults in hps.py).
After a few iterations, it typically happens that the grad norm is around 450, which triggers the update skip. At the next iteration, the grad norm still has a similar value (e.g. 420), so the update is skipped again. The problem is that this repeats for every subsequent batch, and the model stops updating entirely!
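To make the failure mode concrete, here is a minimal toy sketch (hypothetical names, not the repo's actual code) of skip-update logic under the assumed defaults above. Because a skipped step leaves the parameters unchanged, the next batch tends to produce a similar grad norm, so every subsequent step can be skipped too:

```python
def training_step(params, grad_norm, skip_threshold=400.0):
    """Apply a toy update, or skip it entirely if grad_norm exceeds the threshold.

    Returns (new_params, skipped).
    """
    if grad_norm >= skip_threshold:
        return params, True  # skipped: parameters are left unchanged
    return params - 0.01 * grad_norm, False  # placeholder for the real optimizer step

# Simulate the stall: grad norms hover just above the threshold,
# so no step is ever applied and the parameters never move.
params, skips = 1.0, 0
for grad_norm in [450.0, 420.0, 435.0, 410.0]:
    params, skipped = training_step(params, grad_norm)
    skips += skipped
# skips == 4 and params == 1.0: the model never updates.
```

This is why a skip threshold set too close to the typical grad norm can deadlock training: clipping reduces the applied gradient, but skipping removes the feedback loop that would bring the norm back down.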
Did you encounter this behavior in your training? Do you have any ideas on how to solve it?
Thank you