Gradient Skipping problem #25

@SerezD

Description

Hi,

I am trying to reproduce the model on CIFAR-10, training from scratch. However, I am running into a problem with gradient skipping. I am setting the skip threshold to 400 and the gradient-norm clipping value to 200 (which appear to be the defaults in hps.py).

After a few iterations, it typically happens that the gradient norm reaches something like 450, which triggers an update skip. On the following iteration, the norm still has a similar value (e.g. 420), so the update is skipped again. The problem is that this repeats for every batch, and the model stops updating entirely!
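For reference, this is roughly the pattern I mean (a minimal runnable sketch assuming a standard PyTorch loop; the model, data, and variable names here are stand-ins of mine, not the repo's exact code):

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs; the real model and data come from the repo.
model = nn.Linear(16, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = [torch.randn(8, 16) for _ in range(4)]

skip_threshold = 400.0  # skip threshold I set
grad_clip = 200.0       # gradient-norm clipping value (hps.py default)

for x in loader:
    optimizer.zero_grad()
    loss = model(x).pow(2).mean()  # placeholder loss
    loss.backward()
    # clip_grad_norm_ clips in place but returns the total norm
    # measured *before* clipping
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
    if grad_norm < skip_threshold:
        optimizer.step()
    # else: the whole batch is skipped; since the check uses the
    # pre-clip norm, a run of large-gradient batches means no
    # parameter update ever happens, which is what I am seeing
```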

Did you encounter this behavior in your training? Do you have any ideas on how to solve it?

Thank you
