training issues #12

@asmodaay

Hi, I tried training the model with only the LJ data and with only my own data, with fp16 and with fp32, on 1 GPU and on 3 GPUs, but in every case I get this:
Screenshot 2020-05-17 at 19 46 32
The loss is always NaN.
When I start from the pretrained checkpoint, your code returns this:
Screenshot 2020-05-17 at 19 52 44
I worked around that by changing def load_checkpoint, but the loss is still NaN.

Do you have any ideas what I am doing wrong?
