
Does this code also work for an encoder-decoder Transformer? #8


@SefaZeng

I tried to use TUPE in the NMT encoder, but the loss exploded during training. Does TUPE need any modification to be usable in NMT?
The error looks like this:

2021-01-02 14:08:12 | INFO | train_inner | epoch 001:  12110 / 53999 loss=4.403, nll_loss=2.737, ppl=6.67, wps=46907.8, ups=1.07, wpb=43719.3, bsz=1694.4, num_updates=12100, lr=0.000325246, gnorm=0.245, loss_scale=4, train_wall=93, wall=0
2021-01-02 14:09:40 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 2.0
2021-01-02 14:09:41 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 1.0
2021-01-02 14:09:42 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.5
2021-01-02 14:09:43 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.25
2021-01-02 14:09:44 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.125
2021-01-02 14:09:45 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0625
2021-01-02 14:09:46 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.03125
2021-01-02 14:09:46 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.015625
2021-01-02 14:09:47 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2021-01-02 14:09:48 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.00390625
2021-01-02 14:09:49 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.001953125
2021-01-02 14:09:50 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0009765625
2021-01-02 14:09:51 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.00048828125
2021-01-02 14:09:52 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.000244140625
2021-01-02 14:09:53 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0001220703125
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.encoder.layers.0.self_attn.dropout_module, shape: torch.Size([2816, 34, 34]), forward input max: nan, input min: nan
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.encoder.layers.0.self_attn.dropout_module, shape: torch.Size([7296, 13, 13]), forward input max: nan, input min: nan
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.encoder.layers.0.self_attn.dropout_module, shape: torch.Size([2304, 38, 38]), forward input max: nan, input min: nan
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.encoder.layers.0.self_attn.dropout_module, shape: torch.Size([2816, 33, 33]), forward input max: nan, input min: nan
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.decoder.output_projection, shape: torch.Size([176, 23, 47038]), backward
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.decoder.output_projection, shape: torch.Size([144, 40, 47038]), backward
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.decoder.output_projection, shape: torch.Size([176, 31, 47038]), backward
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.decoder.output_projection, shape: torch.Size([456, 12, 47038]), backward
FloatingPointError: Minimum loss scale reached (0.0001). Your loss is probably exploding. Try lowering the learning rate, using gradient clipping or increasing the batch size.

Any help is appreciated! Thanks.
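For reference, the mitigations that fairseq suggests in that FloatingPointError can be tried directly as fairseq-train flags. The invocation below is only an illustrative sketch: the data path and hyperparameter values are assumptions, and the --arch name would need to be whatever architecture the TUPE-modified model registers, not necessarily plain transformer.

# Sketch of fairseq-train flags for fp16 overflow: lower peak LR, gradient
# clipping, larger effective batch via --update-freq, and a more tolerant
# fp16 loss-scaler. Values are illustrative, not a known-good TUPE config.
fairseq-train data-bin/my-nmt-data \
    --arch transformer \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 1e-4 --lr-scheduler inverse_sqrt --warmup-updates 8000 \
    --clip-norm 0.5 \
    --max-tokens 4096 --update-freq 4 \
    --fp16 --fp16-scale-tolerance 0.25 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1

If the NaN still first appears in encoder.layers.0.self_attn as in the log above, that points at the TUPE attention/positional term itself rather than at the optimizer settings, and the flags above would only delay the overflow rather than fix it.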
