Description
Attached files: train.conllu.txt, tune.conllu.txt
While tuning some of the parameters across multiple treebanks (1296 total runs), I got this error for about 60 of the runs:
File "uuparser/parser.py", line 325, in main
run(experiment,options)
File "uuparser/parser.py", line 51, in run
parser.Train(traindata,options)
File "/data/rob/datasplits/uuparser/uuparser/arc_hybrid.py", line 448, in Train
self.trainer.update()
File "_dynet.pyx", line 6198, in _dynet.Trainer.update
File "_dynet.pyx", line 6203, in _dynet.Trainer.update
RuntimeError: Magnitude of gradient is bad: inf
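For reference, the exception comes out of DyNet's Trainer.update(), so one possible low-level mitigation (independent of uuparser) would be to guard the update call and skip the step whose gradient blew up. Below is a minimal, hypothetical sketch of such a guarded step; `compute_loss` and `batches` are placeholder names and not part of uuparser:

```python
import dynet as dy

# Minimal, hypothetical guarded training step -- not uuparser's actual code.
pc = dy.ParameterCollection()
trainer = dy.AdamTrainer(pc, alpha=0.01)
trainer.set_clip_threshold(5.0)   # gradient clipping; does not prevent inf/nan norms

def train_one_epoch(batches, compute_loss):
    for batch in batches:
        dy.renew_cg()                 # fresh computation graph per step
        loss = compute_loss(batch)    # a dynet Expression
        loss.forward()
        loss.backward()
        try:
            trainer.update()
        except RuntimeError as err:
            # DyNet raises "Magnitude of gradient is bad: inf" from update().
            # Skipping keeps the run alive, but the bad gradient may still be
            # pending, so resuming from the last saved model (roughly what
            # --continue does at the whole-run level, see the P.S. below) is
            # the safer recovery.
            print(f"Skipped update after bad gradient: {err}")
```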
After re-running the exact same commands, I still got the error for 24 of those 60 runs. I could not find a clear pattern in why or when this happens, and it occurred on two different machines.
One example where this occurs is the following command:
```
python3 uuparser/parser.py --trainfile ../ --devfile ../newsplits-v2.7/UD_Arabic-PADT/tune.conllu --learning-rate 0.01 --word-emb-size 100 --char-emb-size 500 --no-bilstms 1 --outdir models/UD_Arabic-PADT.0.0.01.100.500.1
```
I have attached the files used; the error occurs at epoch 11.
P.S. This can be worked around by continuing training after the crash using --continue, which is what I have now done for my own experiments.
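Since the runs are scripted, that recovery can also be automated by relaunching a crashed job with --continue. The sketch below assumes the flag simply resumes the interrupted run, and it abbreviates the flag list from the example command above:

```python
import subprocess

# Rough sketch of automating the --continue recovery across scripted runs.
base_cmd = [
    "python3", "uuparser/parser.py",
    # ... --trainfile / --devfile / hyper-parameter flags as above ...
    "--outdir", "models/UD_Arabic-PADT.0.0.01.100.500.1",
]

max_restarts = 3
for attempt in range(max_restarts + 1):
    # First attempt runs from scratch; later attempts add --continue.
    cmd = base_cmd if attempt == 0 else base_cmd + ["--continue"]
    if subprocess.run(cmd).returncode == 0:
        break   # training finished normally
    print("Run crashed, retrying with --continue")
```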