
@zhu-han (Contributor) commented on May 12, 2021

This PR implements iterated loss from #179 (comment).
Reference: https://arxiv.org/pdf/1910.10324.pdf
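
To make the idea concrete, here is a minimal PyTorch-style sketch of iterated loss, assuming an encoder exposed as a list of layers and a generic `loss_fn`; the class and argument names are illustrative, not this PR's actual code:

```python
import torch.nn as nn

class IteratedLossEncoder(nn.Module):
    """Sketch of iterated loss (arXiv:1910.10324): the training loss is
    computed not only from the final layer but also from an intermediate
    layer, and the two are combined with a small scale."""

    def __init__(self, layers: nn.ModuleList, iterated_layer: int,
                 iterated_scale: float):
        super().__init__()
        self.layers = layers                  # stack of conformer blocks
        self.iterated_layer = iterated_layer  # index of the tapped layer
        self.iterated_scale = iterated_scale  # e.g. 0.3 (--iterated-scale)

    def forward(self, x, loss_fn):
        """loss_fn maps a layer's output to a scalar loss (here, MMI)."""
        intermediate_loss = None
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i == self.iterated_layer:
                # Auxiliary loss on the intermediate representation.
                intermediate_loss = loss_fn(x)
        total_loss = loss_fn(x)  # loss on the final layer's output
        if intermediate_loss is not None:
            total_loss = total_loss + self.iterated_scale * intermediate_loss
        return total_loss
```

With 0-based indexing, `--iterated-layers 5` would tap the output of the 6th conformer layer, matching the description below.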

The following results can be reproduced with:

python mmi_att_transformer_train.py --world-size 2 --full-libri 0 --use-ali-model 0 --max-duration 250 --iterated-layers 5 --iterated-scale 0.3

Results with different values of the iterated scale are shown in Table 1; so far they do not show a clear improvement.

  • Table 1 (WER %, varying the iterated scale)

| iterated scale | test-clean | test-other | test-clean (rescore) | test-other (rescore) |
| --- | --- | --- | --- | --- |
| - (baseline) | 6.74 | 17.18 | 5.63 | 14.92 |
| - (baseline) | 6.78 | 17.49 | 5.76 | 15.31 |
| 0.01 | 6.71 | 17.34 | 5.60 | 14.86 |
| 0.05 | 6.58 | 17.35 | 5.69 | 15.06 |
| 0.30 | 6.57 | 17.60 | 5.61 | 15.38 |
| 1.00 | 6.77 | 17.69 | 5.80 | 15.58 |
| 10.00 | 6.93 | 18.31 | 5.88 | 16.22 |

The first two rows are baseline results with no iterated loss; the baseline was run twice to gauge the run-to-run variance.

Details:

  • An extra MMI loss is added after the 6th conformer layer. Adding it after both the 4th and 8th layers was also tried; the results are similar, as shown in Table 2.
  • The weights of the bigram LM used in the MMI loss are not updated by the extra MMI loss. A comparison with the alternative (updating them) is shown in Table 3; see the sketch after this list.
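
One way to realize the second point, assuming a hypothetical `mmi_loss_fn(nnet_output, bigram_scores)` and a tensor of learnable bigram-LM scores (both names are illustrative, not this PR's actual code), is to detach the LM scores in the extra-loss term so that only the final loss updates them:

```python
import torch

def combined_mmi_loss(final_out, mid_out, mmi_loss_fn, bigram_scores, scale):
    # Final MMI loss: gradients flow into the learnable bigram-LM
    # scores as usual.
    final_loss = mmi_loss_fn(final_out, bigram_scores)

    # Extra (iterated) MMI loss: detaching the bigram scores stops this
    # term from contributing any gradient to the LM weights.
    extra_loss = mmi_loss_fn(mid_out, bigram_scores.detach())

    # The "+ update bigram with extra loss" variant in Table 3 would
    # instead pass bigram_scores here without detaching.
    return final_loss + scale * extra_loss
```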

Extra results:

  • Table 2 (WER %, extra MMI losses added after both the 4th and 8th layers)

| iterated scale | test-clean | test-other | test-clean (rescore) | test-other (rescore) |
| --- | --- | --- | --- | --- |
| - (baseline) | 6.74 | 17.18 | 5.63 | 14.92 |
| - (baseline) | 6.78 | 17.49 | 5.76 | 15.31 |
| 0.30 | 6.61 | 17.65 | 5.65 | 15.52 |
| 1.00 | 6.66 | 18.13 | 5.68 | 15.87 |
| 10.00 | 6.75 | 18.43 | 5.74 | 16.25 |
  • Table 3 (WER %, updating the bigram LM with the extra MMI loss or not)

| model | test-clean | test-other | test-clean (rescore) | test-other (rescore) |
| --- | --- | --- | --- | --- |
| scale 0.30, bigram not updated by extra loss | 6.57 | 17.60 | 5.61 | 15.38 |
| + update bigram with extra loss | 6.75 | 17.77 | 5.69 | 15.53 |
