
@zhu-han (Contributor) commented on May 12, 2021

This PR implements iterated loss from #179 (comment).
Reference: https://arxiv.org/pdf/1910.10324.pdf
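
To make the idea concrete, here is a minimal PyTorch-style sketch of iterated loss, assuming an encoder exposed as a list of layers and a generic `loss_fn`; the class and argument names are illustrative, not this PR's actual code:

```python
import torch.nn as nn

class IteratedLossEncoder(nn.Module):
    """Sketch of iterated loss (arXiv:1910.10324): the training loss is
    computed not only from the final layer but also from an intermediate
    layer, and the two are combined with a small scale."""

    def __init__(self, layers: nn.ModuleList, iterated_layer: int,
                 iterated_scale: float):
        super().__init__()
        self.layers = layers                  # stack of conformer blocks
        self.iterated_layer = iterated_layer  # index of the tapped layer
        self.iterated_scale = iterated_scale  # e.g. 0.3 (--iterated-scale)

    def forward(self, x, loss_fn):
        """loss_fn maps a layer's output to a scalar loss (here, MMI)."""
        intermediate_loss = None
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i == self.iterated_layer:
                # Auxiliary loss on the intermediate representation.
                intermediate_loss = loss_fn(x)
        total_loss = loss_fn(x)  # loss on the final layer's output
        if intermediate_loss is not None:
            total_loss = total_loss + self.iterated_scale * intermediate_loss
        return total_loss
```

With 0-based indexing, `--iterated-layers 5` would tap the output of the 6th conformer layer, matching the description below.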

The following results can be reproduced with:

python mmi_att_transformer_train.py --world-size 2 --full-libri 0 --use-ali-model 0 --max-duration 250 --iterated-layers 5 --iterated-scale 0.3

Results with different values of the iterated scale are shown in Table 1; so far they do not show a clear improvement.

  • Table 1 (WER %, varying the iterated scale)

| iterated scale | test-clean | test-other | test-clean (rescore) | test-other (rescore) |
| --- | --- | --- | --- | --- |
| - (baseline) | 6.74 | 17.18 | 5.63 | 14.92 |
| - (baseline) | 6.78 | 17.49 | 5.76 | 15.31 |
| 0.01 | 6.71 | 17.34 | 5.60 | 14.86 |
| 0.05 | 6.58 | 17.35 | 5.69 | 15.06 |
| 0.30 | 6.57 | 17.60 | 5.61 | 15.38 |
| 1.00 | 6.77 | 17.69 | 5.80 | 15.58 |
| 10.00 | 6.93 | 18.31 | 5.88 | 16.22 |

The first two rows are baseline results with no iterated loss; the baseline was run twice to gauge the run-to-run variance.

Details:

  • An extra MMI loss is added after the 6th conformer layer. Adding it after both the 4th and 8th layers was also tried; the results are similar, as shown in Table 2.
  • The weights of the bigram LM used in the MMI loss are not updated by the extra MMI loss. A comparison with the alternative (updating them) is shown in Table 3; see the sketch after this list.
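
One way to realize the second point, assuming a hypothetical `mmi_loss_fn(nnet_output, bigram_scores)` and a tensor of learnable bigram-LM scores (both names are illustrative, not this PR's actual code), is to detach the LM scores in the extra-loss term so that only the final loss updates them:

```python
import torch

def combined_mmi_loss(final_out, mid_out, mmi_loss_fn, bigram_scores, scale):
    # Final MMI loss: gradients flow into the learnable bigram-LM
    # scores as usual.
    final_loss = mmi_loss_fn(final_out, bigram_scores)

    # Extra (iterated) MMI loss: detaching the bigram scores stops this
    # term from contributing any gradient to the LM weights.
    extra_loss = mmi_loss_fn(mid_out, bigram_scores.detach())

    # The "+ update bigram with extra loss" variant in Table 3 would
    # instead pass bigram_scores here without detaching.
    return final_loss + scale * extra_loss
```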

Extra results:

  • Table 2 (WER %, extra MMI losses added after both the 4th and 8th layers)

| iterated scale | test-clean | test-other | test-clean (rescore) | test-other (rescore) |
| --- | --- | --- | --- | --- |
| - (baseline) | 6.74 | 17.18 | 5.63 | 14.92 |
| - (baseline) | 6.78 | 17.49 | 5.76 | 15.31 |
| 0.30 | 6.61 | 17.65 | 5.65 | 15.52 |
| 1.00 | 6.66 | 18.13 | 5.68 | 15.87 |
| 10.00 | 6.75 | 18.43 | 5.74 | 16.25 |
  • Table 3 (WER %, updating the bigram LM with the extra MMI loss or not)

| model | test-clean | test-other | test-clean (rescore) | test-other (rescore) |
| --- | --- | --- | --- | --- |
| scale 0.30, bigram not updated by extra loss | 6.57 | 17.60 | 5.61 | 15.38 |
| + update bigram with extra loss | 6.75 | 17.77 | 5.69 | 15.53 |
