
Conversation

@glynpu (Contributor) commented Jun 21, 2021

This PR releases a snowfall-trained model together with the related decoding code.
The WER on test-clean is lower than that of the previously trained snowfall model; a detailed comparison follows:

| avg epoch 26-30 | test-clean (no rescore) | test-other (no rescore) | test-clean (4-gram lattice rescore) | test-other (4-gram lattice rescore) |
| --- | --- | --- | --- | --- |
| before (with LF-MMI loss) | 4.14 | 8.41 | 3.69 | 7.68 |
| current | 3.97 | 9.78 | * | * |
INFO:root:[test-clean] %WER 3.97% [2087 / 52576, 220 ins, 166 del, 1701 sub ]
INFO:root:[test-other] %WER 9.78% [5121 / 52343, 535 ins, 439 del, 4147 sub ]

One thing worth mentioning: the current no-rescore result (3.97 on test-clean) is obtained WITHOUT a 3-gram LM.
The result may improve further by composing the current ctc_topo with a 3-gram FST (I am working on this).
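For reference, a minimal sketch (an assumption, not the code in this PR) of what that composition could look like with k2; `build_ctc_topo`, `max_token_id`, and the path to `G.fst.txt` are placeholders:

```python
import k2

# Load a 3-gram G in OpenFst text format (the path is hypothetical).
with open('data/lang_nosp/G.fst.txt') as f:
    G = k2.Fsa.from_openfst(f.read(), acceptor=False)
G = k2.arc_sort(G)

# CTC topology over the token ids (0 reserved for blank); `build_ctc_topo`
# stands in for whatever helper builds it in snowfall.
ctc_topo = k2.arc_sort(build_ctc_topo(list(range(max_token_id + 1))))

# Compose the topology with the 3-gram LM to obtain the decoding graph.
decoding_graph = k2.connect(k2.compose(ctc_topo, G))
```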

Another baseline for this model is an espnet-released model; a detailed comparison follows.
num_paths = 100 is used when doing n-best rescoring for row 2;
the results in row 2 are obtained with techniques similar to those in #201, by loading the espnet-released model with snowfall code.

| decoding algorithm | training tool | encoder + k2 ctc decode, no rescore | encoder + k2 ctc decode + decoder nbest rescore | encoder + k2 ctc decode + transformer lm nbest rescore | encoder + k2 ctc decode + decoder nbest rescore + transformer lm nbest rescore |
| --- | --- | --- | --- | --- | --- |
| decoder algorithm in espnet | espnet | * | * | * | 2.1% |
| k2 ctc decode in this PR | espnet | 2.97 | 2.64 | 2.43 | 2.35 |
| k2 ctc decode in this PR | snowfall | 3.97 | * | * | * |
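For context, a rough sketch of the n-best sampling step behind the rescoring columns above (a sketch only; `lattice` is assumed to be the k2 FsaVec from first-pass decoding, and the subsequent rescoring with the attention decoder / transformer LM is omitted):

```python
import k2

num_paths = 100

# Sample up to `num_paths` paths per utterance from the first-pass lattice.
# Each sampled path is then rescored (decoder nbest rescore and/or
# transformer LM nbest rescore) and the best-scoring path is kept.
paths = k2.random_paths(lattice, use_double_scores=True, num_paths=num_paths)
```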

Conclusions:

  1. A better snowfall-trained model is obtained before rescoring.
  2. The current training pipeline is still inferior to its espnet counterpart; if that is fixed, the current WER of 3.97% on test-clean should get close to 2.97%
    (the related training code will be submitted later this week; I am making this promise here to force myself to do it quickly).

@danpovey (Contributor)

This was trained on 960 hours, I assume?
I'm surprised that your "k2 ctc decode" number was obtained without the 3-gram LM; I had thought you were using that.
What do you think are the differences between our trained model and the espnet-trained model? Is it possible to compare the diagnostics from training?

@glynpu (Contributor, Author) commented Jun 21, 2021

> This was trained on 960 hours, I assume?

Yes, with the full 960-hour LibriSpeech.

> What do you think are the differences between our trained model and the espnet-trained model?

A known big difference is the learning-rate schedule: espnet uses its warm-up scheduler, while I use the Noam optimizer in snowfall.
With warmup_step = 40000 and model_size = 512, espnet's learning rate at each step is around 10 times that used in this experiment.
So I am going to retrain the model with lr-factor changed from 1.0 to 10.0 after the current experiment finishes.
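For context, a minimal sketch of the Noam schedule being discussed (the standard formula from "Attention Is All You Need"; lr-factor is the knob mentioned above and scales the whole curve linearly):

```python
def noam_lr(step: int, model_size: int = 512, warmup_steps: int = 40000,
            lr_factor: float = 1.0) -> float:
    """Learning rate of the Noam scheduler at a given optimizer step."""
    return (lr_factor
            * model_size ** -0.5
            * min(step ** -0.5, step * warmup_steps ** -1.5))

# With the same model_size and warmup_steps, lr_factor=10.0 yields exactly
# 10x the learning rate of lr_factor=1.0 at every step.
```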

> Is it possible to compare the diagnostics from training?

Yes, I have reproduced the espnet result and obtained its detailed training log, which will be used to diagnose my training process.

@pzelasko (Collaborator)

You might want to check what data augmentation techniques and settings they are using and compare them with our setup. If we’re missing some techniques in Lhotse we can add them.
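For example, a rough sketch of the kind of setup to compare against espnet's recipe (speed perturbation plus SpecAugment); the manifest path and the exact parameter names are assumptions and may differ across Lhotse versions:

```python
from lhotse import combine, load_manifest
from lhotse.dataset import SpecAugment

# Training cuts (the manifest path is hypothetical).
cuts = load_manifest('exp/data/cuts_train-960.json.gz')

# 3-way speed perturbation, as in typical LibriSpeech recipes.
cuts = combine(cuts, cuts.perturb_speed(0.9), cuts.perturb_speed(1.1))

# On-the-fly SpecAugment applied to feature batches during training.
spec_augment = SpecAugment(num_frame_masks=2, num_feature_masks=2)
```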

@danpovey (Contributor)

So I guess this is ready to merge?

@glynpu (Contributor, Author) commented Jun 22, 2021

Maybe @csukuangfj is going to review this afternoon.

ys_out_pad = pad_list(ys_in, -1)

else:
raise VAlueError("Invalid input for decoder self attetion")
Collaborator


VAlueError -> ValueError

Contributor Author


fixed.

@csukuangfj (Collaborator)

+2

@csukuangfj (Collaborator)

Thanks! Merging
