Conversation

@csukuangfj
Collaborator

TODOs

  • Add tests and documentation to transformer.py and conformer.py; fix their style issues.

@danpovey
Collaborator

That was fast! Thanks!

)

# TODO: Use eos_id as ignore_id.
# tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
@csukuangfj
Collaborator Author

It is commented out because the existing models were trained with it disabled; if it is enabled, the WER becomes worse. We should enable it when we train a new model.
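
For reference, a minimal sketch of what a helper like decoder_padding_mask presumably does (the actual implementation in transformer.py may differ in details): it marks every position equal to ignore_id as padding, producing a bool mask suitable for use as tgt_key_padding_mask in the decoder.

```python
import torch


def decoder_padding_mask(ys_in_pad: torch.Tensor, ignore_id: int = -1) -> torch.Tensor:
    """Return a bool mask that is True at padded positions.

    Sketch only; the real helper in transformer.py may differ.

    Args:
      ys_in_pad: Padded decoder inputs of shape (batch, max_len).
      ignore_id: Token id that marks padding positions.
    """
    return ys_in_pad == ignore_id


# Hypothetical usage, mirroring the commented-out line above:
# tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
```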

@csukuangfj changed the title from WIP: Refactoring to Refactoring on Aug 3, 2021
@csukuangfj
Collaborator Author

csukuangfj commented Aug 3, 2021

The following is the WER of the model trained with #3 and decoded with this pull request (with n-gram LM rescoring and attention-decoder rescoring; the model was trained for 26 epochs). Each setting name below encodes one point of a grid search over the two rescoring scales; a sketch of that search is given after the tables.

For test-clean, WER of different settings are:
ngram_lm_scale_0.7_attention_scale_0.6  2.96    best for test-clean
ngram_lm_scale_0.9_attention_scale_0.5  2.96
ngram_lm_scale_0.7_attention_scale_0.5  2.97
ngram_lm_scale_0.7_attention_scale_0.7  2.97
ngram_lm_scale_0.9_attention_scale_0.6  2.97
ngram_lm_scale_0.9_attention_scale_0.7  2.97
ngram_lm_scale_0.9_attention_scale_0.9  2.97
ngram_lm_scale_1.0_attention_scale_0.7  2.97
ngram_lm_scale_1.0_attention_scale_0.9  2.97
ngram_lm_scale_1.0_attention_scale_1.0  2.97
ngram_lm_scale_1.0_attention_scale_1.1  2.97
ngram_lm_scale_1.0_attention_scale_1.2  2.97
ngram_lm_scale_1.0_attention_scale_1.3  2.97
ngram_lm_scale_1.1_attention_scale_0.9  2.97

---

For test-other, WER of different settings are:
ngram_lm_scale_1.0_attention_scale_0.9  6.65    best for test-other
ngram_lm_scale_1.1_attention_scale_1.1  6.65
ngram_lm_scale_0.9_attention_scale_0.7  6.66
ngram_lm_scale_1.0_attention_scale_1.0  6.66
ngram_lm_scale_1.0_attention_scale_1.1  6.66
ngram_lm_scale_0.9_attention_scale_1.0  6.67
ngram_lm_scale_1.0_attention_scale_0.7  6.67
ngram_lm_scale_1.0_attention_scale_1.2  6.67
ngram_lm_scale_1.0_attention_scale_1.3  6.67
ngram_lm_scale_0.9_attention_scale_0.5  6.68
ngram_lm_scale_0.9_attention_scale_0.6  6.68
ngram_lm_scale_0.9_attention_scale_0.9  6.68
ngram_lm_scale_0.9_attention_scale_1.1  6.68
ngram_lm_scale_0.9_attention_scale_1.3  6.68
ngram_lm_scale_0.9_attention_scale_1.5  6.68
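
As mentioned above, each setting name encodes one point of a grid search over the two rescoring scales. A minimal sketch of how such a grid could be enumerated, assuming the total score is formed roughly as am + ngram_lm_scale * lm + attention_scale * attn (the exact combination is defined by the decoding code, not here):

```python
# Hypothetical sketch of the grid search implied by the setting names above.
# Plain floats stand in for per-utterance scores; the real decode.py works on
# k2 lattices and n-best lists.
ngram_lm_scales = [0.5, 0.6, 0.7, 0.9, 1.0, 1.1, 1.2, 1.3, 1.5]
attention_scales = [0.5, 0.6, 0.7, 0.9, 1.0, 1.1, 1.2, 1.3, 1.5]


def total_score(am: float, lm: float, attn: float,
                lm_scale: float, attn_scale: float) -> float:
    # Assumed combination of acoustic, n-gram LM and attention-decoder scores.
    return am + lm_scale * lm + attn_scale * attn


settings = {
    f"ngram_lm_scale_{ls}_attention_scale_{attn}": (ls, attn)
    for ls in ngram_lm_scales
    for attn in attention_scales
}
# Each key is decoded separately and its WER reported, which yields the tables above.
```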

Epochs 14-26 are used in model averaging.
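
Averaging epochs 14 through 26 corresponds to the --avg 13 option used in the decode command below. A minimal sketch of such checkpoint averaging, assuming each epoch-*.pt file stores the model's state_dict under the key "model" (the real icefall helper may differ):

```python
import torch

# Sketch of averaging the parameters of epochs 14..26 (13 checkpoints).
ckpt_paths = [f"conformer_ctc/exp/epoch-{i}.pt" for i in range(14, 27)]

avg = None
for path in ckpt_paths:
    state = torch.load(path, map_location="cpu")["model"]  # assumed checkpoint layout
    if avg is None:
        avg = {k: v.clone().float() for k, v in state.items()}
    else:
        for k, v in state.items():
            avg[k] += v.float()

for k in avg:
    avg[k] /= len(ckpt_paths)

# `avg` can now be loaded with model.load_state_dict(avg).
```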


I have uploaded the above checkpoints to
https://huggingface.co/csukuangfj/conformer_ctc/tree/main

To reproduce the decoding result:

  1. Clone the above repo containing the checkpoints and put it into conformer_ctc/exp/.
  2. After step 1, you should have conformer_ctc/exp/epoch-{14,15,...,26}.pt.
  3. Run
./prepare.sh
./conformer_ctc/decode.py --epoch 26 --avg 13 --max-duration=50
  4. You should get the above results.

The results are expected to improve when the model is trained for more epochs.
I will rerun the training with the bug in k2-fsa/snowfall#242 fixed.

@danpovey
Collaborator

danpovey commented Aug 3, 2021 via email

@pzelasko
Collaborator

pzelasko commented Aug 3, 2021

Nice! I'm curious -- did you ever try to run the same thing but with MMI instead of CTC?

@csukuangfj
Collaborator Author

Nice! I'm curious -- did you ever try to run the same thing but with MMI instead of CTC?

yes, I am planning to do that with a pretrained P. All the related code can be found in snowfall.

@csukuangfj
Collaborator Author

Merging it to avoid conflicts.

@csukuangfj merged commit 5a0b9bc into k2-fsa:master on Aug 4, 2021
@csukuangfj deleted the refactor branch on August 4, 2021 06:53
@wwxm0523 mentioned this pull request on Jan 30, 2022
baileyeet referenced this pull request in reazon-research/icefall Jul 16, 2025
* Fix an error in TDNN-LSTM training.

* WIP: Refactoring

* Refactor transformer.py

* Remove unused code.

* Minor fixes.