Description
Post test results + useful remarks here, ideally of both:
- the new model (20240117), and
- the base model (kaldi_model_daanzu_20211030-mediumlm)
Use the same test data and the default Ready or Not grammar module for both models.
Useful remarks include:
- specific words or phrases that were consistently misrecognised
- rate of false positives / false negatives
- subjective opinion, which model works better or worse, and in which areas
Important instructions:
- It's important to manually review, clean and update retain.tsv with the correct rules + text; see the example workflow near the end of these instructions.
  - See this YouTube video
- Please only include "normal commands" in the test data, please exclude "Freeze", etc.
  - There's a script ./scripts/copy_retain_item_cmds_only.ps1 that can be used in PowerShell to copy only "normal commands" out of ./retain/ and into ./cleanaudio_cmds/
- Please, if possible, only use the default _readyornot.py grammar module, or very minor modifications, i.e. no new words.
- Example command to run test:
  ./tacspeak.exe --test_model './cleanaudio_cmds/retain.tsv' './kaldi_model/' './kaldi_model/lexicon.txt' 4
- There are a number of useful PowerShell scripts in the ./scripts/ folder related to cleaning up the retain.tsv and related .wav files.
- A workflow I use for cleaning up the data after a play session:
  - Open retain.tsv and go through each line, reviewing the rule and text.
  - At the same time, load every .wav file in the ./retain/ folder into a playlist in VLC media player on single file loop, pressing 'N' to move to the next .wav as I read through each line of retain.tsv.
  - When there's a mismatch between the text vs the audio, but the rule is correct, I correct the text in retain.tsv to align with the audio.
  - When there's a mismatch between the recognised rule (and/or option) vs the audio, I either A) update both the rule + text manually, or B) delete the line in retain.tsv, then when I'm done reviewing I run list_wav_missing_from_retain_tsv.ps1 first to make sure I'm deleting the right files, then run the delete_wav_missing_from_retain_tsv.ps1 script (option A is preferred, but hey we're all busy and life is too short to spend cleaning all the data).
  - If the audio is so stupidly vague or garbled that I can't understand with my own ears and brain what I'm saying, I delete the line in retain.tsv, then when I'm done reviewing I run list_wav_missing_from_retain_tsv.ps1 first to make sure I'm deleting the right files, then run the delete_wav_missing_from_retain_tsv.ps1 script.
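For anyone who wants to sanity-check which .wav files would be affected before deleting anything, here is a rough Python sketch of the "list first" step in the workflow above. It is not the repository's list_wav_missing_from_retain_tsv.ps1 script, and it assumes (unverified) that the .wav filename is in the first tab-separated column of retain.tsv; the function name and the `filename_column` parameter are made up for illustration, so adjust them to match your file.

```python
import csv
from pathlib import Path


def wavs_missing_from_tsv(retain_dir, tsv_name="retain.tsv", filename_column=0):
    """Return the .wav files in retain_dir that no line of the TSV refers to.

    ASSUMPTION: the wav filename lives in column `filename_column`
    (default: first column). Check your own retain.tsv before relying on this.
    """
    retain_dir = Path(retain_dir)
    referenced = set()
    with open(retain_dir / tsv_name, newline="", encoding="utf-8") as f:
        # QUOTE_NONE so that quote characters inside the text column are left alone
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) > filename_column and row[filename_column].strip():
                # Keep only the file name, whether the path uses / or \
                name = row[filename_column].strip().replace("\\", "/").split("/")[-1]
                referenced.add(name)
    return sorted(p.name for p in retain_dir.glob("*.wav") if p.name not in referenced)


if __name__ == "__main__":
    # Only lists candidates; nothing is deleted.
    if Path("./retain/retain.tsv").exists():
        for name in wavs_missing_from_tsv("./retain"):
            print(name)
```

This only prints file names, so it is safe to run repeatedly while editing retain.tsv; the actual deletion is still left to the existing PowerShell script.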
Example report:
- 0 incorrect commands out of 4 cmds (1 mission played), same result both models
- 5% WER, same result both models
- new model more often picks up baby crying as "freeze", using "listen_key_toggle":-1, using USE_NOISE_SINK = True; also picked up in base model but not as often.
- New model tended to pick up "red" as "gold" when wife was speaking
- using default _readyornot.py without any modifications
- './kaldi_model/' is new model
- './kaldi_model_base/' is base model
('./kaldi_model/', './retain/retain.tsv', 'Command', 'WER', 'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=1 D=0 I=0')
('./kaldi_model/', './retain/retain.tsv', 'Command', 'CMDERR', {'cmd_not_correct_output': 0, 'cmd_not_correct_rule': 0, 'cmd_not_correct_options': 0, 'cmd_not_recog_output': 0, 'cmd_not_recog_input': 0, 'cmds': 4})
('./kaldi_model_base/', './retain/retain.tsv', 'Command', 'WER', 'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=0 D=1 I=0')
('./kaldi_model_base/', './retain/retain.tsv', 'Command', 'CMDERR', {'cmd_not_correct_output': 0, 'cmd_not_correct_rule': 0, 'cmd_not_correct_options': 0, 'cmd_not_recog_output': 0, 'cmd_not_recog_input': 0, 'cmds': 4})
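To help read the WER lines above: the percentage is the usual word error rate, (S + D + I) / N, where N is the number of reference words and S, D, I are substitutions, deletions and insertions. The "+/-" figure in this example happens to match a 95% normal-approximation interval, 1.96 * sqrt(WER * (1 - WER) / N); that is my reading of the numbers, not something confirmed from the tool's source. A small Python sketch (helper names are hypothetical) that pulls the counts out of a summary string and recomputes both values:

```python
import math
import re


def parse_counts(summary):
    """Extract the N/C/S/D/I counts from an 'Overall -> ...' summary string."""
    return {key: int(value) for key, value in re.findall(r"\b([NCSDI])=(\d+)", summary)}


def wer_with_interval(counts, z=1.96):
    """Return (wer, half_width) as fractions.

    ASSUMPTION: half_width uses a normal approximation to the binomial,
    which reproduces the '+/-' value in the example report above.
    """
    n = counts["N"]
    errors = counts["S"] + counts["D"] + counts["I"]
    wer = errors / n
    half_width = z * math.sqrt(wer * (1 - wer) / n)
    return wer, half_width


if __name__ == "__main__":
    summary = "Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=1 D=0 I=0"
    wer, half_width = wer_with_interval(parse_counts(summary))
    print(f"{wer * 100:.2f} % +/- {half_width * 100:.2f} %")  # 5.00 % +/- 9.55 %
```

With only 20 words the interval is wider than the error rate itself, so the two models in the example cannot really be separated on WER; one substitution (new model) versus one deletion (base model) gives the same 5%. More test data narrows the interval.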