Model test results - model 20240117 #23

**Post test results + useful remarks here, ideally for both of the following, using the same test data and the default Ready or Not grammar module:**
- the new model (20240117), and
- the base model (kaldi_model_daanzu_20211030-mediumlm)

**Useful remarks include:**
- specific words or phrases that were consistently misrecognised
- rate of false positives / false negatives
- subjective opinion on which model works better or worse, and in which areas

**Important instructions:**
- **It's important to manually review, clean and update `retain.tsv` with the correct rules + text**; see the example workflow at the end of these instructions
- See this [YouTube video](https://youtu.be/VEtyMRIE2jA)
- Please include only "normal commands" in the test data; exclude "Freeze", etc.
    - There's a script `./scripts/copy_retain_item_cmds_only.ps1` that can be used in PowerShell to copy only "normal commands" out of `./retain/` and into `./cleanaudio_cmds/`
- If possible, please use only the default `_readyornot.py` grammar module, or a very lightly modified version (i.e. no new words).
- Example command to run the test: `./tacspeak.exe --test_model './cleanaudio_cmds/retain.tsv' './kaldi_model/' './kaldi_model/lexicon.txt' 4`
- There are a number of useful PowerShell scripts in the `./scripts/` folder related to cleaning up the retain.tsv and related .wav files. 
- A workflow I use for cleaning up the data after a play session:
    - Open `retain.tsv` and go through each line, reviewing the rule and text
    - At the same time, I load every .wav file from the `./retain/` folder into a VLC media player playlist on single-file loop, pressing 'N' to move to the next .wav as I read through each line of `retain.tsv`
    - When there's a mismatch between the text and the audio, but the rule is correct, I correct the text in `retain.tsv` to match the audio.
    - When there's a mismatch between the recognised rule (and/or options) and the audio, I either A) manually update both the rule and the text, or B) delete the line in `retain.tsv`. When I'm done reviewing, I first run `list_wav_missing_from_retain_tsv.ps1` to make sure I'm deleting the right files, then run the `delete_wav_missing_from_retain_tsv.ps1` script. (Option A is preferred, but hey, we're all busy and life is too short to spend cleaning all the data.)
    - If the audio is so stupidly vague or garbled that even my own ears and brain can't work out what I'm saying, I delete the line in `retain.tsv`, then when I'm done reviewing I run `list_wav_missing_from_retain_tsv.ps1` first to make sure I'm deleting the right files, followed by the `delete_wav_missing_from_retain_tsv.ps1` script.
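
The idea behind the list/delete step (finding .wav files whose line was removed from `retain.tsv`) can be sketched in Python as below. This is not the PowerShell script itself, just an illustration of the same cross-check; it assumes the first tab-separated column of each `retain.tsv` row is the .wav filename, and `find_orphan_wavs` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def find_orphan_wavs(retain_dir: str, tsv_name: str = "retain.tsv") -> list[str]:
    """List .wav files in retain_dir that no row of retain.tsv refers to.

    ASSUMPTION: the first tab-separated column of each row is the .wav
    filename (only its basename is compared).
    """
    folder = Path(retain_dir)
    referenced = set()
    with open(folder / tsv_name, newline="", encoding="utf-8") as f:
        # QUOTE_NONE so quote characters in the recognised text are kept as-is
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row:  # skip blank lines left over from manual editing
                referenced.add(Path(row[0]).name)
    # Only lists files; review this list before deleting anything.
    return sorted(p.name for p in folder.glob("*.wav") if p.name not in referenced)
```

For example, `find_orphan_wavs("./retain/")` after a review pass returns the candidates for deletion, mirroring the "list first, then delete" order used above.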

**Example report:**
- 0 incorrect commands out of 4 cmds (1 mission played), same result for both models
- 5% WER, same result for both models
- New model more often picks up a baby crying as "freeze" (using `"listen_key_toggle":-1` and `USE_NOISE_SINK = True`); the base model also picks it up, but less often.
- New model tended to recognise "red" as "gold" when my wife was speaking
- using default `_readyornot.py` without any modifications
- `./kaldi_model/` is the new model
- `./kaldi_model_base/` is the base model

```
('./kaldi_model/', './retain/retain.tsv', 'Command', 'WER', 'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=1 D=0 I=0')
('./kaldi_model/', './retain/retain.tsv', 'Command', 'CMDERR', {'cmd_not_correct_output': 0, 'cmd_not_correct_rule': 0, 'cmd_not_correct_options': 0, 'cmd_not_recog_output': 0, 'cmd_not_recog_input': 0, 'cmds': 4})
('./kaldi_model_base/', './retain/retain.tsv', 'Command', 'WER', 'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=0 D=1 I=0')
('./kaldi_model_base/', './retain/retain.tsv', 'Command', 'CMDERR', {'cmd_not_correct_output': 0, 'cmd_not_correct_rule': 0, 'cmd_not_correct_options': 0, 'cmd_not_recog_output': 0, 'cmd_not_recog_input': 0, 'cmds': 4})
```
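
As a sanity check when comparing reports, the WER percentage can be recomputed from the counts in the summary string, assuming the usual WER convention that these counts appear to follow: N reference words, C correct, S substitutions, D deletions, I insertions, and WER = (S + D + I) / N. Here both models make one error out of N=20 (a substitution for the new model, a deletion for the base model), hence 5.00 % each. A minimal Python sketch; `wer_from_summary` is a hypothetical helper, and it assumes the `N=… C=… S=… D=… I=…` layout shown above.

```python
import re


def wer_from_summary(summary: str) -> float:
    """Recompute WER (%) from a summary string such as
    'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=1 D=0 I=0'."""
    # Collect the N (reference words), C (correct), S (substitutions),
    # D (deletions) and I (insertions) counts.
    counts = {key: int(val) for key, val in re.findall(r"\b([NCSDI])=(\d+)", summary)}
    n, c, s, d, i = (counts[k] for k in "NCSDI")
    # Every reference word is either correct, substituted or deleted.
    if c + s + d != n:
        raise ValueError(f"inconsistent counts: C+S+D={c + s + d}, N={n}")
    # Insertions add errors without adding reference words.
    return 100.0 * (s + d + i) / n
```

For example, `wer_from_summary("Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=0 D=1 I=0")` returns `5.0`, matching the base model line.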
