Description
First off, thanks for this awesome repo! Helping me a lot with my project!!!
Anyway, I'm a bit confused about how the program generates the samples that it does. For example, I chose a single wake word and generated a dataset from the Speech Commands dataset. For the positive set, I get:
Generate training datasets: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 509/509 [01:03<0
"Number of speakers in corpus: 1, average number of utterances per speaker: 518.0."
However, when I follow the rest of the generation steps, I end up with a dataset of 10K examples. I'm just a bit confused about where these extra samples came from. Are they duplicates, or some sort of augmented version of the originals? In the paper you mention:
"For improved robustness and better quality, we implement a set of popular augmentation routines: time stretching, time shifting, synthetic noise addition, recorded noise mixing, SpecAugment (no time warping; Park et al., 2019), and vocal tract length perturbation (Jaitly and Hinton, 2013). These are readily extensible, so practitioners may easily add new augmentation modules."
I am mainly using this repo for dataset generation, so I wasn't sure if this passage refers only to your model's preprocessing, or if you also implemented these augmentations in the dataset generation code. I would dig through the code a bit more, but I figured this would be a pretty quick/straightforward question for you guys, and the answer could be useful for someone else down the line.
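For context, here's my rough mental model of how augmentations like the time shifting and noise mixing quoted above could multiply ~500 utterances into many more examples. This is purely an illustrative NumPy sketch under my own assumptions, not your actual implementation; the names `time_shift` and `add_noise` are made up:

```python
import numpy as np

def time_shift(wave: np.ndarray, max_shift: int, rng: np.random.Generator) -> np.ndarray:
    """Shift the waveform left or right by a random number of samples, zero-padding the gap."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(wave)
    if shift >= 0:
        shifted[shift:] = wave[:len(wave) - shift]
    else:
        shifted[:shift] = wave[-shift:]
    return shifted

def add_noise(wave: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Mix in white noise scaled to a target signal-to-noise ratio (in dB)."""
    noise = rng.standard_normal(len(wave))
    signal_power = np.mean(wave ** 2)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return wave + noise

# One original utterance can yield several distinct training examples,
# which would explain a dataset several times larger than the raw corpus.
rng = np.random.default_rng(0)
utterance = np.sin(np.linspace(0, 4 * np.pi, 1600)).astype(np.float32)
augmented = [add_noise(time_shift(utterance, 160, rng), 20.0, rng) for _ in range(4)]
```

If something like this runs per-utterance at generation time (rather than on-the-fly during training), that would account for the extra samples I'm seeing.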
Thanks,
Brett