Description
First off, thanks for this awesome repo! Helping me a lot with my project!!!
Anyway, I'm a bit confused about how the program generates the samples that it does. For example, I chose a single wake word and generated a dataset from the Speech Commands dataset. For the positive set, I get:
Generate training datasets: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 509/509 [01:03<0
"Number of speakers in corpus: 1, average number of utterances per speaker: 518.0."
However, when I follow the rest of the generation steps, I end up with a dataset of 10K examples. I'm just a bit confused about where these extra samples came from. Are they duplicates, or some sort of augmented version of the originals? In the paper you mention:
"For improved robustness and better quality, we implement a set of popular augmentation routines: time stretching, time shifting, synthetic noise addition, recorded noise mixing, SpecAugment (no time warping; Park et al., 2019), and vocal tract length perturbation (Jaitly and Hinton, 2013). These are readily extensible, so practitioners may easily add new augmentation modules."
I am mainly using this repo for dataset generation, so I wasn't sure if this passage refers only to your model's preprocessing, or if you also implemented these augmentations in the dataset generation code. I would dig through the code a bit more, but I figured this would be a pretty quick/straightforward question for you guys, and the answer could be useful for someone else down the line.
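For context, here's my rough mental model of how augmentations like the time shifting and noise mixing quoted above could multiply ~500 utterances into many more examples. This is purely an illustrative NumPy sketch under my own assumptions, not your actual implementation; the names `time_shift` and `add_noise` are made up:

```python
import numpy as np

def time_shift(wave: np.ndarray, max_shift: int, rng: np.random.Generator) -> np.ndarray:
    """Shift the waveform left or right by a random number of samples, zero-padding the gap."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(wave)
    if shift >= 0:
        shifted[shift:] = wave[:len(wave) - shift]
    else:
        shifted[:shift] = wave[-shift:]
    return shifted

def add_noise(wave: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Mix in white noise scaled to a target signal-to-noise ratio (in dB)."""
    noise = rng.standard_normal(len(wave))
    signal_power = np.mean(wave ** 2)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return wave + noise

# One original utterance can yield several distinct training examples,
# which would explain a dataset several times larger than the raw corpus.
rng = np.random.default_rng(0)
utterance = np.sin(np.linspace(0, 4 * np.pi, 1600)).astype(np.float32)
augmented = [add_noise(time_shift(utterance, 160, rng), 20.0, rng) for _ in range(4)]
```

If something like this runs per-utterance at generation time (rather than on-the-fly during training), that would account for the extra samples I'm seeing.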
Thanks,
Brett