luisrodrruiz/Spoken-Command-Recognition

Spoken-Command-Recognition

Lightweight model for spoken command recognition. The architecture is a CRNN (convolutional recurrent neural network) model that can accurately recognize short commands from wave files. The datasets for training and testing the models consist of wave files and CSV files. Each CSV file must contain at least two columns, "filename" and "label": "filename" is the path to a wave file containing a single command utterance, and "label" is the transcription of that command.
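
As a minimal sketch of the CSV layout described above, the snippet below parses a two-row example and checks that the required columns are present. The helper `check_columns` and the example paths are hypothetical and not part of this repository; only the column names "filename" and "label" come from this README.

```python
import csv
import io

REQUIRED_COLUMNS = {"filename", "label"}  # the two columns named in this README


def check_columns(csv_text):
    """Return the parsed rows, or raise ValueError if a required column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


# Hypothetical example content: one command utterance per row.
example = (
    "filename,label\n"
    "yes/0a7c2a8d_nohash_0.wav,yes\n"
    "stop/0ab3b47d_nohash_0.wav,stop\n"
)

rows = check_columns(example)
print(rows[0]["filename"], rows[0]["label"])
```

Extra columns pass this check, since the README only asks for "at least" these two.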

Requirements

  • pytorch >= 2.0
  • torchaudio >= 2.0

The main script (train.py) takes the following parameters:

  • train_csv_file: CSV file containing the training data
  • dev_csv_file: CSV file containing the development (or validation) data

Optionally, the following parameters can be specified:

  • model: model to use ("crnn" or "rnn")
  • audio_path: path prepended to each entry of the "filename" column in the CSV files (to convert relative paths in the CSV to absolute paths if needed)
  • out_dir: output directory where the trained models will be saved
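
The effect of audio_path can be sketched as follows. This is an assumption about the behavior, not the code in train.py: the README only says the path is "prepended", so the actual script may join the strings differently. The function name `resolve_audio_path` and the example directory are hypothetical.

```python
import os


def resolve_audio_path(filename, audio_path=None):
    """Prepend audio_path to a CSV "filename" entry when one is given.

    Assumption: a filesystem-style join. train.py may instead use plain
    string concatenation; this README does not say.
    """
    if audio_path is None:
        return filename
    return os.path.join(audio_path, filename)


# Hypothetical values: a relative CSV entry and a dataset root directory.
print(resolve_audio_path("yes/0a7c2a8d_nohash_0.wav", "/data/speech_commands"))
```

With `os.path.join`, an entry that is already absolute is returned unchanged, so this sketch tolerates CSV files that mix relative and absolute paths.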

This model can be tested on the Google Speech Commands dataset:

https://www.kaggle.com/datasets/neehakurelli/google-speech-commands/data
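
One way to produce the two CSV files from this dataset is sketched below. It assumes the usual Speech Commands layout of one folder per command word containing wave files, which the Kaggle mirror may or may not preserve exactly. The function names, the `speech_commands` folder, and the every-tenth-file split are hypothetical choices, not part of this repository.

```python
import csv
import os


def collect_rows(root):
    """List wave files under root, using each folder name as the label."""
    rows = []
    for label in sorted(os.listdir(root)):
        folder = os.path.join(root, label)
        # Skip loose files and helper folders such as _background_noise_.
        if not os.path.isdir(folder) or label.startswith("_"):
            continue
        for name in sorted(os.listdir(folder)):
            if name.lower().endswith(".wav"):
                rows.append({"filename": os.path.join(label, name), "label": label})
    return rows


def split_rows(rows, dev_every=10):
    """Send every dev_every-th row to the development set (an arbitrary choice)."""
    train = [r for i, r in enumerate(rows) if i % dev_every != 0]
    dev = [r for i, r in enumerate(rows) if i % dev_every == 0]
    return train, dev


def write_csv(rows, path):
    """Write rows with the two columns the training script expects."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["filename", "label"])
        writer.writeheader()
        writer.writerows(rows)


# Usage (hypothetical paths):
# rows = collect_rows("speech_commands")
# train, dev = split_rows(rows)
# write_csv(train, "train.csv")
# write_csv(dev, "dev.csv")
```

The paths written here are relative to the dataset root, so the root would be passed as audio_path. Note that this split ignores speaker identity, so the same speaker can appear in both sets.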
