Week 03

Automatic Speech Recognition I

Slides

  • ASR I
    • Automatic speech recognition (ASR)
    • Metrics for ASR
    • CTC loss function
    • "Listen, Attend and Spell" architecture
    • Beam search
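
As a quick reminder of the CTC collapse rule covered in the slides, here is a minimal sketch of greedy (best-path) decoding. It assumes the model outputs one row of per-class scores per frame and that index 0 is the blank symbol; the function name `ctc_greedy_decode` and the `blank` parameter are illustrative, not taken from the course code.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: list of per-frame score lists (one score per class).
    blank: index of the blank symbol (assumed to be 0 here).
    """
    decoded = []
    prev = None
    for scores in frame_scores:
        # Most likely class for this frame.
        idx = max(range(len(scores)), key=scores.__getitem__)
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


# Frames whose argmax path is: blank, 1, 1, blank, 1, 2
frames = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]
result = ctc_greedy_decode(frames)
```

Note that the blank between the two runs of label 1 is what keeps them from being merged into a single symbol.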

Prerequisites

Practice & homework

  • Seminar 1: Audio augmentations with code and examples
  • Seminar 2: Practical exercises
    • Writing and testing the WER metric
    • Implementing CTC decoding
    • Implementing CTC beam-search
  • (bonus!) Intro to PyCharm
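
For orientation before Seminar 2, word error rate can be sketched as the word-level edit distance divided by the number of reference words. The snippet below is only an illustration under that definition, with whitespace tokenization assumed; the names `edit_distance` and `wer` are hypothetical and need not match the seminar's interface.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j tokens of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: (S + D + I) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


score = wer("the cat sat", "the bat sat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.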

Additional Materials

All links are provided on the last slide of the lecture

  1. wer are we - tracks ASR progress across various datasets
  2. (paper) Librispeech: An ASR corpus based on public domain audio books (2015)
  3. (paper) Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks (2006)
  4. (blog post) Sequence Modeling With CTC
  5. (paper) Deep Speech: Scaling up end-to-end speech recognition (2014)
  6. (paper) Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (2015)
  7. (blog post) Speech Recognition — Deep Speech, CTC, Listen, Attend, and Spell
  8. (paper) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015)
  9. (paper) Listen, Attend and Spell (2015)
  10. (paper) SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (2019)
  11. (site) http://kaldi-asr.org/