- ASR I
- Automatic speech recognition (ASR)
- Metrics for ASR
- CTC loss function
- "Listen, Attend and Spell" architecture
- Beam-search
- This lecture assumes you are familiar with the attention mechanism. If you are not confident about this
topic, we recommend reading one of the following materials:
- How Attention works in Deep Learning: understanding the attention mechanism in sequence models
    - Written in simple language and easy to understand. Not too technical; gives a surface-level overview.
- Sequence to Sequence (seq2seq) and Attention
    - Nicely illustrated, with detailed math explanations. A bit bulky.
- Seminar 1 Audio augmentations with code and examples
- Seminar 2 Practical exercises
    - Writing and testing the WER metric
- Implementing CTC decoding
- Implementing CTC beam-search
- (bonus!) Intro to PyCharm
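The first two seminar exercises can be sketched in a few lines. The snippet below is a minimal illustration, not the seminar's reference solution: `wer` computes the word-level Levenshtein distance divided by the reference length, and `ctc_greedy_decode` performs greedy CTC decoding (collapse repeats, then drop blanks); the blank id of 0 is an assumption.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming Levenshtein distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            # deletion, insertion, substitution (or match)
            d[j] = min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
            prev = cur
    return d[-1] / len(ref)


def ctc_greedy_decode(frame_ids, blank=0):
    """Greedy CTC decoding: merge repeated labels, then remove blanks."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t  # a blank between repeats allows the label to be emitted twice
    return out
```

For example, `ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0])` yields `[1, 1, 2]`: the blank between the runs of 1s keeps both of them, while the repeated 2s collapse into one.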
All links are provided on the last slide of the lecture
- wer are we - track global ASR progress on various datasets
- (paper) Librispeech: An ASR corpus based on public domain audio books (2015)
- (paper) Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks (2006)
- (blog post) Sequence Modeling With CTC
- (paper) Deep Speech: Scaling up end-to-end speech recognition (2014)
- (paper) Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (2015)
- (blog post) Speech Recognition — Deep Speech, CTC, Listen, Attend, and Spell
- (paper) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015)
- (paper) Listen, Attend and Spell (2015)
- (paper) SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (2019)
- (site) http://kaldi-asr.org/