GoVivace · saikiranvalluri · Jan 18, 2019 · Jan 21, 2019 · Jan 21, 2019 · Jan 22, 2019
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,18 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: bug
+assignees: ''
+
+---
+
+<!--
+    WARNING: THE KALDI ISSUE TRACKER IS **ONLY** USED FOR KALDI DEVELOPMENT!
+
+    If you have a question about using Kaldi, please use the kald-help discussion group:
+
+    https://groups.google.com/forum/#!forum/kaldi-help
+
+    Instructions for joining are available at: http://kaldi-asr.org/forums.html
+-->
diff --git a/.github/ISSUE_TEMPLATE/feature-proposal-discussion.md b/.github/ISSUE_TEMPLATE/feature-proposal-discussion.md
@@ -0,0 +1,18 @@
+---
+name: Feature proposal or discussion
+about: Suggest an idea for Kaldi
+title: ''
+labels: discussion
+assignees: ''
+
+---
+
+<!--
+    WARNING: THE KALDI ISSUE TRACKER IS **ONLY** USED FOR KALDI DEVELOPMENT!
+
+    If you have a question about using Kaldi, please use the kald-help discussion group:
+
+    https://groups.google.com/forum/#!forum/kaldi-help
+
+    Instructions for joining are available at: http://kaldi-asr.org/forums.html
+-->
diff --git a/.gitignore b/.gitignore
@@ -83,6 +83,7 @@ GSYMS
 /tools/ATLAS/
 /tools/atlas3.8.3.tar.gz
 /tools/irstlm/
+/tools/mitlm/
 /tools/openfst
 /tools/openfst-1.3.2.tar.gz
 /tools/openfst-1.3.2/
@@ -147,3 +148,4 @@ GSYMS
 /tools/cub-1.8.0.zip
 /tools/cub-1.8.0/
 /tools/cub
+/tools/python/
diff --git a/docker/README.md b/docker/README.md
@@ -0,0 +1,30 @@
+# Kaldi Docker images
+
+Kaldi offers two set of images: CPU-based images and GPU-based images. Daily builds of the latest version of the master branch (both CPU and GPU images) are pushed daily to [DockerHub](https://hub.docker.com/r/kaldiasr/kaldi). 
+
+## Using pre-built images 
+Sample usage of the CPU based images:
+```bash
+docker run -it kaldiasr/kaldi:latest bash
+``` 
+
+Sample usage of the GPU based images:
+
+Note: use [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) to run the GPU images.
+
+```bash
+docker run -it --runtime=nvidia kaldiasr/kaldi:gpu-latest bash
+```
+
+## Building images locally
+For building the CPU-based image:
+```bash
+cd docker/debian9.8-cpu
+docker build --tag kaldiasr/kaldi:latest .
+```
+
+and for GPU-based image:
+```bash
+cd docker/ubuntu16.04-gpu
+docker build --tag kaldiasr/kaldi:gpu-latest .
+```
diff --git a/docker/debian9.8-cpu/Dockerfile b/docker/debian9.8-cpu/Dockerfile
@@ -0,0 +1,40 @@
+
+FROM debian:9.8
+LABEL maintainer="mdoulaty@gmail.com"
+
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+        g++ \
+        make \
+        automake \
+        autoconf \
+        bzip2 \
+        unzip \
+        wget \
+        sox \
+        libtool \
+        git \
+        subversion \
+        python2.7 \
+        python3 \
+        zlib1g-dev \
+        ca-certificates \
+        patch \
+        ffmpeg \
+	vim && \
+    rm -rf /var/lib/apt/lists/*
+
+RUN ln -s /usr/bin/python2.7 /usr/bin/python 
+
+RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi && \
+    cd /opt/kaldi && \
+    cd /opt/kaldi/tools && \
+    ./extras/install_mkl.sh && \
+    make -j $(nproc) && \
+    cd /opt/kaldi/src && \
+    ./configure --shared && \
+    make depend -j $(nproc) && \
+    make -j $(nproc)
+
+WORKDIR /opt/kaldi/
+
diff --git a/docker/ubuntu16.04-gpu/Dockerfile b/docker/ubuntu16.04-gpu/Dockerfile
@@ -0,0 +1,40 @@
+
+FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
+LABEL maintainer="mdoulaty@gmail.com"
+
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+        g++ \
+        make \
+        automake \
+        autoconf \
+        bzip2 \
+        unzip \
+        wget \
+        sox \
+        libtool \
+        git \
+        subversion \
+        python2.7 \
+        python3 \
+        zlib1g-dev \
+        ca-certificates \
+        patch \
+        ffmpeg \
+	vim && \
+    rm -rf /var/lib/apt/lists/*
+
+RUN ln -s /usr/bin/python2.7 /usr/bin/python 
+
+RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi && \
+    cd /opt/kaldi && \
+    cd /opt/kaldi/tools && \
+    ./extras/install_mkl.sh && \
+    make -j $(nproc) && \
+    cd /opt/kaldi/src && \
+    ./configure --shared --use-cuda && \
+    make depend -j $(nproc) && \
+    make -j $(nproc)
+
+WORKDIR /opt/kaldi/
+
diff --git a/egs/aidatatang_200zh/README.md b/egs/aidatatang_200zh/README.md
@@ -0,0 +1,21 @@
+Aidatatang_200zh is a free Chinese Mandarin speech corpus provided by Beijing DataTang Technology Co., Ltd under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License. 
+
+**About the aidatatang_200zh corpus:**
+
+- The corpus contains 200 hours of acoustic data, which is mostly mobile recorded data.
+- 600 speakers from different accent areas in China are invited to participate in the recording.
+- The transcription accuracy for each sentence is larger than 98%.
+- Recordings are conducted in a quiet indoor environment. 
+- The database is divided into training set, validation set, and testing set in a ratio of 7: 1: 2.
+- Detail information such as speech data coding and speaker information is preserved in the metadata file.
+- Segmented transcripts are also provided.
+
+You can get the corpus from [here](https://www.datatang.com/webfront/opensource.html). 
+
+DataTang is a community of creators-of world-changers and future-builders. We're invested in collaborating with a diverse set of voices in the AI world, and are excited about working on large-scale projects. Beyond speech, we're providing multiple resources in image, and text. For more details, please visit [datatang](<https://www.datatang.com/>).
+
+**About the recipe:**
+
+To demonstrate that this corpus is a reasonable data resource for Chinese Mandarin speech recognition research, a baseline recipe is provided here for everyone to explore their own systems easily and quickly.
+
+In this directory, each subdirectory contains the scripts for a sequence of experiments. The recipe in subdirectory "s5" is based on the hkust s5 recipe and aishell s5 recipe. It generates an integrated phonetic lexicon with CMU dictionary and cedit dictionary. This recipe follows the Mono+Triphone+SAT+fMLLR+DNN pipeline. In addition, this directory will be extended as scripts for speaker diarization and so on are created.
diff --git a/egs/aidatatang_200zh/s5/RESULTS b/egs/aidatatang_200zh/s5/RESULTS
@@ -0,0 +1,17 @@
+%WER 37.09 [ 173936 / 468933, 4868 ins, 31143 del, 137925 sub ] exp/mono/decode_test/cer_10_0.0
+%WER 17.98 [ 84305 / 468933, 4724 ins, 12637 del, 66944 sub ] exp/tri1/decode_test/cer_13_0.0
+%WER 17.94 [ 84149 / 468933, 5025 ins, 12427 del, 66697 sub ] exp/tri2/decode_test/cer_13_0.0
+%WER 17.26 [ 80945 / 468933, 4421 ins, 12958 del, 63566 sub ] exp/tri3a/decode_test/cer_14_0.0
+%WER 14.16 [ 66424 / 468933, 4567 ins, 10224 del, 51633 sub ] exp/tri4a/decode_test/cer_14_0.0
+%WER 12.22 [ 57304 / 468933, 4799 ins, 8197 del, 44308 sub ] exp/tri5a/decode_test/cer_14_0.0
+%WER 5.59 [ 26232 / 468933, 1701 ins, 4377 del, 20154 sub ] exp/chain/tdnn_1a_sp/decode_test/cer_10_0.0
+
+# nnet3 tdnn with online pitch, local/nnet3/tuning/run_tdnn_2a.sh
+%WER 7.21 [ 33797 / 468933, 2141 ins, 6117 del, 25539 sub ] exp/nnet3/tdnn_sp/decode_test/cer_13_0.0
+%WER 7.44 [ 34878 / 468933, 2252 ins, 5854 del, 26772 sub ] exp/nnet3/tdnn_sp_online/decode_test/cer_12_0.0
+%WER 7.79 [ 36542 / 468933, 2527 ins, 5674 del, 28341 sub ] exp/nnet3/tdnn_sp_online/decode_test_per_utt/cer_12_0.0
+
+# chain with online pitch, local/chain/tuning/run_tdnn_2a.sh
+%WER 5.61 [ 26311 / 468933, 1773 ins, 4789 del, 19749 sub ] exp/chain/tdnn_2a_sp/decode_test/cer_11_0.0
+%WER 5.69 [ 26661 / 468933, 1723 ins, 4724 del, 20214 sub ] exp/chain/tdnn_2a_sp_online/decode_test/cer_11_0.0
+%WER 5.98 [ 28046 / 468933, 2031 ins, 4527 del, 21488 sub ] exp/chain/tdnn_2a_sp_online/decode_test_per_utt/cer_11_0.0
diff --git a/egs/aidatatang_200zh/s5/cmd.sh b/egs/aidatatang_200zh/s5/cmd.sh
@@ -0,0 +1,14 @@
+# you can change cmd.sh depending on what type of queue you are using.
+# If you have no queueing system and want to run on a local machine, you
+# can change all instances 'queue.pl' to run.pl (but be careful and run
+# commands one by one: most recipes will exhaust the memory on your
+# machine).  queue.pl works with GridEngine (qsub).  slurm.pl works
+# with slurm.  Different queues are configured differently, with different
+# queue names and different ways of specifying things like memory;
+# to account for these differences you can create and edit the file
+# conf/queue.conf to match your queue's configuration.  Search for
+# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
+# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.
+
+export train_cmd="queue.pl --mem 2G"
+export decode_cmd="queue.pl --mem 4G"
diff --git a/egs/aidatatang_200zh/s5/conf/cmu2pinyin b/egs/aidatatang_200zh/s5/conf/cmu2pinyin
@@ -0,0 +1,39 @@
+AA A
+AE A
+AH A
+AO UO
+AW U
+AY AI
+B B
+CH CH 
+D D
+DH S I
+EH AI
+ER E
+EY AI
+F F
+G G
+HH H
+IH I
+IY I
+JH ZH 
+K K
+L L
+M M
+N N
+NG N
+OW UO
+OY UO
+P P
+R R
+S S
+SH SH
+T T
+TH S
+UH U
+UW U
+V W
+W W
+Y Y
+Z Z 
+ZH X  
diff --git a/egs/aidatatang_200zh/s5/conf/decode.config b/egs/aidatatang_200zh/s5/conf/decode.config
@@ -0,0 +1,5 @@
+beam=11.0 # beam for decoding.  Was 13.0 in the scripts.
+first_beam=8.0 # beam for 1st-pass decoding in SAT.
+
+
+
diff --git a/egs/aidatatang_200zh/s5/conf/mfcc.conf b/egs/aidatatang_200zh/s5/conf/mfcc.conf
@@ -0,0 +1,2 @@
+--use-energy=false   # only non-default option.
+--sample-frequency=16000
diff --git a/egs/aidatatang_200zh/s5/conf/mfcc_hires.conf b/egs/aidatatang_200zh/s5/conf/mfcc_hires.conf
@@ -0,0 +1,10 @@
+# config for high-resolution MFCC features, intended for neural network training.
+# Note: we keep all cepstra, so it has the same info as filterbank features,
+# but MFCC is more easily compressible (because less correlated) which is why
+# we prefer this method.
+--use-energy=false   # use average of log energy, not energy.
+--sample-frequency=16000 #  Switchboard is sampled at 8kHz
+--num-mel-bins=40     # similar to Google's setup.
+--num-ceps=40     # there is no dimensionality reduction.
+--low-freq=40    # low cutoff frequency for mel bins
+--high-freq=-200 # high cutoff frequently, relative to Nyquist of 8000 (=3800)
diff --git a/egs/aidatatang_200zh/s5/conf/online_cmvn.conf b/egs/aidatatang_200zh/s5/conf/online_cmvn.conf
@@ -0,0 +1 @@
+# configuration file for apply-cmvn-online, used when invoking online2-wav-nnet3-latgen-faster.
diff --git a/egs/aidatatang_200zh/s5/conf/online_pitch.conf b/egs/aidatatang_200zh/s5/conf/online_pitch.conf
@@ -0,0 +1,4 @@
+--sample-frequency=16000
+--simulate-first-pass-online=true
+--normalization-right-context=25
+--frames-per-chunk=10
diff --git a/egs/aidatatang_200zh/s5/conf/pinyin2cmu b/egs/aidatatang_200zh/s5/conf/pinyin2cmu
@@ -0,0 +1,58 @@
+A AA
+AI AY
+AN AE N 
+ANG AE NG
+AO AW   
+B B 
+CH CH
+C T S
+D D
+E ER 
+EI EY
+EN AH N
+ENG AH NG
+ER AA R 
+F F
+G G
+H HH
+IA IY AA
+IANG IY AE NG
+IAN IY AE N
+IAO IY AW
+IE IY EH
+I IY
+ING IY NG
+IN IY N
+IONG IY UH NG
+IU IY UH 
+J J
+K K
+L L
+M M
+N N
+O AO
+ONG UH NG
+OU OW
+P P
+Q Q
+R R
+SH SH
+S S
+T T
+UAI UW AY
+UANG UW AE NG
+UAN UW AE N
+UA UW AA
+UI UW IY 
+UN UW AH N
+UO UW AO
+U UW
+UE IY EH 
+VE IY EH 
+V IY UW
+VN IY N 
+W W
+X X 
+Y Y
+ZH JH 
+Z Z
diff --git a/egs/aidatatang_200zh/s5/conf/pinyin_initial b/egs/aidatatang_200zh/s5/conf/pinyin_initial
@@ -0,0 +1,23 @@
+B 
+C
+CH
+D
+F
+G
+H
+J
+K
+L
+M
+N
+P
+Q
+R
+S
+SH
+T
+W
+X
+Y
+Z
+ZH
diff --git a/egs/aidatatang_200zh/s5/conf/pitch.conf b/egs/aidatatang_200zh/s5/conf/pitch.conf
@@ -0,0 +1 @@
+--sample-frequency=16000
-Original file line number
+Diff line change
@@ -0,0 +1,39 @@
+    AA A
+    AE A
+    AH A
+    AO UO
+    AW U
+    AY AI
+    B B
+    CH CH
+    D D
+    DH S I
+    EH AI
+    ER E
+    EY AI
+    F F
+    G G
+    HH H
+    IH I
+    IY I
+    JH ZH
+    K K
+    L L
+    M M
+    N N
+    NG N
+    OW UO
+    OY UO
+    P P
+    R R
+    S S
+    SH SH
+    T T
+    TH S
+    UH U
+    UW U
+    V W
+    W W
+    Y Y
+    Z Z
+    ZH X
Original file line number	Diff line number	Diff line change
		@@ -0,0 +1,5 @@
		beam=11.0 # beam for decoding. Was 13.0 in the scripts.
		first_beam=8.0 # beam for 1st-pass decoding in SAT.
Original file line number	Diff line number	Diff line change
		@@ -0,0 +1,2 @@
		--use-energy=false # only non-default option.
		--sample-frequency=16000
Original file line number	Diff line number	Diff line change
		@@ -0,0 +1 @@
		# configuration file for apply-cmvn-online, used when invoking online2-wav-nnet3-latgen-faster.