diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
new file mode 100644
index 00000000000..660c62884be
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,18 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: bug
+assignees: ''
+
+---
+
+
diff --git a/.github/ISSUE_TEMPLATE/feature-proposal-discussion.md b/.github/ISSUE_TEMPLATE/feature-proposal-discussion.md
new file mode 100644
index 00000000000..61e797b9ca1
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature-proposal-discussion.md
@@ -0,0 +1,18 @@
+---
+name: Feature proposal or discussion
+about: Suggest an idea for Kaldi
+title: ''
+labels: discussion
+assignees: ''
+
+---
+
+
diff --git a/.gitignore b/.gitignore
index 910d5cb019d..5764bfe22c6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -83,6 +83,7 @@ GSYMS
/tools/ATLAS/
/tools/atlas3.8.3.tar.gz
/tools/irstlm/
+/tools/mitlm/
/tools/openfst
/tools/openfst-1.3.2.tar.gz
/tools/openfst-1.3.2/
@@ -147,3 +148,4 @@ GSYMS
/tools/cub-1.8.0.zip
/tools/cub-1.8.0/
/tools/cub
+/tools/python/
diff --git a/docker/README.md b/docker/README.md
new file mode 100644
index 00000000000..852e9531bd6
--- /dev/null
+++ b/docker/README.md
@@ -0,0 +1,30 @@
+# Kaldi Docker images
+
+Kaldi offers two sets of images: CPU-based and GPU-based. Builds of the latest master branch (both CPU and GPU images) are pushed daily to [DockerHub](https://hub.docker.com/r/kaldiasr/kaldi).
+
+## Using pre-built images
+Sample usage of the CPU-based image:
+```bash
+docker run -it kaldiasr/kaldi:latest bash
+```
+
+Sample usage of the GPU-based image:
+
+Note: use [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) to run the GPU images.
+
+```bash
+docker run -it --runtime=nvidia kaldiasr/kaldi:gpu-latest bash
+```
+
+## Building images locally
+For building the CPU-based image:
+```bash
+cd docker/debian9.8-cpu
+docker build --tag kaldiasr/kaldi:latest .
+```
+
+and for the GPU-based image:
+```bash
+cd docker/ubuntu16.04-gpu
+docker build --tag kaldiasr/kaldi:gpu-latest .
+```
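+
+## Quick sanity check
+
+As a rough smoke test (a minimal sketch, not an official workflow), you can run the tiny `yesno` recipe that ships with Kaldi inside the CPU image:
+
+```bash
+docker run -it kaldiasr/kaldi:latest bash -c "cd /opt/kaldi/egs/yesno/s5 && ./run.sh"
+```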
diff --git a/docker/debian9.8-cpu/Dockerfile b/docker/debian9.8-cpu/Dockerfile
new file mode 100644
index 00000000000..fb2ef6e8db6
--- /dev/null
+++ b/docker/debian9.8-cpu/Dockerfile
@@ -0,0 +1,40 @@
+
+FROM debian:9.8
+LABEL maintainer="mdoulaty@gmail.com"
+
+RUN apt-get update && \
+ apt-get install -y --no-install-recommends \
+ g++ \
+ make \
+ automake \
+ autoconf \
+ bzip2 \
+ unzip \
+ wget \
+ sox \
+ libtool \
+ git \
+ subversion \
+ python2.7 \
+ python3 \
+ zlib1g-dev \
+ ca-certificates \
+ patch \
+ ffmpeg \
+ vim && \
+ rm -rf /var/lib/apt/lists/*
+
+RUN ln -s /usr/bin/python2.7 /usr/bin/python
+
+RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi && \
+ cd /opt/kaldi && \
+ cd /opt/kaldi/tools && \
+ ./extras/install_mkl.sh && \
+ make -j $(nproc) && \
+ cd /opt/kaldi/src && \
+ ./configure --shared && \
+ make depend -j $(nproc) && \
+ make -j $(nproc)
+
+WORKDIR /opt/kaldi/
+
diff --git a/docker/ubuntu16.04-gpu/Dockerfile b/docker/ubuntu16.04-gpu/Dockerfile
new file mode 100644
index 00000000000..49189b2970f
--- /dev/null
+++ b/docker/ubuntu16.04-gpu/Dockerfile
@@ -0,0 +1,40 @@
+
+FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
+LABEL maintainer="mdoulaty@gmail.com"
+
+RUN apt-get update && \
+ apt-get install -y --no-install-recommends \
+ g++ \
+ make \
+ automake \
+ autoconf \
+ bzip2 \
+ unzip \
+ wget \
+ sox \
+ libtool \
+ git \
+ subversion \
+ python2.7 \
+ python3 \
+ zlib1g-dev \
+ ca-certificates \
+ patch \
+ ffmpeg \
+ vim && \
+ rm -rf /var/lib/apt/lists/*
+
+RUN ln -s /usr/bin/python2.7 /usr/bin/python
+
+RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi && \
+ cd /opt/kaldi && \
+ cd /opt/kaldi/tools && \
+ ./extras/install_mkl.sh && \
+ make -j $(nproc) && \
+ cd /opt/kaldi/src && \
+ ./configure --shared --use-cuda && \
+ make depend -j $(nproc) && \
+ make -j $(nproc)
+
+WORKDIR /opt/kaldi/
+
diff --git a/egs/aidatatang_200zh/README.md b/egs/aidatatang_200zh/README.md
new file mode 100644
index 00000000000..097454d84ce
--- /dev/null
+++ b/egs/aidatatang_200zh/README.md
@@ -0,0 +1,21 @@
+Aidatatang_200zh is a free Chinese Mandarin speech corpus provided by Beijing DataTang Technology Co., Ltd under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License.
+
+**About the aidatatang_200zh corpus:**
+
+- The corpus contains 200 hours of acoustic data, most of which was recorded on mobile devices.
+- 600 speakers from different accent areas in China were invited to participate in the recording.
+- The transcription accuracy of each sentence is higher than 98%.
+- Recordings were conducted in a quiet indoor environment.
+- The database is divided into training, validation, and testing sets in a ratio of 7:1:2.
+- Detailed information such as speech data coding and speaker information is preserved in the metadata file.
+- Segmented transcripts are also provided.
+
+You can get the corpus from [here](https://www.datatang.com/webfront/opensource.html).
+
+DataTang is a community of creators: world-changers and future-builders. We are invested in collaborating with a diverse set of voices in the AI world and are excited about working on large-scale projects. Beyond speech, we also provide resources for image and text. For more details, please visit [datatang](https://www.datatang.com).
+
+**About the recipe:**
+
+To demonstrate that this corpus is a reasonable data resource for Chinese Mandarin speech recognition research, a baseline recipe is provided here so that everyone can explore and build their own systems easily and quickly.
+
+In this directory, each subdirectory contains the scripts for a sequence of experiments. The recipe in subdirectory "s5" is based on the hkust s5 and aishell s5 recipes. It generates an integrated phonetic lexicon from the CMU dictionary and the CC-CEDICT dictionary, and follows the Mono+Triphone+SAT+fMLLR+DNN pipeline. This directory will be extended as scripts for speaker diarization and other tasks are created.
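+
+**Quick start:**
+
+The following is a minimal sketch, not a definitive workflow; it assumes the usual Kaldi recipe layout, i.e. that "s5" provides a top-level run.sh (not shown in this part of the diff) whose corpus path you edit before running.
+
+```bash
+cd egs/aidatatang_200zh/s5
+# Edit cmd.sh for your queue (or switch to run.pl for a single machine),
+# set the corpus/data path at the top of run.sh, then run the full pipeline:
+./run.sh
+```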
diff --git a/egs/aidatatang_200zh/s5/RESULTS b/egs/aidatatang_200zh/s5/RESULTS
new file mode 100644
index 00000000000..8c458e8015e
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/RESULTS
@@ -0,0 +1,17 @@
+%WER 37.09 [ 173936 / 468933, 4868 ins, 31143 del, 137925 sub ] exp/mono/decode_test/cer_10_0.0
+%WER 17.98 [ 84305 / 468933, 4724 ins, 12637 del, 66944 sub ] exp/tri1/decode_test/cer_13_0.0
+%WER 17.94 [ 84149 / 468933, 5025 ins, 12427 del, 66697 sub ] exp/tri2/decode_test/cer_13_0.0
+%WER 17.26 [ 80945 / 468933, 4421 ins, 12958 del, 63566 sub ] exp/tri3a/decode_test/cer_14_0.0
+%WER 14.16 [ 66424 / 468933, 4567 ins, 10224 del, 51633 sub ] exp/tri4a/decode_test/cer_14_0.0
+%WER 12.22 [ 57304 / 468933, 4799 ins, 8197 del, 44308 sub ] exp/tri5a/decode_test/cer_14_0.0
+%WER 5.59 [ 26232 / 468933, 1701 ins, 4377 del, 20154 sub ] exp/chain/tdnn_1a_sp/decode_test/cer_10_0.0
+
+# nnet3 tdnn with online pitch, local/nnet3/tuning/run_tdnn_2a.sh
+%WER 7.21 [ 33797 / 468933, 2141 ins, 6117 del, 25539 sub ] exp/nnet3/tdnn_sp/decode_test/cer_13_0.0
+%WER 7.44 [ 34878 / 468933, 2252 ins, 5854 del, 26772 sub ] exp/nnet3/tdnn_sp_online/decode_test/cer_12_0.0
+%WER 7.79 [ 36542 / 468933, 2527 ins, 5674 del, 28341 sub ] exp/nnet3/tdnn_sp_online/decode_test_per_utt/cer_12_0.0
+
+# chain with online pitch, local/chain/tuning/run_tdnn_2a.sh
+%WER 5.61 [ 26311 / 468933, 1773 ins, 4789 del, 19749 sub ] exp/chain/tdnn_2a_sp/decode_test/cer_11_0.0
+%WER 5.69 [ 26661 / 468933, 1723 ins, 4724 del, 20214 sub ] exp/chain/tdnn_2a_sp_online/decode_test/cer_11_0.0
+%WER 5.98 [ 28046 / 468933, 2031 ins, 4527 del, 21488 sub ] exp/chain/tdnn_2a_sp_online/decode_test_per_utt/cer_11_0.0
diff --git a/egs/aidatatang_200zh/s5/cmd.sh b/egs/aidatatang_200zh/s5/cmd.sh
new file mode 100644
index 00000000000..811adcde474
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/cmd.sh
@@ -0,0 +1,14 @@
+# you can change cmd.sh depending on what type of queue you are using.
+# If you have no queueing system and want to run on a local machine, you
+# can change all instances of 'queue.pl' to 'run.pl' (but be careful and run
+# commands one by one: most recipes will exhaust the memory on your
+# machine). queue.pl works with GridEngine (qsub). slurm.pl works
+# with slurm. Different queues are configured differently, with different
+# queue names and different ways of specifying things like memory;
+# to account for these differences you can create and edit the file
+# conf/queue.conf to match your queue's configuration. Search for
+# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
+# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.
+
+export train_cmd="queue.pl --mem 2G"
+export decode_cmd="queue.pl --mem 4G"
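+
+# For example (illustrative only), to run everything locally without a queueing system:
+# export train_cmd="run.pl --mem 2G"
+# export decode_cmd="run.pl --mem 4G"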
diff --git a/egs/aidatatang_200zh/s5/conf/cmu2pinyin b/egs/aidatatang_200zh/s5/conf/cmu2pinyin
new file mode 100644
index 00000000000..c02eb600fcc
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/conf/cmu2pinyin
@@ -0,0 +1,39 @@
+AA A
+AE A
+AH A
+AO UO
+AW U
+AY AI
+B B
+CH CH
+D D
+DH S I
+EH AI
+ER E
+EY AI
+F F
+G G
+HH H
+IH I
+IY I
+JH ZH
+K K
+L L
+M M
+N N
+NG N
+OW UO
+OY UO
+P P
+R R
+S S
+SH SH
+T T
+TH S
+UH U
+UW U
+V W
+W W
+Y Y
+Z Z
+ZH X
diff --git a/egs/aidatatang_200zh/s5/conf/decode.config b/egs/aidatatang_200zh/s5/conf/decode.config
new file mode 100644
index 00000000000..d91f86183af
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/conf/decode.config
@@ -0,0 +1,5 @@
+beam=11.0 # beam for decoding. Was 13.0 in the scripts.
+first_beam=8.0 # beam for 1st-pass decoding in SAT.
+
+
+
diff --git a/egs/aidatatang_200zh/s5/conf/mfcc.conf b/egs/aidatatang_200zh/s5/conf/mfcc.conf
new file mode 100644
index 00000000000..a1aa3d6c158
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/conf/mfcc.conf
@@ -0,0 +1,2 @@
+--use-energy=false # only non-default option.
+--sample-frequency=16000
diff --git a/egs/aidatatang_200zh/s5/conf/mfcc_hires.conf b/egs/aidatatang_200zh/s5/conf/mfcc_hires.conf
new file mode 100644
index 00000000000..ca067e77b37
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/conf/mfcc_hires.conf
@@ -0,0 +1,10 @@
+# config for high-resolution MFCC features, intended for neural network training.
+# Note: we keep all cepstra, so it has the same info as filterbank features,
+# but MFCC is more easily compressible (because less correlated) which is why
+# we prefer this method.
+--use-energy=false # use average of log energy, not energy.
+--sample-frequency=16000 # aidatatang_200zh is sampled at 16kHz
+--num-mel-bins=40 # similar to Google's setup.
+--num-ceps=40 # there is no dimensionality reduction.
+--low-freq=40 # low cutoff frequency for mel bins
+--high-freq=-200 # high cutoff frequency, relative to the Nyquist of 8000 (=7800)
diff --git a/egs/aidatatang_200zh/s5/conf/online_cmvn.conf b/egs/aidatatang_200zh/s5/conf/online_cmvn.conf
new file mode 100644
index 00000000000..591367e7ae9
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/conf/online_cmvn.conf
@@ -0,0 +1 @@
+# configuration file for apply-cmvn-online, used when invoking online2-wav-nnet3-latgen-faster.
diff --git a/egs/aidatatang_200zh/s5/conf/online_pitch.conf b/egs/aidatatang_200zh/s5/conf/online_pitch.conf
new file mode 100644
index 00000000000..c0f1342160d
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/conf/online_pitch.conf
@@ -0,0 +1,4 @@
+--sample-frequency=16000
+--simulate-first-pass-online=true
+--normalization-right-context=25
+--frames-per-chunk=10
diff --git a/egs/aidatatang_200zh/s5/conf/pinyin2cmu b/egs/aidatatang_200zh/s5/conf/pinyin2cmu
new file mode 100644
index 00000000000..a6e53620479
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/conf/pinyin2cmu
@@ -0,0 +1,58 @@
+A AA
+AI AY
+AN AE N
+ANG AE NG
+AO AW
+B B
+CH CH
+C T S
+D D
+E ER
+EI EY
+EN AH N
+ENG AH NG
+ER AA R
+F F
+G G
+H HH
+IA IY AA
+IANG IY AE NG
+IAN IY AE N
+IAO IY AW
+IE IY EH
+I IY
+ING IY NG
+IN IY N
+IONG IY UH NG
+IU IY UH
+J J
+K K
+L L
+M M
+N N
+O AO
+ONG UH NG
+OU OW
+P P
+Q Q
+R R
+SH SH
+S S
+T T
+UAI UW AY
+UANG UW AE NG
+UAN UW AE N
+UA UW AA
+UI UW IY
+UN UW AH N
+UO UW AO
+U UW
+UE IY EH
+VE IY EH
+V IY UW
+VN IY N
+W W
+X X
+Y Y
+ZH JH
+Z Z
diff --git a/egs/aidatatang_200zh/s5/conf/pinyin_initial b/egs/aidatatang_200zh/s5/conf/pinyin_initial
new file mode 100644
index 00000000000..e263ad07e2a
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/conf/pinyin_initial
@@ -0,0 +1,23 @@
+B
+C
+CH
+D
+F
+G
+H
+J
+K
+L
+M
+N
+P
+Q
+R
+S
+SH
+T
+W
+X
+Y
+Z
+ZH
diff --git a/egs/aidatatang_200zh/s5/conf/pitch.conf b/egs/aidatatang_200zh/s5/conf/pitch.conf
new file mode 100644
index 00000000000..e959a19d5b8
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/conf/pitch.conf
@@ -0,0 +1 @@
+--sample-frequency=16000
diff --git a/egs/aidatatang_200zh/s5/local/chain/compare_wer.sh b/egs/aidatatang_200zh/s5/local/chain/compare_wer.sh
new file mode 100755
index 00000000000..71e6fbe106d
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/chain/compare_wer.sh
@@ -0,0 +1,97 @@
+#!/bin/bash
+# Copyright 2018 Emotech LTD (Author: Xuechen Liu)
+
+# compare WER between different models in the aidatatang_200zh chain directory
+# example usage: local/chain/compare_wer.sh --online exp/chain/tdnn_2a_sp
+# note: this script is kept quite general to give users more flexibility
+# in adding affixes for their own models when training.
+
+set -e
+. ./cmd.sh
+. ./path.sh
+
+if [ $# == 0 ]; then
+ echo "Usage: $0: [--online] [ ... ]"
+ echo "e.g.: $0 --online exp/chain/tdnn_2a_sp"
+ exit 1
+fi
+
+echo "# $0 $*"
+
+include_online=false
+if [ "$1" == "--online" ]; then
+ include_online=true
+ shift
+fi
+
+set_names() {
+ if [ $# != 1 ]; then
+ echo "compare_wer.sh: internal error"
+ exit 1 # exit the program
+ fi
+ dirname=$(echo $1 | cut -d: -f1)
+}
+
+# print model names
+echo -n "# Model "
+for x in $*; do
+ printf "% 10s" " $(basename $x)"
+done
+echo
+
+# print decode WER results
+echo -n "# WER(%) "
+for x in $*; do
+ set_names $x
+ wer=$([ -d $x ] && grep WER $x/decode_test/cer_* | utils/best_wer.sh | awk '{print $2}')
+ printf "% 10s" $wer
+done
+echo
+
+# so how about online WER?
+if $include_online; then
+ echo -n "# WER(%)[online] "
+ for x in $*; do
+ set_names $x
+ wer=$(cat ${x}_online/decode_test/cer_* | utils/best_wer.sh | awk '{print $2}')
+ printf "% 10s" $wer
+ done
+ echo
+ echo -n "# WER(%)[per-utt] "
+ for x in $*; do
+ set_names $x
+ wer_per_utt=$(cat ${x}_online/decode_test_per_utt/cer_* | utils/best_wer.sh | awk '{print $2}')
+ printf "% 10s" $wer_per_utt
+ done
+ echo
+fi
+
+# print final log prob for train & validation
+echo -n "# Final train prob "
+for x in $*; do
+ prob=$(grep Overall $x/log/compute_prob_train.final.log | grep -v xent | awk '{printf($8)}' | cut -c1-7)
+ printf "% 10s" $prob
+done
+echo
+
+echo -n "# Final valid prob "
+for x in $*; do
+ prob=$(grep Overall $x/log/compute_prob_valid.final.log | grep -v xent | awk '{printf($8)}' | cut -c1-7)
+ printf "% 10s" $prob
+done
+echo
+
+# do the same for xent objective
+echo -n "# Final train prob (xent)"
+for x in $*; do
+ prob=$(grep Overall $x/log/compute_prob_train.final.log | grep -w xent | awk '{printf("%.4f", $8)}')
+ printf "% 10s" $prob
+done
+echo
+
+echo -n "# Final valid prob (xent)"
+for x in $*; do
+ prob=$(grep Overall $x/log/compute_prob_valid.final.log | grep -w xent | awk '{printf("%.4f", $8)}')
+ printf "% 10s" $prob
+done
+echo
diff --git a/egs/aidatatang_200zh/s5/local/chain/run_tdnn.sh b/egs/aidatatang_200zh/s5/local/chain/run_tdnn.sh
new file mode 120000
index 00000000000..34499362831
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/chain/run_tdnn.sh
@@ -0,0 +1 @@
+tuning/run_tdnn_1a.sh
\ No newline at end of file
diff --git a/egs/aidatatang_200zh/s5/local/chain/tuning/run_tdnn_1a.sh b/egs/aidatatang_200zh/s5/local/chain/tuning/run_tdnn_1a.sh
new file mode 100644
index 00000000000..0be0e2c79c6
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/chain/tuning/run_tdnn_1a.sh
@@ -0,0 +1,193 @@
+#!/bin/bash
+
+# This script is based on run_tdnn_7h.sh in swbd chain recipe.
+
+# results
+# local/chain/compare_wer.sh exp/chain/tdnn_1a_sp/
+# Model tdnn_1a_sp
+# WER(%) 5.59
+# Final train prob -0.0488
+# Final valid prob -0.0925
+# Final train prob (xent) -0.8001
+# Final valid prob (xent) -1.0398
+
+set -e
+
+# configs for 'chain'
+affix=
+stage=0
+train_stage=-10
+get_egs_stage=-10
+dir=exp/chain/tdnn_1a # Note: _sp will get added to this
+decode_iter=
+
+# training options
+num_epochs=4
+initial_effective_lrate=0.001
+final_effective_lrate=0.0001
+max_param_change=2.0
+final_layer_normalize_target=0.5
+num_jobs_initial=2
+num_jobs_final=12
+minibatch_size=128
+frames_per_eg=150,110,90
+remove_egs=true
+common_egs_dir=
+xent_regularize=0.1
+
+# End configuration section.
+echo "$0 $@" # Print the command line for logging
+
+. ./cmd.sh
+. ./path.sh
+. ./utils/parse_options.sh
+
+if ! cuda-compiled; then
+ cat <$lang/topo
+fi
+
+if [ $stage -le 9 ]; then
+ # Build a tree using our new topology. This is the critically different
+ # step compared with other recipes.
+ steps/nnet3/chain/build_tree.sh --frame-subsampling-factor 3 \
+ --context-opts "--context-width=2 --central-position=1" \
+ --cmd "$train_cmd" 5000 data/$train_set $lang $ali_dir $treedir
+fi
+
+if [ $stage -le 10 ]; then
+ echo "$0: creating neural net configs using the xconfig parser";
+
+ num_targets=$(tree-info $treedir/tree |grep num-pdfs|awk '{print $2}')
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
+
+ mkdir -p $dir/configs
+ cat < $dir/configs/network.xconfig
+ input dim=100 name=ivector
+ input dim=43 name=input
+
+ # please note that it is important to have input layer with the name=input
+ # as the layer immediately preceding the fixed-affine-layer to enable
+ # the use of short notation for the descriptor
+ fixed-affine-layer name=lda input=Append(-1,0,1,ReplaceIndex(ivector, t, 0)) affine-transform-file=$dir/configs/lda.mat
+
+ # the first splicing is moved before the lda layer, so no splicing here
+ relu-batchnorm-layer name=tdnn1 dim=625
+ relu-batchnorm-layer name=tdnn2 input=Append(-1,0,1) dim=625
+ relu-batchnorm-layer name=tdnn3 input=Append(-1,0,1) dim=625
+ relu-batchnorm-layer name=tdnn4 input=Append(-3,0,3) dim=625
+ relu-batchnorm-layer name=tdnn5 input=Append(-3,0,3) dim=625
+ relu-batchnorm-layer name=tdnn6 input=Append(-3,0,3) dim=625
+
+ ## adding the layers for chain branch
+ relu-batchnorm-layer name=prefinal-chain input=tdnn6 dim=625 target-rms=0.5
+ output-layer name=output include-log-softmax=false dim=$num_targets max-change=1.5
+
+ # adding the layers for xent branch
+ # This block prints the configs for a separate output that will be
+ # trained with a cross-entropy objective in the 'chain' models... this
+ # has the effect of regularizing the hidden parts of the model. we use
+ # 0.5 / args.xent_regularize as the learning rate factor- the factor of
+ # 0.5 / args.xent_regularize is suitable as it means the xent
+ # final-layer learns at a rate independent of the regularization
+ # constant; and the 0.5 was tuned so as to make the relative progress
+ # similar in the xent and regular final layers.
+ relu-batchnorm-layer name=prefinal-xent input=tdnn6 dim=625 target-rms=0.5
+ output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor max-change=1.5
+
+EOF
+ steps/nnet3/xconfig_to_configs.py --xconfig-file $dir/configs/network.xconfig --config-dir $dir/configs/
+fi
+
+if [ $stage -le 11 ]; then
+ if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $dir/egs/storage ]; then
+ utils/create_split_dir.pl \
+ /export/b0{5,6,7,8}/$USER/kaldi-data/egs/aidatatang-$(date +'%m_%d_%H_%M')/s5c/$dir/egs/storage $dir/egs/storage
+ fi
+
+ steps/nnet3/chain/train.py --stage $train_stage \
+ --cmd "$decode_cmd" \
+ --feat.online-ivector-dir exp/nnet3/ivectors_${train_set} \
+ --feat.cmvn-opts "--norm-means=false --norm-vars=false" \
+ --chain.xent-regularize $xent_regularize \
+ --chain.leaky-hmm-coefficient 0.1 \
+ --chain.l2-regularize 0.00005 \
+ --chain.apply-deriv-weights false \
+ --chain.lm-opts="--num-extra-lm-states=2000" \
+ --egs.dir "$common_egs_dir" \
+ --egs.stage $get_egs_stage \
+ --egs.opts "--frames-overlap-per-eg 0" \
+ --egs.chunk-width $frames_per_eg \
+ --trainer.num-chunk-per-minibatch $minibatch_size \
+ --trainer.frames-per-iter 1500000 \
+ --trainer.num-epochs $num_epochs \
+ --trainer.optimization.num-jobs-initial $num_jobs_initial \
+ --trainer.optimization.num-jobs-final $num_jobs_final \
+ --trainer.optimization.initial-effective-lrate $initial_effective_lrate \
+ --trainer.optimization.final-effective-lrate $final_effective_lrate \
+ --trainer.max-param-change $max_param_change \
+ --cleanup.remove-egs $remove_egs \
+ --feat-dir data/${train_set}_hires \
+ --tree-dir $treedir \
+ --lat-dir exp/tri5a_sp_lats \
+ --dir $dir || exit 1;
+fi
+
+if [ $stage -le 12 ]; then
+ # Note: it might appear that this $lang directory is mismatched, and it is as
+ # far as the 'topo' is concerned, but this script doesn't read the 'topo' from
+ # the lang directory.
+ utils/mkgraph.sh --self-loop-scale 1.0 data/lang_test $dir $dir/graph
+fi
+
+graph_dir=$dir/graph
+if [ $stage -le 13 ]; then
+ for test_set in dev test; do
+ steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
+ --nj 10 --cmd "$decode_cmd" \
+ --online-ivector-dir exp/nnet3/ivectors_$test_set \
+ $graph_dir data/${test_set}_hires $dir/decode_${test_set} || exit 1;
+ done
+fi
+
+exit;
diff --git a/egs/aidatatang_200zh/s5/local/chain/tuning/run_tdnn_2a.sh b/egs/aidatatang_200zh/s5/local/chain/tuning/run_tdnn_2a.sh
new file mode 100644
index 00000000000..78dd4000e58
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/chain/tuning/run_tdnn_2a.sh
@@ -0,0 +1,238 @@
+#!/bin/bash
+
+# This script is based on run_tdnn_1a.sh.
+# This setup used online pitch to train the neural network.
+# It requires a online_pitch.conf in the conf dir.
+
+# results
+# local/chain/compare_wer.sh exp/chain/tdnn_2a_sp
+# Model tdnn_2a_sp
+# WER(%) 5.61
+# Final train prob -0.0502
+# Final valid prob -0.0913
+# Final train prob (xent) -0.8047
+# Final valid prob (xent) -1.0292
+
+# local/chain/compare_wer.sh --online exp/chain/tdnn_2a_sp
+# Model tdnn_2a_sp
+# WER(%) 5.61
+# WER(%)[online] 5.69
+# WER(%)[per-utt] 5.98
+# Final train prob -0.0502
+# Final valid prob -0.0913
+# Final train prob (xent) -0.8047
+# Final valid prob (xent) -1.0292
+
+# local/chain/compare_wer.sh exp/chain/tdnn_1a_sp exp/chain/tdnn_2a_sp
+# Model tdnn_1a_sp tdnn_2a_sp
+# WER(%) 5.59 5.61
+# Final train prob -0.0488 -0.0502
+# Final valid prob -0.0925 -0.0913
+# Final train prob (xent) -0.8001 -0.8047
+# Final valid prob (xent) -1.0398 -1.0292
+
+set -e
+
+# configs for 'chain'
+affix=
+stage=0
+train_stage=-10
+get_egs_stage=-10
+dir=exp/chain/tdnn_2a # Note: _sp will get added to this
+decode_iter=
+
+# training options
+num_epochs=4
+initial_effective_lrate=0.001
+final_effective_lrate=0.0001
+max_param_change=2.0
+final_layer_normalize_target=0.5
+num_jobs_initial=2
+num_jobs_final=12
+minibatch_size=128
+frames_per_eg=150,110,90
+remove_egs=true
+common_egs_dir=
+xent_regularize=0.1
+
+# End configuration section.
+echo "$0 $@" # Print the command line for logging
+
+. ./cmd.sh
+. ./path.sh
+. ./utils/parse_options.sh
+
+if ! cuda-compiled; then
+ cat <$lang/topo
+fi
+
+if [ $stage -le 9 ]; then
+ # Build a tree using our new topology. This is the critically different
+ # step compared with other recipes.
+ steps/nnet3/chain/build_tree.sh --frame-subsampling-factor 3 \
+ --context-opts "--context-width=2 --central-position=1" \
+ --cmd "$train_cmd" 5000 data/$train_set $lang $ali_dir $treedir
+fi
+
+if [ $stage -le 10 ]; then
+ echo "$0: creating neural net configs using the xconfig parser";
+
+ num_targets=$(tree-info $treedir/tree |grep num-pdfs|awk '{print $2}')
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
+
+ mkdir -p $dir/configs
+ cat < $dir/configs/network.xconfig
+ input dim=100 name=ivector
+ input dim=43 name=input
+
+ # please note that it is important to have input layer with the name=input
+ # as the layer immediately preceding the fixed-affine-layer to enable
+ # the use of short notation for the descriptor
+ fixed-affine-layer name=lda input=Append(-1,0,1,ReplaceIndex(ivector, t, 0)) affine-transform-file=$dir/configs/lda.mat
+
+ # the first splicing is moved before the lda layer, so no splicing here
+ relu-batchnorm-layer name=tdnn1 dim=625
+ relu-batchnorm-layer name=tdnn2 input=Append(-1,0,1) dim=625
+ relu-batchnorm-layer name=tdnn3 input=Append(-1,0,1) dim=625
+ relu-batchnorm-layer name=tdnn4 input=Append(-3,0,3) dim=625
+ relu-batchnorm-layer name=tdnn5 input=Append(-3,0,3) dim=625
+ relu-batchnorm-layer name=tdnn6 input=Append(-3,0,3) dim=625
+
+ ## adding the layers for chain branch
+ relu-batchnorm-layer name=prefinal-chain input=tdnn6 dim=625 target-rms=0.5
+ output-layer name=output include-log-softmax=false dim=$num_targets max-change=1.5
+
+ # adding the layers for xent branch
+ # This block prints the configs for a separate output that will be
+ # trained with a cross-entropy objective in the 'chain' models... this
+ # has the effect of regularizing the hidden parts of the model. we use
+ # 0.5 / args.xent_regularize as the learning rate factor- the factor of
+ # 0.5 / args.xent_regularize is suitable as it means the xent
+ # final-layer learns at a rate independent of the regularization
+ # constant; and the 0.5 was tuned so as to make the relative progress
+ # similar in the xent and regular final layers.
+ relu-batchnorm-layer name=prefinal-xent input=tdnn6 dim=625 target-rms=0.5
+ output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor max-change=1.5
+
+EOF
+ steps/nnet3/xconfig_to_configs.py --xconfig-file $dir/configs/network.xconfig --config-dir $dir/configs/
+fi
+
+if [ $stage -le 11 ]; then
+ if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $dir/egs/storage ]; then
+ utils/create_split_dir.pl \
+ /export/b0{5,6,7,8}/$USER/kaldi-data/egs/aidatatang-$(date +'%m_%d_%H_%M')/s5c/$dir/egs/storage $dir/egs/storage
+ fi
+
+ steps/nnet3/chain/train.py --stage $train_stage \
+ --cmd "$decode_cmd" \
+ --feat.online-ivector-dir exp/nnet3/ivectors_${train_set} \
+ --feat.cmvn-opts "--norm-means=false --norm-vars=false" \
+ --chain.xent-regularize $xent_regularize \
+ --chain.leaky-hmm-coefficient 0.1 \
+ --chain.l2-regularize 0.00005 \
+ --chain.apply-deriv-weights false \
+ --chain.lm-opts="--num-extra-lm-states=2000" \
+ --egs.dir "$common_egs_dir" \
+ --egs.stage $get_egs_stage \
+ --egs.opts "--frames-overlap-per-eg 0" \
+ --egs.chunk-width $frames_per_eg \
+ --trainer.num-chunk-per-minibatch $minibatch_size \
+ --trainer.frames-per-iter 1500000 \
+ --trainer.num-epochs $num_epochs \
+ --trainer.optimization.num-jobs-initial $num_jobs_initial \
+ --trainer.optimization.num-jobs-final $num_jobs_final \
+ --trainer.optimization.initial-effective-lrate $initial_effective_lrate \
+ --trainer.optimization.final-effective-lrate $final_effective_lrate \
+ --trainer.max-param-change $max_param_change \
+ --cleanup.remove-egs $remove_egs \
+ --feat-dir data/${train_set}_hires_online \
+ --tree-dir $treedir \
+ --lat-dir exp/tri5a_sp_lats \
+ --dir $dir || exit 1;
+fi
+
+if [ $stage -le 12 ]; then
+ # Note: it might appear that this $lang directory is mismatched, and it is as
+ # far as the 'topo' is concerned, but this script doesn't read the 'topo' from
+ # the lang directory.
+ utils/mkgraph.sh --self-loop-scale 1.0 data/lang_test $dir $dir/graph
+fi
+
+graph_dir=$dir/graph
+if [ $stage -le 13 ]; then
+ for test_set in dev test; do
+ steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
+ --nj 10 --cmd "$decode_cmd" \
+ --online-ivector-dir exp/nnet3/ivectors_$test_set \
+ $graph_dir data/${test_set}_hires_online $dir/decode_${test_set} || exit 1;
+ done
+fi
+
+if [ $stage -le 14 ]; then
+ steps/online/nnet3/prepare_online_decoding.sh --mfcc-config conf/mfcc_hires.conf \
+ --add-pitch true \
+ $lang exp/nnet3/extractor "$dir" ${dir}_online || exit 1;
+fi
+
+dir=${dir}_online
+if [ $stage -le 15 ]; then
+ for test_set in dev test; do
+ steps/online/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
+ --nj 10 --cmd "$decode_cmd" \
+ --config conf/decode.config \
+ $graph_dir data/${test_set}_hires_online $dir/decode_${test_set} || exit 1;
+ done
+fi
+
+if [ $stage -le 16 ]; then
+ for test_set in dev test; do
+ steps/online/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
+ --nj 10 --cmd "$decode_cmd" --per-utt true \
+ --config conf/decode.config \
+ $graph_dir data/${test_set}_hires_online $dir/decode_${test_set}_per_utt || exit 1;
+ done
+fi
+
+exit;
diff --git a/egs/aidatatang_200zh/s5/local/create_oov_char_lexicon.pl b/egs/aidatatang_200zh/s5/local/create_oov_char_lexicon.pl
new file mode 100644
index 00000000000..33e2e8061c3
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/create_oov_char_lexicon.pl
@@ -0,0 +1,48 @@
+#!/usr/bin/env perl
+# Copyright 2016 Alibaba Robotics Corp. (Author: Xingyu Na)
+#
+# A script for char-based Chinese OOV lexicon generation.
+#
+# Input 1: char-based dictionary, example
+# CHAR1 ph1 ph2
+# CHAR2 ph3
+# CHAR3 ph2 ph4
+#
+# Input 2: OOV word list, example
+# WORD1
+# WORD2
+# WORD3
+#
+# where WORD1 is in the format of "CHAR1CHAR2".
+#
+# Output: OOV lexicon, in the format of normal lexicon
+
+if($#ARGV != 1) {
+ print STDERR "usage: perl create_oov_char_lexicon.pl chardict oovwordlist > oovlex\n\n";
+ print STDERR "### chardict: a dict in which each line contains the pronunciation of one Chinese char\n";
+ print STDERR "### oovwordlist: OOV word list\n";
+ print STDERR "### oovlex: output OOV lexicon\n";
+ exit;
+}
+
+use utf8;
+my %prons;
+open(DICT, $ARGV[0]) || die("Can't open dict ".$ARGV[0]."\n");
+binmode(DICT,":encoding(utf8)");
+foreach (<DICT>) {
+ chomp; @A = split(" ", $_); $prons{$A[0]} = $A[1];
+}
+close DICT;
+
+open(WORDS, $ARGV[1]) || die("Can't open oov word list ".$ARGV[1]."\n");
+binmode(WORDS,":encoding(utf8)");
+while (<WORDS>) {
+ chomp;
+ print $_;
+ @A = split("", $_);
+ foreach (@A) {
+ print " $prons{$_}";
+ }
+ print "\n";
+}
+close WORDS;
diff --git a/egs/aidatatang_200zh/s5/local/data_prep.sh b/egs/aidatatang_200zh/s5/local/data_prep.sh
new file mode 100644
index 00000000000..bb278a7d904
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/data_prep.sh
@@ -0,0 +1,68 @@
+#!/bin/bash
+
+# Copyright 2017 Xingyu Na
+# Apache 2.0
+
+. ./path.sh || exit 1;
+
+if [ $# != 2 ]; then
+ echo "Usage: $0 "
+ echo " $0 /export/a05/xna/data/data_aidatatang_200zh/corpus /export/a05/xna/data/data_aidatatang_200zh/transcript"
+ exit 1;
+fi
+
+aidatatang_audio_dir=$1
+aidatatang_text=$2/aidatatang_200_zh_transcript.txt
+
+train_dir=data/local/train
+dev_dir=data/local/dev
+test_dir=data/local/test
+tmp_dir=data/local/tmp
+
+mkdir -p $train_dir
+mkdir -p $dev_dir
+mkdir -p $test_dir
+mkdir -p $tmp_dir
+
+# data directory check
+if [ ! -d $aidatatang_audio_dir ] || [ ! -f $aidatatang_text ]; then
+ echo "Error: $0 requires two directory arguments"
+ exit 1;
+fi
+
+# find wav audio file for train, dev and test resp.
+find $aidatatang_audio_dir -iname "*.wav" > $tmp_dir/wav.flist
+n=`cat $tmp_dir/wav.flist | wc -l`
+[ $n -ne 237265 ] && \
+ echo Warning: expected 237265 data files, found $n
+
+grep -i "corpus/train" $tmp_dir/wav.flist > $train_dir/wav.flist || exit 1;
+grep -i "corpus/dev" $tmp_dir/wav.flist > $dev_dir/wav.flist || exit 1;
+grep -i "corpus/test" $tmp_dir/wav.flist > $test_dir/wav.flist || exit 1;
+
+rm -r $tmp_dir
+
+# Transcriptions preparation
+for dir in $train_dir $dev_dir $test_dir; do
+ echo Preparing $dir transcriptions
+ sed -e 's/\.wav//' $dir/wav.flist | awk -F '/' '{print $NF}' > $dir/utt.list
+ sed -e 's/\.wav//' $dir/wav.flist | awk -F '/' '{i=NF-1;printf("%s %s\n",$NF,$i)}' > $dir/utt2spk_all
+ paste -d' ' $dir/utt.list $dir/wav.flist > $dir/wav.scp_all
+ utils/filter_scp.pl -f 1 $dir/utt.list $aidatatang_text > $dir/transcripts.txt
+ awk '{print $1}' $dir/transcripts.txt > $dir/utt.list
+ utils/filter_scp.pl -f 1 $dir/utt.list $dir/utt2spk_all | sort -u > $dir/utt2spk
+ utils/filter_scp.pl -f 1 $dir/utt.list $dir/wav.scp_all | sort -u > $dir/wav.scp
+ sort -u $dir/transcripts.txt > $dir/text
+ utils/utt2spk_to_spk2utt.pl $dir/utt2spk > $dir/spk2utt
+done
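+
+# At this point each of data/local/{train,dev,test} contains the standard Kaldi
+# data-directory files (formats shown here for reference):
+#   wav.scp : <utt-id> <path to .wav file>
+#   utt2spk : <utt-id> <speaker-id>   (the speaker id is the wav file's parent directory name)
+#   text    : <utt-id> <transcript>
+#   spk2utt : <speaker-id> <utt-id1> <utt-id2> ...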
+
+mkdir -p data/train data/dev data/test
+
+for f in spk2utt utt2spk wav.scp text; do
+ cp $train_dir/$f data/train/$f || exit 1;
+ cp $dev_dir/$f data/dev/$f || exit 1;
+ cp $test_dir/$f data/test/$f || exit 1;
+done
+
+echo "$0: aidatatang_200zh data preparation succeeded"
+exit 0;
diff --git a/egs/aidatatang_200zh/s5/local/download_and_untar.sh b/egs/aidatatang_200zh/s5/local/download_and_untar.sh
new file mode 100644
index 00000000000..39f9ac01ff7
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/download_and_untar.sh
@@ -0,0 +1,110 @@
+#!/bin/bash
+
+# Copyright 2014 Johns Hopkins University (author: Daniel Povey)
+# 2017 Xingyu Na
+# Apache 2.0
+
+remove_archive=false
+
+if [ "$1" == --remove-archive ]; then
+ remove_archive=true
+ shift
+fi
+
+if [ $# -ne 3 ]; then
+ echo "Usage: $0 [--remove-archive] "
+ echo "e.g.: $0 /export/a05/xna/data www.openslr.org/resources/62 aidatatang_200zh"
+ echo "With --remove-archive it will remove the archive after successfully un-tarring it."
+ echo " can be one of: aidatatang_200zh."
+fi
+
+data=$1
+url=$2
+part=$3
+
+if [ ! -d "$data" ]; then
+ echo "$0: no such directory $data"
+ exit 1;
+fi
+
+part_ok=false
+list="aidatatang_200zh"
+for x in $list; do
+ if [ "$part" == $x ]; then part_ok=true; fi
+done
+if ! $part_ok; then
+ echo "$0: expected to be one of $list, but got '$part'"
+ exit 1;
+fi
+
+if [ -z "$url" ]; then
+ echo "$0: empty URL base."
+ exit 1;
+fi
+
+if [ -f $data/$part/.complete ]; then
+ echo "$0: data part $part was already successfully extracted, nothing to do."
+ exit 0;
+fi
+
+# sizes of the archive files in bytes.
+sizes="18756983399"
+
+if [ -f $data/$part.tgz ]; then
+ size=$(/bin/ls -l $data/$part.tgz | awk '{print $5}')
+ size_ok=false
+ for s in $sizes; do if [ $s == $size ]; then size_ok=true; fi; done
+ if ! $size_ok; then
+ echo "$0: removing existing file $data/$part.tgz because its size in bytes $size"
+ echo "does not equal the size of one of the archives."
+ rm $data/$part.tgz
+ else
+ echo "$data/$part.tgz exists and appears to be complete."
+ fi
+fi
+
+if [ ! -f $data/$part.tgz ]; then
+ if ! which wget >/dev/null; then
+ echo "$0: wget is not installed."
+ exit 1;
+ fi
+ full_url=$url/$part.tgz
+ echo "$0: downloading data from $full_url. This may take some time, please be patient."
+
+ cd $data
+ if ! wget --no-check-certificate $full_url; then
+ echo "$0: error executing wget $full_url"
+ exit 1;
+ fi
+fi
+
+cd $data
+
+if ! tar -xvzf $part.tgz; then
+ echo "$0: error un-tarring archive $data/$part.tgz"
+ exit 1;
+fi
+
+touch $data/$part/.complete
+
+dev_dir=$data/$part/corpus/dev
+test_dir=$data/$part/corpus/test
+train_dir=$data/$part/corpus/train
+if [ $part == "aidatatang_200zh" ]; then
+ for set in $dev_dir $test_dir $train_dir;do
+ cd $set
+ for wav in ./*.tar.gz; do
+ echo "Extracting wav from $wav"
+ tar -zxf $wav && rm $wav
+ done
+ done
+fi
+
+echo "$0: Successfully downloaded and un-tarred $data/$part.tgz"
+
+if $remove_archive; then
+ echo "$0: removing $data/$part.tgz file since --remove-archive option was supplied."
+ rm $data/$part.tgz
+fi
+
+exit 0;
diff --git a/egs/gale_arabic/s5b/local/gale_format_data.sh b/egs/aidatatang_200zh/s5/local/format_data.sh
old mode 100755
new mode 100644
similarity index 73%
rename from egs/gale_arabic/s5b/local/gale_format_data.sh
rename to egs/aidatatang_200zh/s5/local/format_data.sh
index b69c34e68b9..47af9dd9dfd
--- a/egs/gale_arabic/s5b/local/gale_format_data.sh
+++ b/egs/aidatatang_200zh/s5/local/format_data.sh
@@ -1,23 +1,25 @@
#!/bin/bash
+#
-# Copyright 2014 QCRI (author: Ahmed Ali)
-# Apache 2.0
+. ./path.sh
-if [ -f path.sh ]; then
- . ./path.sh; else
- echo "$0: missing path.sh"; exit 1;
-fi
+silprob=0.5
+mkdir -p data/lang_test data/train data/dev
-for dir in test train; do
- cp -pr data/local/$dir data/$dir
-done
-
-
-mkdir -p data/lang_test
arpa_lm=data/local/lm/3gram-mincount/lm_unpruned.gz
[ ! -f $arpa_lm ] && echo No such file $arpa_lm && exit 1;
+# Copy stuff into its final locations...
+
+for f in spk2utt utt2spk wav.scp text; do
+ cp data/local/train/$f data/train/$f || exit 1;
+done
+
+for f in spk2utt utt2spk wav.scp text; do
+ cp data/local/dev/$f data/dev/$f || exit 1;
+done
+
rm -r data/lang_test
cp -r data/lang data/lang_test
@@ -26,15 +28,15 @@ gunzip -c "$arpa_lm" | \
--read-symbol-table=data/lang_test/words.txt - data/lang_test/G.fst
-echo "$0: Checking how stochastic G is (the first of these numbers should be small):"
+echo "Checking how stochastic G is (the first of these numbers should be small):"
fstisstochastic data/lang_test/G.fst
## Check lexicon.
## just have a look and make sure it seems sane.
-echo "$0: First few lines of lexicon FST:"
+echo "First few lines of lexicon FST:"
fstprint --isymbols=data/lang/phones.txt --osymbols=data/lang/words.txt data/lang/L.fst | head
-echo "$0: Performing further checks"
+echo Performing further checks
# Checking that G.fst is determinizable.
fstdeterminize data/lang_test/G.fst /dev/null || echo Error determinizing G.
@@ -55,6 +57,4 @@ fsttablecompose data/lang/L_disambig.fst data/lang_test/G.fst | \
fstisstochastic || echo LG is not stochastic
-echo gale_format_data succeeded.
-
-exit 0
+echo format_data succeeded.
diff --git a/egs/aidatatang_200zh/s5/local/nnet3/compare_wer.sh b/egs/aidatatang_200zh/s5/local/nnet3/compare_wer.sh
new file mode 100755
index 00000000000..2d85626c356
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/nnet3/compare_wer.sh
@@ -0,0 +1,82 @@
+#!/bin/bash
+# Copyright 2018 Emotech LTD (Author: Xuechen Liu)
+
+# compare WER between different models in the aidatatang_200zh nnet3 directory
+# example usage: local/nnet3/compare_wer.sh exp/nnet3/tdnn_sp
+# note: this script is kept quite general to give users more flexibility
+# in adding affixes for their own models when training.
+
+set -e
+. ./cmd.sh
+. ./path.sh
+
+if [ $# == 0 ]; then
+ echo "Usage: $0: [--online] [ ... ]"
+ echo "e.g.: $0 exp/nnet3/tdnn_sp exp/nnet3/tdnn_sp_pr"
+ exit 1
+fi
+
+echo "# $0 $*"
+
+include_online=false
+if [ "$1" == "--online" ]; then
+ include_online=true
+ shift
+fi
+
+set_names() {
+ if [ $# != 1 ]; then
+ echo "compare_wer.sh: internal error"
+ exit 1 # exit the program
+ fi
+ dirname=$(echo $1 | cut -d: -f1)
+}
+
+# print model names
+echo -n "# Model "
+for x in $*; do
+ printf "% 10s" " $(basename $x)"
+done
+echo
+
+# print decode WER results
+echo -n "# WER(%) "
+for x in $*; do
+ set_names $x
+ wer=$([ -d $x ] && grep WER $x/decode_test/cer_* | utils/best_wer.sh | awk '{print $2}')
+ printf "% 10s" $wer
+done
+echo
+
+# so how about online WER?
+if $include_online; then
+ echo -n "# WER(%)[online] "
+ for x in $*; do
+ set_names $x
+ wer=$(cat ${x}_online/decode_test/cer_* | utils/best_wer.sh | awk '{print $2}')
+ printf "% 10s" $wer
+ done
+ echo
+ echo -n "# WER(%)[per-utt] "
+ for x in $*; do
+ set_names $x
+ wer_per_utt=$(cat ${x}_online/decode_test_per_utt/cer_* | utils/best_wer.sh | awk '{print $2}')
+ printf "% 10s" $wer_per_utt
+ done
+ echo
+fi
+
+# print log for train & validation
+echo -n "# Final train prob "
+for x in $*; do
+ prob=$(grep Overall $x/log/compute_prob_train.combined.log | grep log-like | awk '{printf($8)}' | cut -c1-7)
+ printf "% 10s" $prob
+done
+echo
+
+echo -n "# Final valid prob "
+for x in $*; do
+ prob=$(grep Overall $x/log/compute_prob_valid.combined.log | grep log-like | awk '{printf($8)}' | cut -c1-7)
+ printf "% 10s" $prob
+done
+echo
diff --git a/egs/aidatatang_200zh/s5/local/nnet3/run_ivector_common.sh b/egs/aidatatang_200zh/s5/local/nnet3/run_ivector_common.sh
new file mode 100644
index 00000000000..0fe55ecf000
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/nnet3/run_ivector_common.sh
@@ -0,0 +1,160 @@
+#!/bin/bash
+
+set -euo pipefail
+
+# This script is modified based on mini_librispeech/s5/local/nnet3/run_ivector_common.sh
+
+# This script is called from local/nnet3/run_tdnn.sh and
+# local/chain/run_tdnn.sh (and may eventually be called by more
+# scripts). It contains the common feature preparation and
+# iVector-related parts of the script. See those scripts for examples
+# of usage.
+
+stage=0
+train_set=train
+test_sets="dev test"
+gmm=tri5a
+online=false
+nnet3_affix=
+
+. ./cmd.sh
+. ./path.sh
+. utils/parse_options.sh
+
+gmm_dir=exp/${gmm}
+ali_dir=exp/${gmm}_sp_ali
+
+for f in data/${train_set}/feats.scp ${gmm_dir}/final.mdl; do
+ if [ ! -f $f ]; then
+ echo "$0: expected file $f to exist"
+ exit 1
+ fi
+done
+
+online_affix=
+if [ $online = true ]; then
+ online_affix=_online
+fi
+
+if [ $stage -le 1 ]; then
+ # Although the nnet will be trained on high-resolution data, we still have to
+ # perturb the normal data to get the alignments; _sp stands for speed-perturbed.
+ echo "$0: preparing directory for low-resolution speed-perturbed data (for alignment)"
+ utils/data/perturb_data_dir_speed_3way.sh data/${train_set} data/${train_set}_sp
+ echo "$0: making MFCC features for low-resolution speed-perturbed data"
+ steps/make_mfcc_pitch.sh --cmd "$train_cmd" --nj 70 data/${train_set}_sp \
+ exp/make_mfcc/train_sp mfcc_perturbed || exit 1;
+ steps/compute_cmvn_stats.sh data/${train_set}_sp \
+ exp/make_mfcc/train_sp mfcc_perturbed || exit 1;
+ utils/fix_data_dir.sh data/${train_set}_sp
+fi
+
+if [ $stage -le 2 ]; then
+ echo "$0: aligning with the perturbed low-resolution data"
+ steps/align_fmllr.sh --nj 30 --cmd "$train_cmd" \
+ data/${train_set}_sp data/lang $gmm_dir $ali_dir || exit 1
+fi
+
+if [ $stage -le 3 ]; then
+ # Create high-resolution MFCC features (with 40 cepstra instead of 13).
+ # this shows how you can split across multiple file-systems.
+ echo "$0: creating high-resolution MFCC features"
+ mfccdir=mfcc_perturbed_hires$online_affix
+ if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $mfccdir/storage ]; then
+ utils/create_split_dir.pl /export/b0{5,6,7,8}/$USER/kaldi-data/mfcc/aidatatang-$(date +'%m_%d_%H_%M')/s5/$mfccdir/storage $mfccdir/storage
+ fi
+
+ for datadir in ${train_set}_sp ${test_sets}; do
+ utils/copy_data_dir.sh data/$datadir data/${datadir}_hires$online_affix
+ done
+
+ # do volume-perturbation on the training data prior to extracting hires
+ # features; this helps make trained nnets more invariant to test data volume.
+ utils/data/perturb_data_dir_volume.sh data/${train_set}_sp_hires$online_affix || exit 1;
+
+ for datadir in ${train_set}_sp ${test_sets}; do
+ steps/make_mfcc_pitch$online_affix.sh --nj 10 --mfcc-config conf/mfcc_hires.conf \
+ --cmd "$train_cmd" data/${datadir}_hires$online_affix exp/make_hires/$datadir $mfccdir || exit 1;
+ steps/compute_cmvn_stats.sh data/${datadir}_hires$online_affix exp/make_hires/$datadir $mfccdir || exit 1;
+ utils/fix_data_dir.sh data/${datadir}_hires$online_affix || exit 1;
+ # create MFCC data dir without pitch to extract iVector
+ utils/data/limit_feature_dim.sh 0:39 data/${datadir}_hires$online_affix data/${datadir}_hires_nopitch || exit 1;
+ steps/compute_cmvn_stats.sh data/${datadir}_hires_nopitch exp/make_hires/$datadir $mfccdir || exit 1;
+ done
+fi
+
+if [ $stage -le 4 ]; then
+ echo "$0: computing a subset of data to train the diagonal UBM."
+ # We'll use about a quarter of the data.
+ mkdir -p exp/nnet3${nnet3_affix}/diag_ubm
+ temp_data_root=exp/nnet3${nnet3_affix}/diag_ubm
+
+ num_utts_total=$(wc -l $dir/configs/network.xconfig
+ input dim=$ivector_dim name=ivector
+ input dim=$feat_dim name=input
+
+ # please note that it is important to have input layer with the name=input
+ # as the layer immediately preceding the fixed-affine-layer to enable
+ # the use of short notation for the descriptor
+ fixed-affine-layer name=lda input=Append(-2,-1,0,1,2,ReplaceIndex(ivector, t, 0)) affine-transform-file=$dir/configs/lda.mat
+
+ # the first splicing is moved before the lda layer, so no splicing here
+ relu-batchnorm-layer name=tdnn1 dim=850
+ relu-batchnorm-layer name=tdnn2 dim=850 input=Append(-1,0,2)
+ relu-batchnorm-layer name=tdnn3 dim=850 input=Append(-3,0,3)
+ relu-batchnorm-layer name=tdnn4 dim=850 input=Append(-7,0,2)
+ relu-batchnorm-layer name=tdnn5 dim=850 input=Append(-3,0,3)
+ relu-batchnorm-layer name=tdnn6 dim=850
+ output-layer name=output input=tdnn6 dim=$num_targets max-change=1.5
+EOF
+ steps/nnet3/xconfig_to_configs.py --xconfig-file $dir/configs/network.xconfig --config-dir $dir/configs/
+fi
+
+if [ $stage -le 8 ]; then
+ if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $dir/egs/storage ]; then
+ utils/create_split_dir.pl \
+ /export/b0{5,6,7,8}/$USER/kaldi-data/egs/aidatatang-$(date +'%m_%d_%H_%M')/s5/$dir/egs/storage $dir/egs/storage
+ fi
+
+ steps/nnet3/train_dnn.py --stage=$train_stage \
+ --cmd="$decode_cmd" \
+ --feat.online-ivector-dir exp/nnet3/ivectors_${train_set} \
+ --feat.cmvn-opts="--norm-means=false --norm-vars=false" \
+ --trainer.num-epochs $num_epochs \
+ --trainer.optimization.num-jobs-initial $num_jobs_initial \
+ --trainer.optimization.num-jobs-final $num_jobs_final \
+ --trainer.optimization.initial-effective-lrate $initial_effective_lrate \
+ --trainer.optimization.final-effective-lrate $final_effective_lrate \
+ --egs.dir "$common_egs_dir" \
+ --cleanup.remove-egs $remove_egs \
+ --cleanup.preserve-model-interval 500 \
+ --feat-dir=data/${train_set}_hires_online \
+ --ali-dir $ali_dir \
+ --lang data/lang \
+ --reporting.email="$reporting_email" \
+ --dir=$dir || exit 1;
+fi
+
+if [ $stage -le 9 ]; then
+ # this version of the decoding treats each utterance separately
+ # without carrying forward speaker information.
+ for decode_set in dev test; do
+ num_jobs=`cat data/${decode_set}_hires_online/utt2spk|cut -d' ' -f2|sort -u|wc -l`
+ decode_dir=${dir}/decode_$decode_set
+ steps/nnet3/decode.sh --nj $num_jobs --cmd "$decode_cmd" \
+ --online-ivector-dir exp/nnet3/ivectors_${decode_set} \
+ $graph_dir data/${decode_set}_hires_online $decode_dir || exit 1;
+ done
+fi
+
+if [ $stage -le 10 ]; then
+ steps/online/nnet3/prepare_online_decoding.sh --mfcc-config conf/mfcc_hires.conf \
+ --add-pitch true \
+ data/lang exp/nnet3/extractor "$dir" ${dir}_online || exit 1;
+fi
+
+if [ $stage -le 11 ]; then
+ # do the actual online decoding with iVectors, carrying info forward from
+ # previous utterances of the same speaker.
+ for decode_set in dev test; do
+ num_jobs=`cat data/${decode_set}_hires_online/utt2spk|cut -d' ' -f2|sort -u|wc -l`
+ decode_dir=${dir}_online/decode_$decode_set
+ steps/online/nnet3/decode.sh --nj $num_jobs --cmd "$decode_cmd" \
+ --config conf/decode.config \
+ $graph_dir data/${decode_set}_hires_online $decode_dir || exit 1;
+ done
+fi
+
+if [ $stage -le 12 ]; then
+ # this version of the decoding treats each utterance separately
+ # without carrying forward speaker information.
+ for decode_set in dev test; do
+ num_jobs=`cat data/${decode_set}_hires_online/utt2spk|cut -d' ' -f2|sort -u|wc -l`
+ decode_dir=${dir}_online/decode_${decode_set}_per_utt
+ steps/online/nnet3/decode.sh --nj $num_jobs --cmd "$decode_cmd" \
+ --config conf/decode.config --per-utt true \
+ $graph_dir data/${decode_set}_hires_online $decode_dir || exit 1;
+ done
+fi
+
+wait;
+exit 0;
diff --git a/egs/aidatatang_200zh/s5/local/prepare_dict.sh b/egs/aidatatang_200zh/s5/local/prepare_dict.sh
new file mode 100644
index 00000000000..aa72bcd48d2
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/prepare_dict.sh
@@ -0,0 +1,320 @@
+#!/bin/bash
+# Copyright 2016 LeSpeech (Author: Xingyu Na)
+
+# prepare dictionary for aidatatang
+# It is done for English and Chinese separately.
+# For English, we use the CMU dictionary and Sequitur G2P
+# for OOVs, and the whole English phone set is converted to the Chinese
+# phone set at the end. For Chinese, we use an online dictionary;
+# for OOVs, we just produce pronunciations using character mapping.
+
+. ./path.sh
+
+[ $# != 0 ] && echo "Usage: $0" && exit 1;
+
+train_dir=data/local/train
+dev_dir=data/local/dev
+test_dir=data/local/test
+dict_dir=data/local/dict
+mkdir -p $dict_dir
+mkdir -p $dict_dir/lexicon-{en,ch}
+
+# extract full vocabulary
+cat $train_dir/text $dev_dir/text $test_dir/text | awk '{for (i = 2; i <= NF; i++) print $i}' |\
+ perl -ape 's/ /\n/g;' | sort -u | grep -v '\[LAUGHTER\]' | grep -v '\[NOISE\]' |\
+ grep -v '\[VOCALIZED-NOISE\]' > $dict_dir/words.txt || exit 1;
+
+# split into English and Chinese
+cat $dict_dir/words.txt | grep '[a-zA-Z]' > $dict_dir/lexicon-en/words-en.txt || exit 1;
+cat $dict_dir/words.txt | grep -v '[a-zA-Z]' > $dict_dir/lexicon-ch/words-ch.txt || exit 1;
+
+
+##### produce pronunciations for english
+if [ ! -f $dict_dir/cmudict/cmudict.0.7a ]; then
+ echo "--- Downloading CMU dictionary ..."
+ svn co -r 13068 https://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict \
+ $dict_dir/cmudict || exit 1;
+fi
+
+# format cmudict
+echo "--- Striping stress and pronunciation variant markers from cmudict ..."
+perl $dict_dir/cmudict/scripts/make_baseform.pl \
+ $dict_dir/cmudict/cmudict.0.7a /dev/stdout |\
+ sed -e 's:^\([^\s(]\+\)([0-9]\+)\(\s\+\)\(.*\):\1\2\3:' > $dict_dir/cmudict/cmudict-plain.txt || exit 1;
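+# e.g. (illustrative): a cmudict entry like "READ(2)  R EH1 D" becomes "READ  R EH D" in cmudict-plain.txt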
+
+# extract in-vocab lexicon and oov words
+echo "--- Searching for English OOV words ..."
+awk 'NR==FNR{words[$1]; next;} !($1 in words)' \
+ $dict_dir/cmudict/cmudict-plain.txt $dict_dir/lexicon-en/words-en.txt |\
+ egrep -v '<.?s>' > $dict_dir/lexicon-en/words-en-oov.txt || exit 1;
+
+awk 'NR==FNR{words[$1]; next;} ($1 in words)' \
+ $dict_dir/lexicon-en/words-en.txt $dict_dir/cmudict/cmudict-plain.txt |\
+ egrep -v '<.?s>' > $dict_dir/lexicon-en/lexicon-en-iv.txt || exit 1;
+
+wc -l $dict_dir/lexicon-en/words-en-oov.txt
+wc -l $dict_dir/lexicon-en/lexicon-en-iv.txt
+
+# setup g2p and generate oov lexicon
+if [ ! -f conf/g2p_model ]; then
+ echo "--- Downloading a pre-trained Sequitur G2P model ..."
+ wget http://sourceforge.net/projects/kaldi/files/sequitur-model4 -O conf/g2p_model
+ if [ ! -f conf/g2p_model ]; then
+ echo "Failed to download the g2p model!"
+ exit 1
+ fi
+fi
+
+echo "--- Preparing pronunciations for OOV words ..."
+g2p=`which g2p.py`
+if [ ! -x $g2p ]; then
+ echo "g2p.py is not found. Checkout tools/extras/install_sequitur.sh."
+ exit 1
+fi
+g2p.py --model=conf/g2p_model --apply $dict_dir/lexicon-en/words-en-oov.txt \
+ > $dict_dir/lexicon-en/lexicon-en-oov.txt || exit 1;
+
+# merge in-vocab and oov lexicon
+cat $dict_dir/lexicon-en/lexicon-en-oov.txt $dict_dir/lexicon-en/lexicon-en-iv.txt |\
+ sort > $dict_dir/lexicon-en/lexicon-en-phn.txt || exit 1;
+
+# convert CMU phonemes to pinyin phonemes
+mkdir -p $dict_dir/map
+cat conf/cmu2pinyin | awk '{print $1;}' | sort -u > $dict_dir/map/cmu || exit 1;
+cat conf/pinyin2cmu | awk -v cmu=$dict_dir/map/cmu \
+ 'BEGIN{while((getline $dict_dir/map/cmu-used || exit 1;
+cat $dict_dir/map/cmu | awk -v cmu=$dict_dir/map/cmu-used \
+ 'BEGIN{while((getline $dict_dir/map/cmu-not-used || exit 1;
+
+awk 'NR==FNR{words[$1]; next;} ($1 in words)' \
+ $dict_dir/map/cmu-not-used conf/cmu2pinyin |\
+ egrep -v '<.?s>' > $dict_dir/map/cmu-py || exit 1;
+
+cat $dict_dir/map/cmu-py | \
+ perl -e '
+ open(MAPS, $ARGV[0]) or die("could not open map file");
+ my %py2ph;
+ foreach $line (<MAPS>) {
+ @A = split(" ", $line);
+ $py = shift(@A);
+ $py2ph{$py} = [@A];
+ }
+ my @entry;
+ while (<STDIN>) {
+ @A = split(" ", $_);
+ @entry = ();
+ $W = shift(@A);
+ push(@entry, $W);
+ for($i = 0; $i < @A; $i++) { push(@entry, @{$py2ph{$A[$i]}}); }
+ print "@entry";
+ print "\n";
+ }
+' conf/pinyin2cmu > $dict_dir/map/cmu-cmu || exit 1;
+
+cat $dict_dir/lexicon-en/lexicon-en-phn.txt | \
+ perl -e '
+ open(MAPS, $ARGV[0]) or die("could not open map file");
+ my %py2ph;
+ foreach $line (<MAPS>) {
+ @A = split(" ", $line);
+ $py = shift(@A);
+ $py2ph{$py} = [@A];
+ }
+ my @entry;
+ while (<STDIN>) {
+ @A = split(" ", $_);
+ @entry = ();
+ $W = shift(@A);
+ push(@entry, $W);
+ for($i = 0; $i < @A; $i++) {
+ if (exists $py2ph{$A[$i]}) { push(@entry, @{$py2ph{$A[$i]}}); }
+ else {push(@entry, $A[$i])};
+ }
+ print "@entry";
+ print "\n";
+ }
+' $dict_dir/map/cmu-cmu > $dict_dir/lexicon-en/lexicon-en.txt || exit 1;
+
+
+##### produce pronunciations for chinese
+if [ ! -f $dict_dir/cedict/cedict_1_0_ts_utf-8_mdbg.txt ]; then
+ echo "------------- Downloading cedit dictionary ---------------"
+ mkdir -p $dict_dir/cedict
+ wget -P $dict_dir/cedict http://www.mdbg.net/chindict/export/cedict/cedict_1_0_ts_utf-8_mdbg.txt.gz
+ gunzip $dict_dir/cedict/cedict_1_0_ts_utf-8_mdbg.txt.gz
+fi
+
+cat $dict_dir/cedict/cedict_1_0_ts_utf-8_mdbg.txt | grep -v '#' | awk -F '/' '{print $1}' |\
+ perl -e '
+ while (<STDIN>) {
+ @A = split(" ", $_);
+ print $A[1];
+ for($n = 2; $n < @A; $n++) {
+ $A[$n] =~ s:\[?([a-zA-Z0-9\:]+)\]?:$1:;
+ $tmp = uc($A[$n]);
+ print " $tmp";
+ }
+ print "\n";
+ }
+ ' | sort -k1 > $dict_dir/cedict/ch-dict.txt || exit 1;
+
+echo "--- Searching for Chinese OOV words ..."
+awk 'NR==FNR{words[$1]; next;} !($1 in words)' \
+ $dict_dir/cedict/ch-dict.txt $dict_dir/lexicon-ch/words-ch.txt |\
+ egrep -v '<.?s>' > $dict_dir/lexicon-ch/words-ch-oov.txt || exit 1;
+
+awk 'NR==FNR{words[$1]; next;} ($1 in words)' \
+ $dict_dir/lexicon-ch/words-ch.txt $dict_dir/cedict/ch-dict.txt |\
+ egrep -v '<.?s>' > $dict_dir/lexicon-ch/lexicon-ch-iv.txt || exit 1;
+
+wc -l $dict_dir/lexicon-ch/words-ch-oov.txt
+wc -l $dict_dir/lexicon-ch/lexicon-ch-iv.txt
+
+
+# validate Chinese dictionary and compose a char-based
+# dictionary in order to get OOV pronunciations
+cat $dict_dir/cedict/ch-dict.txt |\
+ perl -e '
+ use utf8;
+ binmode(STDIN,":encoding(utf8)");
+ binmode(STDOUT,":encoding(utf8)");
+ while (<STDIN>) {
+ @A = split(" ", $_);
+ $word_len = length($A[0]);
+ $proun_len = @A - 1 ;
+ if ($word_len == $proun_len) {print $_;}
+ }
+ ' > $dict_dir/cedict/ch-dict-1.txt || exit 1;
+
+# extract chars
+cat $dict_dir/cedict/ch-dict-1.txt | awk '{print $1}' |\
+ perl -e '
+ use utf8;
+ binmode(STDIN,":encoding(utf8)");
+ binmode(STDOUT,":encoding(utf8)");
+ while (<STDIN>) {
+ @A = split(" ", $_);
+ @chars = split("", $A[0]);
+ foreach (@chars) {
+ print "$_\n";
+ }
+ }
+ ' | grep -v '^$' > $dict_dir/lexicon-ch/ch-char.txt || exit 1;
+
+# extract individual pinyins
+cat $dict_dir/cedict/ch-dict-1.txt |\
+ awk '{for(i=2; i<=NF; i++) print $i}' |\
+ perl -ape 's/ /\n/g;' > $dict_dir/lexicon-ch/ch-char-pinyin.txt || exit 1;
+
+# first make sure the numbers of characters and pinyins
+# are equal, so that a char-based dictionary can
+# be composed.
+nchars=`wc -l < $dict_dir/lexicon-ch/ch-char.txt`
+npinyin=`wc -l < $dict_dir/lexicon-ch/ch-char-pinyin.txt`
+if [ $nchars -ne $npinyin ]; then
+ echo "Found $nchars chars and $npinyin pinyin. Please check!"
+ exit 1
+fi
+
+paste $dict_dir/lexicon-ch/ch-char.txt $dict_dir/lexicon-ch/ch-char-pinyin.txt |\
+ sort -u > $dict_dir/lexicon-ch/ch-char-dict.txt || exit 1;
+
+# create a multiple pronunciation dictionary
+cat $dict_dir/lexicon-ch/ch-char-dict.txt |\
+ perl -e '
+ my $prev = "";
+ my $out_line = "";
+ while (<STDIN>) {
+ @A = split(" ", $_);
+ $cur = $A[0];
+ $cur_py = $A[1];
+ #print length($prev);
+ if (length($prev) == 0) { $out_line = $_; chomp($out_line);}
+ if (length($prev)>0 && $cur ne $prev) { print $out_line; print "\n"; $out_line = $_; chomp($out_line);}
+ if (length($prev)>0 && $cur eq $prev) { $out_line = $out_line."/"."$cur_py";}
+ $prev = $cur;
+ }
+ print $out_line;
+ ' > $dict_dir/lexicon-ch/ch-char-dict-mp.txt || exit 1;
+
+# get lexicon for Chinese OOV words
+local/create_oov_char_lexicon.pl $dict_dir/lexicon-ch/ch-char-dict-mp.txt \
+ $dict_dir/lexicon-ch/words-ch-oov.txt > $dict_dir/lexicon-ch/lexicon-ch-oov.txt || exit 1;
+
+# separate multiple prons for the Chinese OOV lexicon
+cat $dict_dir/lexicon-ch/lexicon-ch-oov.txt |\
+ perl -e '
+ my @entry;
+ my @entry1;
+ while (<STDIN>) {
+ @A = split(" ", $_);
+ @entry = ();
+ push(@entry, $A[0]);
+ for($i = 1; $i < @A; $i++ ) {
+ @py = split("/", $A[$i]);
+ @entry1 = @entry;
+ @entry = ();
+ for ($j = 0; $j < @entry1; $j++) {
+ for ($k = 0; $k < @py; $k++) {
+ $tmp = $entry1[$j]." ".$py[$k];
+ push(@entry, $tmp);
+ }
+ }
+ }
+ for ($i = 0; $i < @entry; $i++) {
+ print $entry[$i];
+ print "\n";
+ }
+ }
+ ' > $dict_dir/lexicon-ch/lexicon-ch-oov-mp.txt || exit 1;
+
+# compose IV and OOV lexicons for Chinese
+cat $dict_dir/lexicon-ch/lexicon-ch-oov-mp.txt $dict_dir/lexicon-ch/lexicon-ch-iv.txt |\
+ awk '{if (NF > 1 && $2 ~ /[A-Za-z0-9]+/) print $0;}' > $dict_dir/lexicon-ch/lexicon-ch.txt || exit 1;
+
+# convert Chinese pinyin to CMU format
+cat $dict_dir/lexicon-ch/lexicon-ch.txt | sed -e 's/U:/V/g' | sed -e 's/ R\([0-9]\)/ ER\1/g'|\
+ utils/pinyin_map.pl conf/pinyin2cmu > $dict_dir/lexicon-ch/lexicon-ch-cmu.txt || exit 1;
+
+# combine English and Chinese lexicons
+cat $dict_dir/lexicon-en/lexicon-en.txt $dict_dir/lexicon-ch/lexicon-ch-cmu.txt |\
+ sort -u > $dict_dir/lexicon1.txt || exit 1;
+
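+# group phones that share a base symbol (i.e. differ only in tone) onto the
+# same line of nonsilence_phones.txt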
+cat $dict_dir/lexicon1.txt | awk '{ for(n=2;n<=NF;n++){ phones[$n] = 1; }} END{for (p in phones) print p;}'| \
+ sort -u |\
+ perl -e '
+ my %ph_cl;
+ while (<STDIN>) {
+ $phone = $_;
+ chomp($phone);
+ chomp($_);
+ $phone =~ s:([A-Z]+)[0-9]:$1:;
+ if (exists $ph_cl{$phone}) { push(@{$ph_cl{$phone}}, $_) }
+ else { $ph_cl{$phone} = [$_]; }
+ }
+ foreach $key ( keys %ph_cl ) {
+ print "@{ $ph_cl{$key} }\n"
+ }
+ ' | sort -k1 > $dict_dir/nonsilence_phones.txt || exit 1;
+
+( echo SIL; echo SPN; echo NSN; echo LAU ) > $dict_dir/silence_phones.txt
+
+echo SIL > $dict_dir/optional_silence.txt
+
+# Extra questions: group the nonsilence phones by their tone marker
+# (the digit suffix added by the pinyin-to-CMU mapping); there is no stress.
+
+cat $dict_dir/silence_phones.txt| awk '{printf("%s ", $1);} END{printf "\n";}' > $dict_dir/extra_questions.txt || exit 1;
+cat $dict_dir/nonsilence_phones.txt | perl -e 'while(<>){ foreach $p (split(" ", $_)) {
+ $p =~ m:^([^\d]+)(\d*)$: || die "Bad phone $_"; $q{$2} .= "$p "; } } foreach $l (values %q) {print "$l\n";}' \
+ >> $dict_dir/extra_questions.txt || exit 1;
+
+# Add to the lexicon the silences, noises etc.
+(echo '!SIL SIL'; echo '[VOCALIZED-NOISE] SPN'; echo '[NOISE] NSN'; echo '[LAUGHTER] LAU';
+ echo '<UNK> SPN' ) | \
+ cat - $dict_dir/lexicon1.txt > $dict_dir/lexicon.txt || exit 1;
+
+echo "$0: aidatatang_200zh dict preparation succeeded"
+exit 0;
diff --git a/egs/aidatatang_200zh/s5/local/score.sh b/egs/aidatatang_200zh/s5/local/score.sh
new file mode 100644
index 00000000000..a9786169973
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/score.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+
+set -e -o pipefail
+set -x
+steps/score_kaldi.sh "$@"
+steps/scoring/score_kaldi_cer.sh --stage 2 "$@"
+
+echo "$0: Done"
diff --git a/egs/aidatatang_200zh/s5/local/train_lms.sh b/egs/aidatatang_200zh/s5/local/train_lms.sh
new file mode 100644
index 00000000000..bc52f8acb20
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/train_lms.sh
@@ -0,0 +1,92 @@
+#!/bin/bash
+
+
+# To be run from one directory above this script.
+
+
+text=data/local/train/text
+lexicon=data/local/dict/lexicon.txt
+
+for f in "$text" "$lexicon"; do
+ [ ! -f $f ] && echo "$0: No such file $f" && exit 1;
+done
+
+# This script takes no arguments. It assumes you have already run
+# aidatatang_data_prep.sh.
+# It takes as input the files
+#data/local/train/text
+#data/local/dict/lexicon.txt
+dir=data/local/lm
+mkdir -p $dir
+
+export LC_ALL=C # You'll get errors about things being not sorted, if you
+ # have a different locale.
+kaldi_lm=`which train_lm.sh`
+if [ ! -x $kaldi_lm ]; then
+ echo "$0: train_lm.sh is not found. That might mean it's not installed"
+ echo "$0: or it is not added to PATH"
+ echo "$0: Use the script tools/extras/install_kaldi_lm.sh to install it"
+ exit 1
+fi
+
+cleantext=$dir/text.no_oov
+
+cat $text | awk -v lex=$lexicon 'BEGIN{while((getline<lex) >0){ seen[$1]=1; } }
+ {for(n=1; n<=NF;n++) { if (seen[$n]) { printf("%s ", $n); } else {printf("<UNK> ");} } printf("\n");}' \
+ > $cleantext || exit 1;
+
+
+cat $cleantext | awk '{for(n=2;n<=NF;n++) print $n; }' | sort | uniq -c | \
+ sort -nr > $dir/word.counts || exit 1;
+
+
+# Get counts from acoustic training transcripts, and add one-count
+# for each word in the lexicon (but not silence, we don't want it
+# in the LM-- we'll add it optionally later).
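+# (the add-one step ensures every lexicon word gets a nonzero unigram count,
+# so the LM still assigns it probability even if it never appears in training)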
+cat $cleantext | awk '{for(n=2;n<=NF;n++) print $n; }' | \
+ cat - <(grep -w -v '!SIL' $lexicon | awk '{print $1}') | \
+ sort | uniq -c | sort -nr > $dir/unigram.counts || exit 1;
+
+# note: we probably won't really make use of <UNK> as there aren't any OOVs
+cat $dir/unigram.counts | awk '{print $2}' | get_word_map.pl "<s>" "</s>" "<UNK>" > $dir/word_map \
+ || exit 1;
+
+# note: ignore 1st field of train.txt, it's the utterance-id.
+cat $cleantext | awk -v wmap=$dir/word_map 'BEGIN{while((getline<wmap)>0)map[$1]=$2;}
+ { for(n=2;n<=NF;n++) { printf map[$n]; if(n<NF){ printf " "; } else { print ""; }}}' | gzip -c >$dir/train.gz \
+ || exit 1;
+
+train_lm.sh --arpa --lmtype 3gram-mincount $dir || exit 1;
+
+# LM is small enough that we don't need to prune it (only about 0.7M N-grams).
+# Perplexity over 128254.000000 words is 90.446690
+
+# note: output is
+# data/local/lm/3gram-mincount/lm_unpruned.gz
+
+exit 0
+
+
+# From here are some commands to do a baseline with SRILM (assuming
+# you have it installed).
+heldout_sent=10000 # Don't change this if you want result to be comparable with
+ # kaldi_lm results
+sdir=$dir/srilm # in case we want to use SRILM to double-check perplexities.
+mkdir -p $sdir
+cat $cleantext | awk '{for(n=2;n<=NF;n++){ printf $n; if(n<NF) printf " "; else print ""; }}' | head -$heldout_sent > $sdir/heldout
+cat $cleantext | awk '{for(n=2;n<=NF;n++){ printf $n; if(n<NF) printf " "; else print ""; }}' | tail -n +$heldout_sent > $sdir/train
+
+cat $dir/word_map | awk '{print $1}' | cat - <(echo "<s>"; echo "</s>" ) > $sdir/wordlist
+
+
+ngram-count -text $sdir/train -order 3 -limit-vocab -vocab $sdir/wordlist -unk \
+ -map-unk "<UNK>" -kndiscount -interpolate -lm $sdir/srilm.o3g.kn.gz
+ngram -lm $sdir/srilm.o3g.kn.gz -ppl $sdir/heldout
+# 0 zeroprobs, logprob= -250954 ppl= 90.5091 ppl1= 132.482
+
+# Note: perplexity SRILM gives to Kaldi-LM model is same as kaldi-lm reports above.
+# Difference in WSJ must have been due to different treatment of <UNK>.
+ngram -lm $dir/3gram-mincount/lm_unpruned.gz -ppl $sdir/heldout
+# 0 zeroprobs, logprob= -250913 ppl= 90.4439 ppl1= 132.379
diff --git a/egs/aidatatang_200zh/s5/local/wer_hyp_filter b/egs/aidatatang_200zh/s5/local/wer_hyp_filter
new file mode 100644
index 00000000000..a1bfdb57efc
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/wer_hyp_filter
@@ -0,0 +1,19 @@
+#!/usr/bin/env perl
+
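+# Remove non-scoring tokens (the @filters list below) from each hypothesis
+# line before WER scoring; the utterance id in field 1 is kept.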
+@filters=('[NOISE]','[LAUGHTER]','[VOCALIZED-NOISE]','<UNK>','%HESITATION');
+
+foreach $w (@filters) {
+ $bad{$w} = 1;
+}
+
+while(<STDIN>) {
+ @A = split(" ", $_);
+ $id = shift @A;
+ print "$id ";
+ foreach $a (@A) {
+ if (!defined $bad{$a}) {
+ print "$a ";
+ }
+ }
+ print "\n";
+}
diff --git a/egs/aidatatang_200zh/s5/local/wer_output_filter b/egs/aidatatang_200zh/s5/local/wer_output_filter
new file mode 100644
index 00000000000..aceeeec41b4
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/wer_output_filter
@@ -0,0 +1,25 @@
+#!/usr/bin/env perl
+# Copyright 2012-2014 Johns Hopkins University (Author: Yenda Trmal)
+# Apache 2.0
+use utf8;
+
+use open qw(:encoding(utf8));
+binmode STDIN, ":utf8";
+binmode STDOUT, ":utf8";
+binmode STDERR, ":utf8";
+
+while (<>) {
+ @F = split " ";
+ print $F[0] . " ";
+ foreach $s (@F[1..$#F]) {
+ if (($s =~ /\[.*\]/) || ($s =~ /\<.*\>/) || ($s =~ "!SIL")) {
+ print "";
+ } else {
+ print "$s"
+ }
+ print " ";
+ }
+ print "\n";
+}
+
+
diff --git a/egs/aidatatang_200zh/s5/local/wer_ref_filter b/egs/aidatatang_200zh/s5/local/wer_ref_filter
new file mode 100644
index 00000000000..a1bfdb57efc
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/local/wer_ref_filter
@@ -0,0 +1,19 @@
+#!/usr/bin/env perl
+
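+# Same filtering as wer_hyp_filter, applied to the reference transcripts.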
+@filters=('[NOISE]','[LAUGHTER]','[VOCALIZED-NOISE]','<UNK>','%HESITATION');
+
+foreach $w (@filters) {
+ $bad{$w} = 1;
+}
+
+while(<STDIN>) {
+ @A = split(" ", $_);
+ $id = shift @A;
+ print "$id ";
+ foreach $a (@A) {
+ if (!defined $bad{$a}) {
+ print "$a ";
+ }
+ }
+ print "\n";
+}
diff --git a/egs/aidatatang_200zh/s5/path.sh b/egs/aidatatang_200zh/s5/path.sh
new file mode 100644
index 00000000000..2d17b17a84a
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/path.sh
@@ -0,0 +1,6 @@
+export KALDI_ROOT=`pwd`/../../..
+[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
+export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
+[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
+. $KALDI_ROOT/tools/config/common_path.sh
+export LC_ALL=C
diff --git a/egs/aidatatang_200zh/s5/run.sh b/egs/aidatatang_200zh/s5/run.sh
new file mode 100644
index 00000000000..47e46a660cd
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/run.sh
@@ -0,0 +1,150 @@
+#!/bin/bash
+
+# Copyright 2019 Beijing DataTang Tech. Co. Ltd. (Author: Liyuan Wang)
+# 2017 Hui Bu
+# 2017 Jiayu Du
+# 2017 Xingyu Na
+# 2017 Bengu Wu
+# 2017 Hao Zheng
+# Apache 2.0
+
+# This is a shell script, but it's recommended that you run the commands one by
+# one by copying and pasting into the shell.
+# Caution: some of the graph creation steps use quite a bit of memory, so you
+# should run this on a machine that has sufficient memory.
+
+
+. ./cmd.sh ## You'll want to change cmd.sh to something that will work on your system.
+ ## This relates to the queue.
+. ./path.sh
+
+
+# corpus directory and download URL
+data=/export/a05/xna/data
+data_url=www.openslr.org/resources/62
+
+# Obtain the database
+#[ -d $data ] || mkdir -p $data || exit 1;
+local/download_and_untar.sh $data $data_url aidatatang_200zh || exit 1;
+
+# Data Preparation: generate text, wav.scp, utt2spk, spk2utt
+local/data_prep.sh $data/aidatatang_200zh/corpus $data/aidatatang_200zh/transcript || exit 1;
+
+# Lexicon Preparation: build a large lexicon that involves words in both the training and decoding data
+local/prepare_dict.sh || exit 1;
+
+# Prepare Language Stuff
+# Phone Sets, questions, L compilation
+utils/prepare_lang.sh --position-dependent-phones false data/local/dict "<UNK>" data/local/lang data/lang || exit 1;
+
+# LM training
+local/train_lms.sh || exit 1;
+
+# G compilation, check LG composition
+local/format_data.sh
+
+# Now make MFCC plus pitch features.
+# mfccdir should be some place with a largish disk where you want to store MFCC features.
+mfccdir=mfcc
+for x in train dev test; do
+ steps/make_mfcc_pitch.sh --write_utt2dur false --write_utt2num_frames false --cmd "$train_cmd" --nj 10 data/$x exp/make_mfcc/$x $mfccdir || exit 1;
+ steps/compute_cmvn_stats.sh data/$x exp/make_mfcc/$x $mfccdir || exit 1;
+ utils/fix_data_dir.sh data/$x || exit 1;
+done
+
+steps/train_mono.sh --cmd "$train_cmd" --nj 10 \
+ data/train data/lang exp/mono || exit 1;
+
+# Monophone decoding
+utils/mkgraph.sh data/lang_test exp/mono exp/mono/graph || exit 1;
+steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
+ exp/mono/graph data/dev exp/mono/decode_dev
+
+steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
+ exp/mono/graph data/test exp/mono/decode_test
+
+# Get alignments from monophone system.
+steps/align_si.sh --cmd "$train_cmd" --nj 10 \
+ data/train data/lang exp/mono exp/mono_ali || exit 1;
+
+# train tri1 [first triphone pass]
+steps/train_deltas.sh --cmd "$train_cmd" \
+ 2500 20000 data/train data/lang exp/mono_ali exp/tri1 || exit 1;
+
+# decode tri1
+utils/mkgraph.sh data/lang_test exp/tri1 exp/tri1/graph || exit 1;
+steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
+ exp/tri1/graph data/dev exp/tri1/decode_dev
+steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
+ exp/tri1/graph data/test exp/tri1/decode_test
+
+# align tri1
+steps/align_si.sh --cmd "$train_cmd" --nj 10 \
+ data/train data/lang exp/tri1 exp/tri1_ali || exit 1;
+
+# train tri2 [delta+delta-deltas]
+steps/train_deltas.sh --cmd "$train_cmd" \
+ 2500 20000 data/train data/lang exp/tri1_ali exp/tri2 || exit 1;
+
+# decode tri2
+utils/mkgraph.sh data/lang_test exp/tri2 exp/tri2/graph
+steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
+ exp/tri2/graph data/dev exp/tri2/decode_dev
+steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj 10 \
+ exp/tri2/graph data/test exp/tri2/decode_test
+
+#align tri2
+steps/align_si.sh --cmd "$train_cmd" --nj 10 \
+ data/train data/lang exp/tri2 exp/tri2_ali || exit 1;
+
+# Train tri3a, which is LDA+MLLT,
+steps/train_lda_mllt.sh --cmd "$train_cmd" \
+ 2500 20000 data/train data/lang exp/tri2_ali exp/tri3a || exit 1;
+
+utils/mkgraph.sh data/lang_test exp/tri3a exp/tri3a/graph || exit 1;
+steps/decode.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
+ exp/tri3a/graph data/dev exp/tri3a/decode_dev
+steps/decode.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
+ exp/tri3a/graph data/test exp/tri3a/decode_test
+
+# From now, we start building a more serious system (with SAT), and we'll
+# do the alignment with fMLLR.
+steps/align_fmllr.sh --cmd "$train_cmd" --nj 10 \
+ data/train data/lang exp/tri3a exp/tri3a_ali || exit 1;
+
+steps/train_sat.sh --cmd "$train_cmd" \
+ 2500 20000 data/train data/lang exp/tri3a_ali exp/tri4a || exit 1;
+
+utils/mkgraph.sh data/lang_test exp/tri4a exp/tri4a/graph
+steps/decode_fmllr.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
+ exp/tri4a/graph data/dev exp/tri4a/decode_dev
+steps/decode_fmllr.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
+ exp/tri4a/graph data/test exp/tri4a/decode_test
+
+steps/align_fmllr.sh --cmd "$train_cmd" --nj 10 \
+ data/train data/lang exp/tri4a exp/tri4a_ali
+
+# Building a larger SAT system.
+
+steps/train_sat.sh --cmd "$train_cmd" \
+ 3500 100000 data/train data/lang exp/tri4a_ali exp/tri5a || exit 1;
+
+utils/mkgraph.sh data/lang_test exp/tri5a exp/tri5a/graph || exit 1;
+steps/decode_fmllr.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
+ exp/tri5a/graph data/dev exp/tri5a/decode_dev || exit 1;
+steps/decode_fmllr.sh --cmd "$decode_cmd" --nj 10 --config conf/decode.config \
+ exp/tri5a/graph data/test exp/tri5a/decode_test || exit 1;
+
+steps/align_fmllr.sh --cmd "$train_cmd" --nj 10 \
+ data/train data/lang exp/tri5a exp/tri5a_ali || exit 1;
+
+# nnet3
+local/nnet3/run_tdnn.sh
+
+# chain
+local/chain/run_tdnn.sh
+
+# getting results (see RESULTS file)
+for x in exp/*/decode_test; do [ -d $x ] && grep WER $x/cer_* | utils/best_wer.sh; done 2>/dev/null
+
+exit 0;
diff --git a/egs/aidatatang_200zh/s5/steps b/egs/aidatatang_200zh/s5/steps
new file mode 120000
index 00000000000..6e99bf5b5ad
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/steps
@@ -0,0 +1 @@
+../../wsj/s5/steps
\ No newline at end of file
diff --git a/egs/aidatatang_200zh/s5/utils b/egs/aidatatang_200zh/s5/utils
new file mode 120000
index 00000000000..b240885218f
--- /dev/null
+++ b/egs/aidatatang_200zh/s5/utils
@@ -0,0 +1 @@
+../../wsj/s5/utils
\ No newline at end of file
diff --git a/egs/aishell/s5/local/chain/tuning/run_tdnn_1a.sh b/egs/aishell/s5/local/chain/tuning/run_tdnn_1a.sh
index a0b183e3c5a..b38fa4d9c7a 100755
--- a/egs/aishell/s5/local/chain/tuning/run_tdnn_1a.sh
+++ b/egs/aishell/s5/local/chain/tuning/run_tdnn_1a.sh
@@ -90,7 +90,7 @@ if [ $stage -le 10 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $treedir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/aishell/s5/local/chain/tuning/run_tdnn_2a.sh b/egs/aishell/s5/local/chain/tuning/run_tdnn_2a.sh
index 2ebe2a3092b..6b7223785d9 100755
--- a/egs/aishell/s5/local/chain/tuning/run_tdnn_2a.sh
+++ b/egs/aishell/s5/local/chain/tuning/run_tdnn_2a.sh
@@ -92,7 +92,7 @@ if [ $stage -le 10 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $treedir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/aishell/s5/local/download_and_untar.sh b/egs/aishell/s5/local/download_and_untar.sh
index 3578a1c0835..58a278241d7 100755
--- a/egs/aishell/s5/local/download_and_untar.sh
+++ b/egs/aishell/s5/local/download_and_untar.sh
@@ -57,7 +57,7 @@ if [ -f $data/$part.tgz ]; then
if ! $size_ok; then
echo "$0: removing existing file $data/$part.tgz because its size in bytes $size"
echo "does not equal the size of one of the archives."
- rm $data/$part.gz
+ rm $data/$part.tgz
else
echo "$data/$part.tgz exists and appears to be complete."
fi
diff --git a/egs/aishell/v1/local/download_and_untar.sh b/egs/aishell/v1/local/download_and_untar.sh
index 0189bad1d4a..3578a1c0835 100755
--- a/egs/aishell/v1/local/download_and_untar.sh
+++ b/egs/aishell/v1/local/download_and_untar.sh
@@ -15,7 +15,7 @@ if [ $# -ne 3 ]; then
 echo "Usage: $0 [--remove-archive] <data-base> <url-base> <corpus-part>"
echo "e.g.: $0 /export/a05/xna/data www.openslr.org/resources/33 data_aishell"
echo "With --remove-archive it will remove the archive after successfully un-tarring it."
- echo "<corpus-part> can be one of: data_aishell, resource."
+ echo "<corpus-part> can be one of: data_aishell, resource_aishell."
fi
data=$1
@@ -28,7 +28,7 @@ if [ ! -d "$data" ]; then
fi
part_ok=false
-list="data_aishell resource"
+list="data_aishell resource_aishell"
for x in $list; do
if [ "$part" == $x ]; then part_ok=true; fi
done
diff --git a/egs/aishell2/s5/local/chain/tuning/run_tdnn_1a.sh b/egs/aishell2/s5/local/chain/tuning/run_tdnn_1a.sh
index 459bd64eeb5..86c9becac5b 100755
--- a/egs/aishell2/s5/local/chain/tuning/run_tdnn_1a.sh
+++ b/egs/aishell2/s5/local/chain/tuning/run_tdnn_1a.sh
@@ -103,7 +103,7 @@ fi
if [ $stage -le 10 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $treedir/tree | grep num-pdfs | awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
opts="l2-regularize=0.002"
linear_opts="orthonormal-constraint=1.0"
output_opts="l2-regularize=0.0005 bottleneck-dim=256"
diff --git a/egs/aishell2/s5/local/chain/tuning/run_tdnn_1b.sh b/egs/aishell2/s5/local/chain/tuning/run_tdnn_1b.sh
index ba2a4344349..d8560e63909 100755
--- a/egs/aishell2/s5/local/chain/tuning/run_tdnn_1b.sh
+++ b/egs/aishell2/s5/local/chain/tuning/run_tdnn_1b.sh
@@ -150,7 +150,7 @@ if [ $stage -le 10 ]; then
echo "$0: creating neural net configs using the xconfig parser";
feat_dim=$(feat-to-dim scp:data/${train_set}_hires/feats.scp -)
num_targets=$(tree-info $treedir/tree | grep num-pdfs | awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
opts="l2-regularize=0.002"
linear_opts="orthonormal-constraint=1.0"
output_opts="l2-regularize=0.0005 bottleneck-dim=256"
diff --git a/egs/ami/s5/local/ami_ihm_scoring_data_prep.sh b/egs/ami/s5/local/ami_ihm_scoring_data_prep.sh
index 3157d7ffec7..7112e0259a0 100755
--- a/egs/ami/s5/local/ami_ihm_scoring_data_prep.sh
+++ b/egs/ami/s5/local/ami_ihm_scoring_data_prep.sh
@@ -87,18 +87,15 @@ sort -k 2 $dir/utt2spk | utils/utt2spk_to_spk2utt.pl > $dir/spk2utt || exit 1;
join $dir/utt2spk $dir/segments | \
perl -ne '{BEGIN{$pu=""; $pt=0.0;} split;
if ($pu eq $_[1] && $pt > $_[3]) {
- print "$_[0] $_[2] $_[3] $_[4]>$_[0] $_[2] $pt $_[4]\n"
+ print "s/^$_[0] $_[2] $_[3] $_[4]\$/$_[0] $_[2] $pt $_[4]/;\n"
}
- $pu=$_[1]; $pt=$_[4];
+ $pu=$_[1]; $pt=$_[4];
}' > $dir/segments_to_fix
-if [ `cat $dir/segments_to_fix | wc -l` -gt 0 ]; then
+
+if [ -s $dir/segments_to_fix ]; then
echo "$0. Applying following fixes to segments"
cat $dir/segments_to_fix
- while read line; do
- p1=`echo $line | awk -F'>' '{print $1}'`
- p2=`echo $line | awk -F'>' '{print $2}'`
- sed -ir "s!$p1!$p2!" $dir/segments
- done < $dir/segments_to_fix
+ perl -i -pf $dir/segments_to_fix $dir/segments
fi
# Copy stuff into its final locations
diff --git a/egs/ami/s5/local/ami_mdm_scoring_data_prep.sh b/egs/ami/s5/local/ami_mdm_scoring_data_prep.sh
index 4cfa9110edf..9c4b55308f2 100755
--- a/egs/ami/s5/local/ami_mdm_scoring_data_prep.sh
+++ b/egs/ami/s5/local/ami_mdm_scoring_data_prep.sh
@@ -94,19 +94,15 @@ awk '{print $1}' $tmpdir/segments | \
join $tmpdir/utt2spk_stm $tmpdir/segments | \
awk '{ utt=$1; spk=$2; wav=$3; t_beg=$4; t_end=$5;
if(spk_prev == spk && t_end_prev > t_beg) {
- print utt, wav, t_beg, t_end">"utt, wav, t_end_prev, t_end;
+ print "s/^"utt, wav, t_beg, t_end"$/"utt, wav, t_end_prev, t_end"/;";
}
spk_prev=spk; t_end_prev=t_end;
}' > $tmpdir/segments_to_fix
-if [ `cat $tmpdir/segments_to_fix | wc -l` -gt 0 ]; then
+if [ -s $tmpdir/segments_to_fix ]; then
echo "$0. Applying following fixes to segments"
cat $tmpdir/segments_to_fix
- while read line; do
- p1=`echo $line | awk -F'>' '{print $1}'`
- p2=`echo $line | awk -F'>' '{print $2}'`
- sed -ir "s:$p1:$p2:" $tmpdir/segments
- done < $tmpdir/segments_to_fix
+ perl -i -pf $tmpdir/segments_to_fix $tmpdir/segments
fi
# Copy stuff into its final locations [this has been moved from the format_data
diff --git a/egs/ami/s5/local/ami_sdm_scoring_data_prep.sh b/egs/ami/s5/local/ami_sdm_scoring_data_prep.sh
index 91baa37d6e1..815e1b2d270 100755
--- a/egs/ami/s5/local/ami_sdm_scoring_data_prep.sh
+++ b/egs/ami/s5/local/ami_sdm_scoring_data_prep.sh
@@ -101,19 +101,15 @@ awk '{print $1}' $tmpdir/segments | \
join $tmpdir/utt2spk_stm $tmpdir/segments | \
awk '{ utt=$1; spk=$2; wav=$3; t_beg=$4; t_end=$5;
if(spk_prev == spk && t_end_prev > t_beg) {
- print utt, wav, t_beg, t_end">"utt, wav, t_end_prev, t_end;
+ print "s/^"utt, wav, t_beg, t_end"$/"utt, wav, t_end_prev, t_end"/;";
}
spk_prev=spk; t_end_prev=t_end;
}' > $tmpdir/segments_to_fix
-if [ `cat $tmpdir/segments_to_fix | wc -l` -gt 0 ]; then
+if [ -s $tmpdir/segments_to_fix ]; then
echo "$0. Applying following fixes to segments"
cat $tmpdir/segments_to_fix
- while read line; do
- p1=`echo $line | awk -F'>' '{print $1}'`
- p2=`echo $line | awk -F'>' '{print $2}'`
- sed -ir "s:$p1:$p2:" $tmpdir/segments
- done < $tmpdir/segments_to_fix
+ perl -i -pf $tmpdir/segments_to_fix $tmpdir/segments
fi
# Copy stuff into its final locations [this has been moved from the format_data
diff --git a/egs/ami/s5b/local/ami_ihm_scoring_data_prep.sh b/egs/ami/s5b/local/ami_ihm_scoring_data_prep.sh
index 746c42c4c1a..c54876331f1 100755
--- a/egs/ami/s5b/local/ami_ihm_scoring_data_prep.sh
+++ b/egs/ami/s5b/local/ami_ihm_scoring_data_prep.sh
@@ -93,18 +93,15 @@ sort -k 2 $dir/utt2spk | utils/utt2spk_to_spk2utt.pl > $dir/spk2utt || exit 1;
join $dir/utt2spk $dir/segments | \
perl -ne '{BEGIN{$pu=""; $pt=0.0;} split;
if ($pu eq $_[1] && $pt > $_[3]) {
- print "$_[0] $_[2] $_[3] $_[4]>$_[0] $_[2] $pt $_[4]\n"
+ print "s/^$_[0] $_[2] $_[3] $_[4]\$/$_[0] $_[2] $pt $_[4]/;\n"
}
$pu=$_[1]; $pt=$_[4];
}' > $dir/segments_to_fix
-if [ `cat $dir/segments_to_fix | wc -l` -gt 0 ]; then
+
+if [ -s $dir/segments_to_fix ]; then
echo "$0. Applying following fixes to segments"
cat $dir/segments_to_fix
- while read line; do
- p1=`echo $line | awk -F'>' '{print $1}'`
- p2=`echo $line | awk -F'>' '{print $2}'`
- sed -ir "s!$p1!$p2!" $dir/segments
- done < $dir/segments_to_fix
+ perl -i -pf $dir/segments_to_fix $dir/segments
fi
# Copy stuff into its final locations
diff --git a/egs/ami/s5b/local/ami_mdm_scoring_data_prep.sh b/egs/ami/s5b/local/ami_mdm_scoring_data_prep.sh
index 65f514f223c..475ef5405ba 100755
--- a/egs/ami/s5b/local/ami_mdm_scoring_data_prep.sh
+++ b/egs/ami/s5b/local/ami_mdm_scoring_data_prep.sh
@@ -99,19 +99,15 @@ awk '{print $1}' $tmpdir/segments | \
join $tmpdir/utt2spk_stm $tmpdir/segments | \
awk '{ utt=$1; spk=$2; wav=$3; t_beg=$4; t_end=$5;
if(spk_prev == spk && t_end_prev > t_beg) {
- print utt, wav, t_beg, t_end">"utt, wav, t_end_prev, t_end;
+ print "s/^"utt, wav, t_beg, t_end"$/"utt, wav, t_end_prev, t_end"/;";
}
spk_prev=spk; t_end_prev=t_end;
}' > $tmpdir/segments_to_fix
-if [ `cat $tmpdir/segments_to_fix | wc -l` -gt 0 ]; then
+if [ -s $tmpdir/segments_to_fix ]; then
echo "$0. Applying following fixes to segments"
cat $tmpdir/segments_to_fix
- while read line; do
- p1=`echo $line | awk -F'>' '{print $1}'`
- p2=`echo $line | awk -F'>' '{print $2}'`
- sed -ir "s:$p1:$p2:" $tmpdir/segments
- done < $tmpdir/segments_to_fix
+ perl -i -pf $tmpdir/segments_to_fix $tmpdir/segments
fi
# Copy stuff into its final locations [this has been moved from the format_data
diff --git a/egs/ami/s5b/local/ami_sdm_scoring_data_prep.sh b/egs/ami/s5b/local/ami_sdm_scoring_data_prep.sh
index 1378f8b8965..d7ce038c0a7 100755
--- a/egs/ami/s5b/local/ami_sdm_scoring_data_prep.sh
+++ b/egs/ami/s5b/local/ami_sdm_scoring_data_prep.sh
@@ -111,25 +111,21 @@ awk '{print $1}' $tmpdir/segments | \
join $tmpdir/utt2spk_stm $tmpdir/segments | \
awk '{ utt=$1; spk=$2; wav=$3; t_beg=$4; t_end=$5;
if(spk_prev == spk && t_end_prev > t_beg) {
- print utt, wav, t_beg, t_end">"utt, wav, t_end_prev, t_end;
+ print "s/^"utt, wav, t_beg, t_end"$/"utt, wav, t_end_prev, t_end"/;";
}
spk_prev=spk; t_end_prev=t_end;
}' > $tmpdir/segments_to_fix
-if [ `cat $tmpdir/segments_to_fix | wc -l` -gt 0 ]; then
+if [ -s $tmpdir/segments_to_fix ]; then
echo "$0. Applying following fixes to segments"
cat $tmpdir/segments_to_fix
- while read line; do
- p1=`echo $line | awk -F'>' '{print $1}'`
- p2=`echo $line | awk -F'>' '{print $2}'`
- sed -ir "s:$p1:$p2:" $tmpdir/segments
- done < $tmpdir/segments_to_fix
+ perl -i -pf $tmpdir/segments_to_fix $tmpdir/segments
fi
# Copy stuff into its final locations [this has been moved from the format_data
# script]
mkdir -p $dir
-for f in spk2utt utt2spk utt2spk_stm wav.scp text segments reco2file_and_channel; do
+for f in segments_to_fix spk2utt utt2spk utt2spk_stm wav.scp text segments reco2file_and_channel; do
cp $tmpdir/$f $dir/$f || exit 1;
done
diff --git a/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_1a.sh b/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_1a.sh
index 1fc641f1166..4d260e3c517 100755
--- a/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_1a.sh
+++ b/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_1a.sh
@@ -220,7 +220,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
affine_opts="l2-regularize=0.01 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim-continuous=true"
tdnnf_opts="l2-regularize=0.01 dropout-proportion=0.0 bypass-scale=0.66"
linear_opts="l2-regularize=0.01 orthonormal-constraint=-1.0"
diff --git a/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_lstm_1a.sh b/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_lstm_1a.sh
index a8494420b0d..3546b6a7ced 100755
--- a/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_lstm_1a.sh
+++ b/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_lstm_1a.sh
@@ -211,7 +211,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_lstm_1b.sh b/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_lstm_1b.sh
index a12e7efa7b9..1a839b045bd 100755
--- a/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_lstm_1b.sh
+++ b/egs/ami/s5b/local/chain/multi_condition/tuning/run_tdnn_lstm_1b.sh
@@ -235,7 +235,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
tdnn_opts="l2-regularize=0.006"
lstm_opts="l2-regularize=0.0025 decay-time=20 dropout-proportion=0.0"
output_opts="l2-regularize=0.001"
diff --git a/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1a.sh b/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1a.sh
index 16d1f4044f5..d926c1dc6d7 100644
--- a/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1a.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1a.sh
@@ -184,7 +184,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
diff --git a/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1b.sh b/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1b.sh
index 83e6a95582f..d9cd1c356e8 100644
--- a/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1b.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1b.sh
@@ -176,7 +176,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20 dropout-proportion=0"
diff --git a/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1c.sh b/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1c.sh
index 387b4bfcc88..a0805b4f9f1 100755
--- a/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1c.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_cnn_tdnn_lstm_1c.sh
@@ -185,7 +185,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=40"
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_1b.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_1b.sh
index 57108dbddae..997357b80a9 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_1b.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_1b.sh
@@ -164,7 +164,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_1c.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_1c.sh
index f87e1a12d36..4d062e65429 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_1c.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_1c.sh
@@ -151,7 +151,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_1d.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_1d.sh
index eb84a1cd876..387570388d0 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_1d.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_1d.sh
@@ -163,7 +163,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_1e.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_1e.sh
index e6592b667dc..0436b08cdc0 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_1e.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_1e.sh
@@ -161,7 +161,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_1f.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_1f.sh
index 8bf2b73dada..4ca526d63b8 100644
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_1f.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_1f.sh
@@ -165,7 +165,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_1g.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_1g.sh
index dfb6dfedee7..baed760bb68 100644
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_1g.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_1g.sh
@@ -166,7 +166,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_1h.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_1h.sh
index 3e26a8b38bd..e721a858c0a 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_1h.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_1h.sh
@@ -167,7 +167,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_1i.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_1i.sh
index 1931127c86d..de40cb2d1a4 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_1i.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_1i.sh
@@ -168,7 +168,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
opts="l2-regularize=0.02"
output_opts="l2-regularize=0.004"
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1a.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1a.sh
index d63712f1f0f..4f580b88f6b 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1a.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1a.sh
@@ -171,7 +171,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1b.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1b.sh
index a53785f45c2..904a079d7de 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1b.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1b.sh
@@ -173,7 +173,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1c.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1c.sh
index 76a9f735c5f..511e520465a 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1c.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1c.sh
@@ -172,7 +172,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1d.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1d.sh
index 8cc1a4e15fa..bd81b7df4eb 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1d.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1d.sh
@@ -172,7 +172,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1e.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1e.sh
index accfd158a9d..50903e78b6d 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1e.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1e.sh
@@ -174,7 +174,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1f.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1f.sh
index 2b275e4e27d..f6c53001498 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1f.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1f.sh
@@ -173,7 +173,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1g.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1g.sh
index 1c90af38c4c..79fd9ef3fb5 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1g.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1g.sh
@@ -174,7 +174,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1h.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1h.sh
index fb4b6a475e2..e58a7f89e03 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1h.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1h.sh
@@ -171,7 +171,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1i.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1i.sh
index 92636b4c17e..13f894f5a48 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1i.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1i.sh
@@ -174,7 +174,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1j.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1j.sh
index 89fd8ce2915..48b31832e8c 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1j.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1j.sh
@@ -181,7 +181,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1k.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1k.sh
index b8d947d8e92..e675bc494bb 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1k.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1k.sh
@@ -177,7 +177,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1l.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1l.sh
index 74c0f5a6ead..2d019398274 100644
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1l.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1l.sh
@@ -224,7 +224,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1m.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1m.sh
index b0e7af0618d..9e5b971bbe2 100644
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1m.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1m.sh
@@ -226,7 +226,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20 dropout-proportion=0.0"
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1n.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1n.sh
index bee4d997b01..9575c3cf686 100644
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1n.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1n.sh
@@ -178,7 +178,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1o.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1o.sh
index 1e4111adc6a..a7f2625c181 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1o.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_1o.sh
@@ -182,7 +182,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
tdnn_opts="l2-regularize=0.025"
lstm_opts="l2-regularize=0.01"
output_opts="l2-regularize=0.004"
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_bs_1a.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_bs_1a.sh
index b672a44e572..ca920869b30 100755
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_bs_1a.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_lstm_bs_1a.sh
@@ -180,7 +180,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
tdnn_opts="l2-regularize=0.003"
lstm_opts="l2-regularize=0.005"
output_opts="l2-regularize=0.001"
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1a.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1a.sh
index f68c4203767..53dbd5238db 100644
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1a.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1a.sh
@@ -178,7 +178,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
gru_opts="dropout-per-frame=true dropout-proportion=0.0"
mkdir -p $dir/configs
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1b.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1b.sh
index ac4266ca162..dafef668e60 100644
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1b.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1b.sh
@@ -177,7 +177,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
gru_opts="dropout-per-frame=true dropout-proportion=0.0"
mkdir -p $dir/configs
diff --git a/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1c.sh b/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1c.sh
index 74b21f10c33..677946d0b9a 100644
--- a/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1c.sh
+++ b/egs/ami/s5b/local/chain/tuning/run_tdnn_opgru_1c.sh
@@ -176,7 +176,7 @@ if [ $stage -le 15 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
gru_opts="dropout-per-frame=true dropout-proportion=0.0"
mkdir -p $dir/configs
diff --git a/egs/aspire/s5/local/chain/tuning/run_blstm_7b.sh b/egs/aspire/s5/local/chain/tuning/run_blstm_7b.sh
index 8ff59d83ed0..bd13010c791 100755
--- a/egs/aspire/s5/local/chain/tuning/run_blstm_7b.sh
+++ b/egs/aspire/s5/local/chain/tuning/run_blstm_7b.sh
@@ -138,7 +138,7 @@ if [ $stage -le 11 ]; then
num_targets=$(tree-info $treedir/tree | grep num-pdfs | awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
diff --git a/egs/aspire/s5/local/chain/tuning/run_blstm_asp_1.sh b/egs/aspire/s5/local/chain/tuning/run_blstm_asp_1.sh
index 0ca6062e9c8..b5979a3ce6b 100755
--- a/egs/aspire/s5/local/chain/tuning/run_blstm_asp_1.sh
+++ b/egs/aspire/s5/local/chain/tuning/run_blstm_asp_1.sh
@@ -208,7 +208,7 @@ if [ $stage -le 14 ]; then
extra_right_context=$[$chunk_right_context+10]
# %WER 26.8 | 2120 27220 | 80.2 11.7 8.1 7.0 26.8 76.5 | -0.804 | exp/chain/blstm_asp_1/decode_dev_aspire_whole_uniformsegmented_win10_over5_v7_iterfinal_pp_fg/score_9/penalty_0.0/
- local/nnet3/prep_test_aspire.sh --stage 4 --decode-num-jobs 30 --affix "v7" \
+ local/multi_condition/prep_test_aspire.sh --stage 4 --decode-num-jobs 30 --affix "v7" \
--extra-left-context $extra_left_context \
--extra-right-context $extra_right_context \
--frames-per-chunk $chunk_width \
diff --git a/egs/aspire/s5/local/chain/tuning/run_tdnn_7b.sh b/egs/aspire/s5/local/chain/tuning/run_tdnn_7b.sh
index 201f61dc64b..cd548142598 100755
--- a/egs/aspire/s5/local/chain/tuning/run_tdnn_7b.sh
+++ b/egs/aspire/s5/local/chain/tuning/run_tdnn_7b.sh
@@ -136,7 +136,7 @@ if [ $stage -le 11 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $treedir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
@@ -182,6 +182,7 @@ if [ $stage -le 12 ]; then
/export/b0{5,6,7,8}/$USER/kaldi-data/egs/aspire-$(date +'%m_%d_%H_%M')/s5c/$dir/egs/storage $dir/egs/storage
fi
+ mkdir -p $dir/egs
touch $dir/egs/.nodelete # keep egs around when that run dies.
steps/nnet3/chain/train.py --stage $train_stage \
diff --git a/egs/aspire/s5/local/chain/tuning/run_tdnn_lstm_1a.sh b/egs/aspire/s5/local/chain/tuning/run_tdnn_lstm_1a.sh
index 63d3a7ca988..f98dff5e6fa 100755
--- a/egs/aspire/s5/local/chain/tuning/run_tdnn_lstm_1a.sh
+++ b/egs/aspire/s5/local/chain/tuning/run_tdnn_lstm_1a.sh
@@ -26,7 +26,6 @@ cell_dim=1024
projection_dim=256
# training options
-num_epochs=2
minibatch_size=64,32
chunk_left_context=40
chunk_right_context=0
@@ -95,7 +94,7 @@ if [ $stage -le 8 ]; then
for n in `seq $nj`; do
awk '{print $1}' data/${train_set}/split$nj/$n/utt2spk | \
- perl -ane 's/rev[1-3]_//g' > $lat_dir/uttlist.$n.$nj
+ perl -ane 's/rev[1-3]-//g' > $lat_dir/uttlist.$n.$nj
done
rm -f $lat_dir/lat_tmp.*.{ark,scp} 2>/dev/null
@@ -106,7 +105,7 @@ if [ $stage -le 8 ]; then
ark,scp:$lat_dir/lat_tmp.JOB.ark,$lat_dir/lat_tmp.JOB.scp || exit 1
for n in `seq 3`; do
- cat $lat_dir/lat_tmp.*.scp | awk -v n=$n '{print "rev"n"_"$1" "$2}'
+ cat $lat_dir/lat_tmp.*.scp | awk -v n=$n '{print "rev"n"-"$1" "$2}'
done > $lat_dir/lat_rvb.scp
$train_cmd JOB=1:$nj $lat_dir/log/copy_rvb_lattices.JOB.log \
@@ -151,7 +150,7 @@ if [ $stage -le 12 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $treedir/tree |grep num-pdfs|awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=40"
@@ -309,4 +308,3 @@ if [ $stage -le 17 ]; then
fi
exit 0;
-
diff --git a/egs/aspire/s5/local/multi_condition/prepare_impulses_noises.sh b/egs/aspire/s5/local/multi_condition/prepare_impulses_noises.sh
index 804de611cae..8297cdee9ca 100755
--- a/egs/aspire/s5/local/multi_condition/prepare_impulses_noises.sh
+++ b/egs/aspire/s5/local/multi_condition/prepare_impulses_noises.sh
@@ -114,7 +114,7 @@ cp ${output_dir}_non_normalized/info/* $output_dir/info
# rename file location in the noise-rir pairing files
for file in `ls $output_dir/info/noise_impulse*`; do
- sed -i "s/_non_normalized//g" $file
+ perl -i -pe "s/_non_normalized//g" $file
done
# generating the rir-list with probabilities alloted for each rir
diff --git a/egs/babel/s5c/local/syllab/generate_syllable_lang.sh b/egs/babel/s5c/local/syllab/generate_syllable_lang.sh
index 2d1fcb2259e..4a0810b9415 100755
--- a/egs/babel/s5c/local/syllab/generate_syllable_lang.sh
+++ b/egs/babel/s5c/local/syllab/generate_syllable_lang.sh
@@ -118,8 +118,7 @@ ln -s lex.syllabs2phones.disambig.fst $out/L_disambig.fst
echo "Validating the output lang dir"
utils/validate_lang.pl $out || exit 1
-sed -i'' 's/#1$//g' $lout/lexicon.txt
-sed -i'' 's/#1$//g' $lout/lexiconp.txt
+perl -i -pe 's/#1$//g' $lout/lexicon.txt $lout/lexiconp.txt
echo "Done OK."
exit 0
diff --git a/egs/babel/s5d/conf/lang/404-georgian.FLP.official.conf b/egs/babel/s5d/conf/lang/404-georgian.FLP.official.conf
index a6b22de419f..9cd043716ce 100644
--- a/egs/babel/s5d/conf/lang/404-georgian.FLP.official.conf
+++ b/egs/babel/s5d/conf/lang/404-georgian.FLP.official.conf
@@ -75,8 +75,8 @@ unsup_data_list=./conf/lists/404-georgian/untranscribed-training.list
unsup_nj=32
-lexicon_file=
-lexiconFlags="--romanized --oov <unk>"
+lexicon_file=/export/corpora/LDC/LDC2016S12/IARPA_BABEL_OP3_404/conversational/reference_materials/lexicon.txt
+lexiconFlags=" --romanized --oov <unk>"
diff --git a/egs/babel/s5d/local/chain/tuning/run_tdnn.sh b/egs/babel/s5d/local/chain/tuning/run_tdnn.sh
index 4f485edf7da..7b4535f8c5e 100755
--- a/egs/babel/s5d/local/chain/tuning/run_tdnn.sh
+++ b/egs/babel/s5d/local/chain/tuning/run_tdnn.sh
@@ -128,7 +128,7 @@ if [ $stage -le 17 ]; then
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
mkdir -p $dir/configs
 cat <<EOF > $dir/configs/network.xconfig
diff --git a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm.sh b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm.sh
index 72f7a3c32dd..5fc14dda826 100755
--- a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm.sh
+++ b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm.sh
@@ -129,7 +129,7 @@ if [ $stage -le 17 ]; then
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
label_delay=5
diff --git a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab1.sh b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab1.sh
index be0c2cc4b9b..8c7de5d18d4 100755
--- a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab1.sh
+++ b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab1.sh
@@ -127,7 +127,7 @@ if [ $stage -le 17 ]; then
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
label_delay=5
diff --git a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab2.sh b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab2.sh
index 8f21a239794..0b3e70b5a04 100755
--- a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab2.sh
+++ b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab2.sh
@@ -127,7 +127,7 @@ if [ $stage -le 17 ]; then
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
label_delay=5
diff --git a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab3.sh b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab3.sh
index 7898d172242..45f2907645e 100755
--- a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab3.sh
+++ b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab3.sh
@@ -128,7 +128,7 @@ if [ $stage -le 17 ]; then
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
label_delay=5
diff --git a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab4.sh b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab4.sh
index 49462573245..0d92aff5c28 100755
--- a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab4.sh
+++ b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab4.sh
@@ -128,7 +128,7 @@ if [ $stage -le 17 ]; then
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
label_delay=5
diff --git a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab5.sh b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab5.sh
index c888d985f5e..4129c00dcb4 100755
--- a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab5.sh
+++ b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab5.sh
@@ -128,7 +128,7 @@ if [ $stage -le 17 ]; then
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
label_delay=5
diff --git a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab6.sh b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab6.sh
index e9a045e113a..1cfa50c1aa1 100755
--- a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab6.sh
+++ b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab6.sh
@@ -128,7 +128,7 @@ if [ $stage -le 17 ]; then
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20"
label_delay=5
diff --git a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab7.sh b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab7.sh
index ce192a91665..ba8ac1e0373 100755
--- a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab7.sh
+++ b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab7.sh
@@ -129,7 +129,7 @@ if [ $stage -le 17 ]; then
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20 dropout-proportion=0.0"
label_delay=5
diff --git a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab8.sh b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab8.sh
index 3fc0ef2206c..5de285e080e 100755
--- a/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab8.sh
+++ b/egs/babel/s5d/local/chain/tuning/run_tdnn_lstm_bab8.sh
@@ -129,7 +129,7 @@ if [ $stage -le 17 ]; then
num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
[ -z $num_targets ] && { echo "$0: error getting num-targets"; exit 1; }
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
lstm_opts="decay-time=20 dropout-proportion=0.0 "
label_delay=5
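
The repeated one-line change above wraps the expression in parentheses so the inline snippet prints the same value whether `python` resolves to Python 2 or Python 3. A minimal sketch of the quantity being computed; `xent_regularize = 0.1` is only a hypothetical example value:

```python
# Sketch only: what learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
# evaluates to, with a made-up xent_regularize.
xent_regularize = 0.1
learning_rate_factor = 0.5 / xent_regularize   # -> 5.0
print(learning_rate_factor)                    # used as the learning-rate factor of the xent output layer
```
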
diff --git a/egs/babel/s5d/local/make_L_align.sh b/egs/babel/s5d/local/make_L_align.sh
index 50e46a00493..41e9ff32958 100755
--- a/egs/babel/s5d/local/make_L_align.sh
+++ b/egs/babel/s5d/local/make_L_align.sh
@@ -34,18 +34,24 @@ tmpdir=$1
dir=$2
outdir=$3
+for f in $dir/phones/optional_silence.txt $dir/phones.txt $dir/words.txt ; do
+  [ ! -f $f ] && echo "$0: The file $f must exist!" && exit 1
+done
+
silphone=`cat $dir/phones/optional_silence.txt` || exit 1;
+if [ ! -f $tmpdir/lexicon.txt ] && [ ! -f $tmpdir/lexiconp.txt ] ; then
+ echo "$0: At least one of the files $tmpdir/lexicon.txt or $tmpdir/lexiconp.txt must exist" >&2
+ exit 1
+fi
+
# Create lexicon with alignment info
if [ -f $tmpdir/lexicon.txt ] ; then
cat $tmpdir/lexicon.txt | \
awk '{printf("%s #1 ", $1); for (n=2; n <= NF; n++) { printf("%s ", $n); } print "#2"; }'
-elif [ -f $tmpdir/lexiconp.txt ] ; then
+else
cat $tmpdir/lexiconp.txt | \
awk '{printf("%s #1 ", $1); for (n=3; n <= NF; n++) { printf("%s ", $n); } print "#2"; }'
-else
- echo "Neither $tmpdir/lexicon.txt nor $tmpdir/lexiconp.txt does not exist"
- exit 1
fi | utils/make_lexicon_fst.pl - 0.5 $silphone | \
fstcompile --isymbols=$dir/phones.txt --osymbols=$dir/words.txt \
--keep_isymbols=false --keep_osymbols=false | \
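
For context on the awk one-liners in the hunk above: they insert the word-position markers `#1` (right after the word) and `#2` (after the last phone) before the lexicon is compiled into an FST. A rough Python sketch of the same transformation; the sample entries and the helper name are made up:

```python
# Rough equivalent of the awk one-liners above (illustration only).
# lexicon.txt lines:  "word phone1 phone2 ..."
# lexiconp.txt lines: "word prob phone1 phone2 ..."
def add_align_markers(line, has_prob=False):
    fields = line.split()
    phones = fields[2:] if has_prob else fields[1:]
    return " ".join([fields[0], "#1"] + phones + ["#2"])

print(add_align_markers("hello hh ah l ow"))            # hello #1 hh ah l ow #2
print(add_align_markers("hello 1.0 hh ah l ow", True))  # hello #1 hh ah l ow #2
```
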
diff --git a/egs/babel/s5d/local/syllab/generate_phone_lang.sh b/egs/babel/s5d/local/syllab/generate_phone_lang.sh
index fc21a23231b..81d8a0acdc7 100755
--- a/egs/babel/s5d/local/syllab/generate_phone_lang.sh
+++ b/egs/babel/s5d/local/syllab/generate_phone_lang.sh
@@ -122,8 +122,7 @@ ln -s lex.syllabs2phones.disambig.fst $out/L_disambig.fst
echo "Validating the output lang dir"
utils/validate_lang.pl $out || exit 1
-sed -i'' 's/#1$//g' $lout/lexicon.txt
-sed -i'' 's/#1$//g' $lout/lexiconp.txt
+perl -i -pe 's/#1$//g' $lout/lexicon.txt $lout/lexiconp.txt
echo "Done OK."
exit 0
diff --git a/egs/babel/s5d/local/syllab/generate_syllable_lang.sh b/egs/babel/s5d/local/syllab/generate_syllable_lang.sh
index db7b0902425..a7bd667027c 100755
--- a/egs/babel/s5d/local/syllab/generate_syllable_lang.sh
+++ b/egs/babel/s5d/local/syllab/generate_syllable_lang.sh
@@ -122,8 +122,7 @@ ln -s lex.syllabs2phones.disambig.fst $out/L_disambig.fst
echo "Validating the output lang dir"
utils/validate_lang.pl $out || exit 1
-sed -i'' 's/#1$//g' $lout/lexicon.txt
-sed -i'' 's/#1$//g' $lout/lexiconp.txt
+perl -i -pe 's/#1$//g' $lout/lexicon.txt $lout/lexiconp.txt
echo "Done OK."
exit 0
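
The two hunks above replace `sed -i''` (whose in-place flag behaves differently across sed implementations) with a single `perl -i -pe` call that edits both lexicon files in place, presumably for portability. The substitution itself just drops a trailing `#1` disambiguation symbol from each line; a quick sketch of its effect on a made-up line:

```python
# Effect of s/#1$//g on a single lexicon line (sample line is invented).
import re

line = "ba k a #1"
print(re.sub(r"#1$", "", line))   # -> "ba k a " (trailing marker removed)
```
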
diff --git a/egs/bentham/v1/local/chain/tuning/run_cnn_e2eali_1a.sh b/egs/bentham/v1/local/chain/tuning/run_cnn_e2eali_1a.sh
index 6bac5a22398..ec530ef1ce4 100755
--- a/egs/bentham/v1/local/chain/tuning/run_cnn_e2eali_1a.sh
+++ b/egs/bentham/v1/local/chain/tuning/run_cnn_e2eali_1a.sh
@@ -139,7 +139,7 @@ if [ $stage -le 4 ]; then
echo "$0: creating neural net configs using the xconfig parser";
num_targets=$(tree-info $tree_dir/tree | grep num-pdfs | awk '{print $2}')
- learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+ learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)
cnn_opts="l2-regularize=0.03 dropout-proportion=0.0"
tdnn_opts="l2-regularize=0.03"
output_opts="l2-regularize=0.04"
diff --git a/egs/bentham/v1/local/create_splits.sh b/egs/bentham/v1/local/create_splits.sh
index 93e8bf1b12e..e8ea2279a49 100755
--- a/egs/bentham/v1/local/create_splits.sh
+++ b/egs/bentham/v1/local/create_splits.sh
@@ -27,10 +27,8 @@ function split {
echo $name $lines_dir"/"$name".png" >> $split_dir/images.scp
echo $name $spkid >> $split_dir/utt2spk
done < "$line_file"
-
- sed -i '/^\s*$/d' $split_dir/images.scp
- sed -i '/^\s*$/d' $split_dir/text
- sed -i '/^\s*$/d' $split_dir/utt2spk
+
+ perl -i -ne 'print if /\S/' $split_dir/images.scp $split_dir/text $split_dir/utt2spk
utils/utt2spk_to_spk2utt.pl $split_dir/utt2spk > $split_dir/spk2utt
}
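
Here `perl -i -ne 'print if /\S/'` replaces three separate `sed -i '/^\s*$/d'` calls: it rewrites each listed file in place, keeping only lines that contain a non-whitespace character. A small Python sketch of the same filter, with a hypothetical helper name and example path:

```python
# Hypothetical helper mirroring `perl -i -ne 'print if /\S/'`: drop blank lines in place.
def drop_blank_lines(path):
    with open(path) as f:
        kept = [line for line in f if line.strip()]   # keep only non-blank lines
    with open(path, "w") as f:
        f.writelines(kept)

# e.g. drop_blank_lines("data/train/images.scp")      # path is an example only
```
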
diff --git a/egs/bn_music_speech/v1/local/make_musan.py b/egs/bn_music_speech/v1/local/make_musan.py
deleted file mode 100755
index 942973cfc65..00000000000
--- a/egs/bn_music_speech/v1/local/make_musan.py
+++ /dev/null
@@ -1,119 +0,0 @@
-#!/usr/bin/env python3
-# Copyright 2015 David Snyder
-# Apache 2.0.
-#
-# This file is meant to be invoked by make_musan.sh.
-
-import os, sys
-
-def process_music_annotations(path):
- utt2spk = {}
- utt2vocals = {}
- lines = open(path, 'r').readlines()
- for line in lines:
- utt, genres, vocals, musician = line.rstrip().split()[:4]
- # For this application, the musican ID isn't important
- utt2spk[utt] = utt
- utt2vocals[utt] = vocals == "Y"
- return utt2spk, utt2vocals
-
-def prepare_music(root_dir, use_vocals):
- utt2vocals = {}
- utt2spk = {}
- utt2wav = {}
- num_good_files = 0
- num_bad_files = 0
- music_dir = os.path.join(root_dir, "music")
- for root, dirs, files in os.walk(music_dir):
- for file in files:
- file_path = os.path.join(root, file)
- if file.endswith(".wav"):
- utt = str(file).replace(".wav", "")
- utt2wav[utt] = file_path
- elif str(file) == "ANNOTATIONS":
- utt2spk_part, utt2vocals_part = process_music_annotations(file_path)
- utt2spk.update(utt2spk_part)
- utt2vocals.update(utt2vocals_part)
- utt2spk_str = ""
- utt2wav_str = ""
- for utt in utt2vocals:
- if utt in utt2wav:
- if use_vocals or not utt2vocals[utt]:
- utt2spk_str = utt2spk_str + utt + " " + utt2spk[utt] + "\n"
- utt2wav_str = utt2wav_str + utt + " " + utt2wav[utt] + "\n"
- num_good_files += 1
- else:
- print("Missing file {}".format(utt))
- num_bad_files += 1
- print(("In music directory, processed {} files: {} had missing wav data".format(num_good_files, num_bad_files))
- return utt2spk_str, utt2wav_str
-
-def prepare_speech(root_dir):
- utt2spk = {}
- utt2wav = {}
- num_good_files = 0
- num_bad_files = 0
- speech_dir = os.path.join(root_dir, "speech")
- for root, dirs, files in os.walk(speech_dir):
- for file in files:
- file_path = os.path.join(root, file)
- if file.endswith(".wav"):
- utt = str(file).replace(".wav", "")
- utt2wav[utt] = file_path
- utt2spk[utt] = utt
- utt2spk_str = ""
- utt2wav_str = ""
- for utt in utt2spk:
- if utt in utt2wav:
- utt2spk_str = utt2spk_str + utt + " " + utt2spk[utt] + "\n"
- utt2wav_str = utt2wav_str + utt + " " + utt2wav[utt] + "\n"
- num_good_files += 1
- else:
- print("Missing file {}".format(utt))
- num_bad_files += 1
- print(("In speech directory, processed {} files: {} had missing wav data".format(num_good_files, num_bad_files))
- return utt2spk_str, utt2wav_str
-
-def prepare_noise(root_dir):
- utt2spk = {}
- utt2wav = {}
- num_good_files = 0
- num_bad_files = 0
- noise_dir = os.path.join(root_dir, "noise")
- for root, dirs, files in os.walk(noise_dir):
- for file in files:
- file_path = os.path.join(root, file)
- if file.endswith(".wav"):
- utt = str(file).replace(".wav", "")
- utt2wav[utt] = file_path
- utt2spk[utt] = utt
- utt2spk_str = ""
- utt2wav_str = ""
- for utt in utt2spk:
- if utt in utt2wav:
- utt2spk_str = utt2spk_str + utt + " " + utt2spk[utt] + "\n"
- utt2wav_str = utt2wav_str + utt + " " + utt2wav[utt] + "\n"
- num_good_files += 1
- else:
- print("Missing file {}".format(utt))
- num_bad_files += 1
- print(("In noise directory, processed {} files: {} had missing wav data".format(num_good_files, num_bad_files))
- return utt2spk_str, utt2wav_str
-
-def main():
- in_dir = sys.argv[1]
- out_dir = sys.argv[2]
- use_vocals = sys.argv[3] == "Y"
- utt2spk_music, utt2wav_music = prepare_music(in_dir, use_vocals)
- utt2spk_speech, utt2wav_speech = prepare_speech(in_dir)
- utt2spk_noise, utt2wav_noise = prepare_noise(in_dir)
- utt2spk = utt2spk_speech + utt2spk_music + utt2spk_noise
- utt2wav = utt2wav_speech + utt2wav_music + utt2wav_noise
- wav_fi = open(os.path.join(out_dir, "wav.scp"), 'w')
- wav_fi.write(utt2wav)
- utt2spk_fi = open(os.path.join(out_dir, "utt2spk"), 'w')
- utt2spk_fi.write(utt2spk)
-
-
-if __name__=="__main__":
- main()
diff --git a/egs/bn_music_speech/v1/local/make_musan.sh b/egs/bn_music_speech/v1/local/make_musan.sh
deleted file mode 100755
index 694940ad70f..00000000000
--- a/egs/bn_music_speech/v1/local/make_musan.sh
+++ /dev/null
@@ -1,37 +0,0 @@
-#!/bin/bash
-# Copyright 2015 David Snyder
-# Apache 2.0.
-#
-# This script, called by ../run.sh, creates the MUSAN
-# data directory. The required dataset is freely available at
-# http://www.openslr.org/17/
-
-set -e
-in_dir=$1
-data_dir=$2
-use_vocals='Y'
-
-mkdir -p local/musan.tmp
-
-echo "Preparing ${data_dir}/musan..."
-mkdir -p ${data_dir}/musan
-local/make_musan.py ${in_dir} ${data_dir}/musan ${use_vocals}
-
-utils/fix_data_dir.sh ${data_dir}/musan
-
-grep "music" ${data_dir}/musan/utt2spk > local/musan.tmp/utt2spk_music
-grep "speech" ${data_dir}/musan/utt2spk > local/musan.tmp/utt2spk_speech
-grep "noise" ${data_dir}/musan/utt2spk > local/musan.tmp/utt2spk_noise
-utils/subset_data_dir.sh --utt-list local/musan.tmp/utt2spk_music \
- ${data_dir}/musan ${data_dir}/musan_music
-utils/subset_data_dir.sh --utt-list local/musan.tmp/utt2spk_speech \
- ${data_dir}/musan ${data_dir}/musan_speech
-utils/subset_data_dir.sh --utt-list local/musan.tmp/utt2spk_noise \
- ${data_dir}/musan ${data_dir}/musan_noise
-
-utils/fix_data_dir.sh ${data_dir}/musan_music
-utils/fix_data_dir.sh ${data_dir}/musan_speech
-utils/fix_data_dir.sh ${data_dir}/musan_noise
-
-rm -rf local/musan.tmp
-
diff --git a/egs/bn_music_speech/v1/run.sh b/egs/bn_music_speech/v1/run.sh
index 6cc0531e9d7..08d5c022a9d 100755
--- a/egs/bn_music_speech/v1/run.sh
+++ b/egs/bn_music_speech/v1/run.sh
@@ -20,7 +20,7 @@ vaddir=`pwd`/mfcc
local/make_bn.sh /export/corpora5/LDC/LDC97S44 \
/export/corpora/LDC/LDC97T22 data
-local/make_musan.sh /export/corpora/JHU/musan data
+steps/data/make_musan.sh --sampling-rate 16000 /export/corpora/JHU/musan data
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --nj 30 --cmd "$train_cmd" \
data/musan_speech exp/make_mfcc $mfccdir
diff --git a/egs/callhome_diarization/v1/diarization/VB_diarization.py b/egs/callhome_diarization/v1/diarization/VB_diarization.py
new file mode 100644
index 00000000000..31af078efd2
--- /dev/null
+++ b/egs/callhome_diarization/v1/diarization/VB_diarization.py
@@ -0,0 +1,359 @@
+# Copyright 2013-2017 Lukas Burget (burget@fit.vutbr.cz)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#
+# Revision History
+# L. Burget 16/07/13 01:00AM - original version
+# L. Burget 20/06/17 12:07AM - np.asarray replaced by .toarray()
+# - minor bug fix in initializing q
+# - minor bug fix in ELBO calculation
+# - few more optimizations
+
+import numpy as np
+from scipy.sparse import coo_matrix
+import scipy.linalg as spl
+import numexpr as ne # the dependency on this module can be avoided by replacing
+ # logsumexp_ne and exp_ne with logsumexp and np.exp
+
+#[q sp Li] =
+def VB_diarization(X, m, iE, w, V, sp=None, q=None,
+ maxSpeakers = 10, maxIters = 10,
+ epsilon = 1e-4, loopProb = 0.99, statScale = 1.0,
+ alphaQInit = 1.0, downsample = None, VtiEV = None, ref=None,
+ plot=False, sparsityThr=0.001, llScale=1.0, minDur=1):
+
+ """
+ This is a generalized version of the speaker diarization described in:
+
+ Kenny, P. Bayesian Analysis of Speaker Diarization with Eigenvoice Priors,
+ Montreal, CRIM, May 2008.
+
+ Kenny, P., Reynolds, D., and Castaldo, F. Diarization of Telephone
+ Conversations using Factor Analysis IEEE Journal of Selected Topics in Signal
+ Processing, December 2010.
+
+ The generalization introduced in this implementation lies in using an HMM
+ instead of the simple mixture model when modeling generation of segments
+ (or even frames) from speakers. HMM limits the probability of switching
+ between speakers when changing frames, which makes it possible to use
+ the model on frame-by-frame bases without any need to iterate between
+ 1) clustering speech segments and 2) re-segmentation (i.e. as it was done in
+ the paper above).
+
+ Inputs:
+ X - T x D array, where columns are D dimensional feature vectors for T frames
+ m - C x D array of GMM component means
+ iE - C x D array of GMM component inverse covariance matrix diagonals
+ w - C dimensional column vector of GMM component weights
+ V - R x C x D array of eigenvoices
+ maxSpeakers - maximum number of speakers expected in the utterance
+ maxIters - maximum number of algorithm iterations
+ epsilon - stop iterating, if obj. fun. improvement is less than epsilon
+ loopProb - probability of not switching speakers between frames
+ statScale - scale sufficient statistics collected using UBM
+ llScale - scale UBM likelihood (i.e. llScale < 1.0 makes attribution of
+ frames to UBM components more uncertain)
+ sparsityThr - set occupations smaller than this threshold to 0.0 (saves memory
+ as the posteriors are represented by a sparse matrix)
+ alphaQInit - Dirichlet concentration parameter for initializing q
+ downsample - perform diarization on input downsampled by this factor
+ VtiEV - C x (R**2+R)/2 matrix normally calculated by VB_diarization when
+ VtiEV is None. However, it can be pre-calculated using function
+ precalculate_VtiEV(V) and used across calls of VB_diarization.
+ minDur - minimum number of frames between speaker turns imposed by linear
+ chains of HMM states corresponding to each speaker. All the states
+ in a chain share the same output distribution
+ ref - T dim. integer vector with reference speaker ID (0:maxSpeakers)
+ per frame
+ plot - if set to True, plot per-frame speaker posteriors.
+
+ Outputs:
+ q - S x T matrix of posteriors attributing each frame to one of S possible
+ speakers, where S is given by opts.maxSpeakers
+ sp - S dimensional column vector of ML learned speaker priors. Ideally, these
+ should allow estimating the number of speakers in the utterance, as the
+ probabilities of the redundant speakers should converge to zero.
+ Li - values of auxiliary function (and DER and frame cross-entropy between q
+ and reference if 'ref' is provided) over iterations.
+ """
+
+ # The references to equations correspond to the technical report:
+ # Kenny, P. Bayesian Analysis of Speaker Diarization with Eigenvoice Priors,
+ # Montreal, CRIM, May 2008.
+
+ D=X.shape[1] # feature dimensionality
+ C=len(w) # number of mixture components
+ R=V.shape[0] # subspace rank
+ nframes=X.shape[0]
+
+ if VtiEV is None:
+ VtiEV = precalculate_VtiEV(V, iE)
+
+ V = V.reshape(V.shape[0],-1)
+
+ if sp is None:
+ sp = np.ones(maxSpeakers)/maxSpeakers
+ else:
+ maxSpeakers = len(sp)
+
+ if q is None:
+ # initialize q from flat Dirichlet prior with concentration parameter alphaQInit
+ q = np.random.gamma(alphaQInit, size=(nframes, maxSpeakers))
+ q = q / q.sum(1, keepdims=True)
+
+ # calculate UBM mixture frame posteriors (i.e. per-frame zero order statistics)
+ ll = (X**2).dot(-0.5*iE.T) + X.dot(iE.T*m.T)-0.5*((iE * m**2 - np.log(iE)).sum(1) - 2*np.log(w) + D*np.log(2*np.pi))
+ ll *= llScale
+ G = logsumexp_ne(ll, axis=1)
+ NN = exp_ne(ll - G[:,np.newaxis]) * statScale
+ NN[NN < sparsityThr] = 0.0
+ # ... (iteration loop omitted) ...
+ if ii > 0 and L - Li[-2][0] < epsilon:
+ if L - Li[-1][0] < 0: print('WARNING: Value of auxiliary function has decreased!')
+ break
+
+ if downsample is not None:
+ #upsample resulting q to match number of frames in the input utterance
+ q = downsampler.T.dot(q)
+
+ return q, sp, Li
+
+
+def precalculate_VtiEV(V, iE):
+ tril_ind = np.tril_indices(V.shape[0])
+ VtiEV = np.empty((V.shape[1],len(tril_ind[0])), V.dtype)
+ for c in range(V.shape[1]):
+ VtiEV[c,:] = np.dot(V[:,c,:]*iE[np.newaxis,c,:], V[:,c,:].T)[tril_ind]
+ return VtiEV
+
+
+# Initialize q (per-frame speaker posteriors) from a reference
+# (vector of per-frame zero based integer speaker IDs)
+def frame_labels2posterior_mx(labels, maxSpeakers):
+ #initialize from reference
+ #pmx = np.zeros((len(labels), labels.max()+1))
+ pmx = np.zeros((len(labels), maxSpeakers))
+ pmx[np.arange(len(labels)), labels] = 1
+ return pmx
+
+# Calculates Diarization Error Rate (DER) or per-frame cross-entropy between
+# reference (vector of per-frame zero based integer speaker IDs) and q (per-frame
+# speaker posteriors). If expected=False, q is converted into hard labels before
+# calculating DER. If expected=True, posteriors in q are used to calculate
+# "expected" DER.
+def DER(q, ref, expected=True, xentropy=False):
+ from itertools import permutations
+
+ if not expected:
+ # replace probabilities in q by zeros and ones
+ hard_labels = q.argmax(1)
+ q = np.zeros_like(q)
+ q[range(len(q)), hard_labels] = 1
+
+ err_mx = np.empty((ref.max()+1, q.shape[1]))
+ for s in range(err_mx.shape[0]):
+ tmpq = q[ref == s,:]
+ err_mx[s] = (-np.log(tmpq) if xentropy else tmpq).sum(0)
+
+ if err_mx.shape[0] < err_mx.shape[1]:
+ err_mx = err_mx.T
+
+ # try all alignments (permutations) of reference and detected speaker
+ # could be written in a more efficient way using dynamic programming
+ acc = [err_mx[perm[:err_mx.shape[1]], range(err_mx.shape[1])].sum()
+ for perm in permutations(range(err_mx.shape[0]))]
+ if xentropy:
+ return min(acc)/float(len(ref))
+ else:
+ return (len(ref) - max(acc))/float(len(ref))
+
+
+###############################################################################
+# Module private functions
+###############################################################################
+def logsumexp(x, axis=0):
+ xmax = x.max(axis)
+ x = xmax + np.log(np.sum(np.exp(x - np.expand_dims(xmax, axis)), axis))
+ infs = np.isinf(xmax)
+ if np.ndim(x) > 0:
+ x[infs] = xmax[infs]
+ elif infs:
+ x = xmax
+ return x
+
+
+# The following two functions are versions optimized for speed using the numexpr
+# module; they can be replaced by logsumexp and np.exp to avoid the dependency
+# on that module.
+def logsumexp_ne(x, axis=0):
+ xmax = np.array(x).max(axis=axis)
+ xmax_e = np.expand_dims(xmax, axis)
+ x = ne.evaluate("sum(exp(x - xmax_e), axis=%d)" % axis)
+ x = ne.evaluate("xmax + log(x)")
+ infs = np.isinf(xmax)
+ if np.ndim(x) > 0:
+ x[infs] = xmax[infs]
+ elif infs:
+ x = xmax
+ return x
+
+
+def exp_ne(x, out=None):
+ return ne.evaluate("exp(x)", out=out)
+
+
+# Convert vector with lower-triangular coefficients into symmetric matrix
+def tril_to_sym(tril):
+ R = np.sqrt(len(tril)*2).astype(int)
+ tril_ind = np.tril_indices(R)
+ S = np.empty((R,R))
+ S[tril_ind] = tril
+ S[tril_ind[::-1]] = tril
+ return S
+
+
+def logdet(A):
+ return 2*np.sum(np.log(np.diag(spl.cholesky(A))))
+
+
+def forward_backward(lls, tr, ip):
+ """
+ Inputs:
+ lls - matrix of per-frame log HMM state output probabilities
+ tr - transition probability matrix
+ ip - vector of initial state probabilities (i.e. starting in the state)
+ Outputs:
+ sp - matrix of per-frame state occupation posteriors
+ tll - total (forward) log-likelihood
+ lfw - log forward probabilities
+ lbw - log backward probabilities
+ """
+ ltr = np.log(tr)
+ lfw = np.empty_like(lls)
+ lbw = np.empty_like(lls)
+ lfw[:] = -np.inf
+ lbw[:] = -np.inf
+ lfw[0] = lls[0] + np.log(ip)
+ lbw[-1] = 0.0
+
+ for ii in xrange(1,len(lls)):
+ lfw[ii] = lls[ii] + logsumexp(lfw[ii-1] + ltr.T, axis=1)
+
+ for ii in reversed(xrange(len(lls)-1)):
+ lbw[ii] = logsumexp(ltr + lls[ii+1] + lbw[ii+1], axis=1)
+
+ tll = logsumexp(lfw[-1])
+ sp = np.exp(lfw + lbw - tll)
+ return sp, tll, lfw, lbw
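
Since the docstring above specifies the expected array shapes, here is a hedged usage sketch of the `VB_diarization()` entry point with purely synthetic inputs. It assumes the full `VB_diarization` module (and its `numexpr` dependency) is importable; the shapes follow the docstring (T frames, D feature dims, C UBM components, R eigenvoice rank), and none of the values stand in for a real UBM or i-vector extractor:

```python
# Hedged sketch: calling VB_diarization with synthetic inputs of the documented shapes.
import numpy as np
import VB_diarization

T, D, C, R = 500, 20, 8, 5
X  = np.random.randn(T, D)            # T x D features
m  = np.random.randn(C, D)            # C x D UBM component means
iE = np.ones((C, D))                  # C x D inverse covariance diagonals
w  = np.ones(C) / C                   # C component weights
V  = 0.1 * np.random.randn(R, C, D)   # R x C x D eigenvoices

q, sp, Li = VB_diarization.VB_diarization(X, m, iE, w, V,
                                          maxSpeakers=4, maxIters=5)
frame_speakers = q.argmax(axis=1)     # hard per-frame speaker decisions
```
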
diff --git a/egs/callhome_diarization/v1/diarization/VB_resegmentation.py b/egs/callhome_diarization/v1/diarization/VB_resegmentation.py
new file mode 100755
index 00000000000..aa951693615
--- /dev/null
+++ b/egs/callhome_diarization/v1/diarization/VB_resegmentation.py
@@ -0,0 +1,281 @@
+#!/usr/bin/env python
+
+import numpy as np
+import VB_diarization
+import pickle
+import kaldi_io
+import sys
+import argparse
+import commands
+
+def get_utt_list(utt2spk_filename):
+ utt_list = []
+ with open(utt2spk_filename, 'r') as fh:
+ content = fh.readlines()
+ for line in content:
+ line = line.strip('\n')
+ line_split = line.split()
+ utt_list.append(line_split[0])
+ print("{} UTTERANCES IN TOTAL".format(len(utt_list)))
+ return utt_list
+
+def utt_num_frames_mapping(utt2num_frames_filename):
+ utt2num_frames = {}
+ with open(utt2num_frames_filename, 'r') as fh:
+ content = fh.readlines()
+ for line in content:
+ line = line.strip('\n')
+ line_split = line.split()
+ utt2num_frames[line_split[0]] = int(line_split[1])
+ return utt2num_frames
+
+def create_ref_file(uttname, utt2num_frames, full_rttm_filename, temp_dir, rttm_filename):
+ utt_rttm_file = open("{}/{}".format(temp_dir, rttm_filename), 'w')
+
+ num_frames = utt2num_frames[uttname]
+
+ # We use 0 to denote silence frames and 1 to denote overlapping frames.
+ ref = np.zeros(num_frames)
+ speaker_dict = {}
+ num_spk = 0
+
+ with open(full_rttm_filename, 'r') as fh:
+ content = fh.readlines()
+ for line in content:
+ line = line.strip('\n')
+ line_split = line.split()
+ uttname_line = line_split[1]
+ if uttname != uttname_line:
+ continue
+ else:
+ utt_rttm_file.write(line + "\n")
+ start_time = int(float(line_split[3]) * 100)
+ duration_time = int(float(line_split[4]) * 100)
+ end_time = start_time + duration_time
+ spkname = line_split[7]
+ if spkname not in speaker_dict.keys():
+ spk_idx = num_spk + 2
+ speaker_dict[spkname] = spk_idx
+ num_spk += 1
+
+ for i in range(start_time, end_time):
+ if i < 0:
+ raise ValueError(line)
+ elif i >= num_frames:
+ print("{} EXCEED NUM_FRAMES".format(line))
+ break
+ else:
+ if ref[i] == 0:
+ ref[i] = speaker_dict[spkname]
+ else:
+ ref[i] = 1 # The overlapping speech is marked as 1.
+ ref = ref.astype(int)
+
+ print("{} SPEAKERS IN {}".format(num_spk, uttname))
+ print("{} TOTAL, {} SILENCE({:.0f}%), {} OVERLAPPING({:.0f}%)".format(len(ref), np.sum(ref == 0), 100.0 * np.sum(ref == 0) / len(ref), np.sum(ref == 1), 100.0 * np.sum(ref == 1) / len(ref)))
+
+ duration_list = []
+ for i in range(num_spk):
+ duration_list.append(1.0 * np.sum(ref == (i + 2)) / len(ref))
+ duration_list.sort()
+ duration_list = map(lambda x: '{0:.2f}'.format(x), duration_list)
+ print("DISTRIBUTION OF SPEAKER {}".format(" ".join(duration_list)))
+ print("")
+ sys.stdout.flush()
+ utt_rttm_file.close()
+ return ref
+
+def create_rttm_output(uttname, predicted_label, output_dir, channel):
+ num_frames = len(predicted_label)
+
+ start_idx = 0
+ idx_list = []
+
+ last_label = predicted_label[0]
+ for i in range(num_frames):
+ if predicted_label[i] == last_label: # The speaker label remains the same.
+ continue
+ else: # The speaker label is different.
+ if last_label != 0: # Ignore the silence.
+ idx_list.append([start_idx, i, last_label])
+ start_idx = i
+ last_label = predicted_label[i]
+ if last_label != 0:
+ idx_list.append([start_idx, num_frames, last_label])
+
+ with open("{}/{}_predict.rttm".format(output_dir, uttname), 'w') as fh:
+ for i in range(len(idx_list)):
+ start_frame = (idx_list[i])[0]
+ end_frame = (idx_list[i])[1]
+ label = (idx_list[i])[2]
+ duration = end_frame - start_frame
+ fh.write("SPEAKER {} {} {:.2f} {:.2f} {} \n".format(uttname, channel, start_frame / 100.0, duration / 100.0, label))
+ return 0
+
+def match_DER(string):
+ string_split = string.split('\n')
+ for line in string_split:
+ if "OVERALL SPEAKER DIARIZATION ERROR" in line:
+ return line
+ return 0
+
+def main():
+ parser = argparse.ArgumentParser(description='VB Resegmentation')
+ parser.add_argument('data_dir', type=str, help='Subset data directory')
+ parser.add_argument('init_rttm_filename', type=str,
+ help='The rttm file to initialize the VB system, usually the AHC cluster result')
+ parser.add_argument('output_dir', type=str, help='Output directory')
+ parser.add_argument('dubm_model', type=str, help='Path of the diagonal UBM model')
+ parser.add_argument('ie_model', type=str, help='Path of the ivector extractor model')
+ parser.add_argument('--max-speakers', type=int, default=10,
+ help='Maximum number of speakers expected in the utterance (default: 10)')
+ parser.add_argument('--max-iters', type=int, default=10,
+ help='Maximum number of algorithm iterations (default: 10)')
+ parser.add_argument('--downsample', type=int, default=25,
+ help='Perform diarization on input downsampled by this factor (default: 25)')
+ parser.add_argument('--alphaQInit', type=float, default=100.0,
+ help='Dirichlet concentration parameter for initializing q')
+ parser.add_argument('--sparsityThr', type=float, default=0.001,
+ help='Set occupations smaller than this threshold to 0.0 (saves memory as \
+ the posteriors are represented by sparse matrix)')
+ parser.add_argument('--epsilon', type=float, default=1e-6,
+ help='Stop iterating, if obj. fun. improvement is less than epsilon')
+ parser.add_argument('--minDur', type=int, default=1,
+ help='Minimum number of frames between speaker turns imposed by linear \
+ chains of HMM states corresponding to each speaker. All the states \
+ in a chain share the same output distribution')
+ parser.add_argument('--loopProb', type=float, default=0.9,
+ help='Probability of not switching speakers between frames')
+ parser.add_argument('--statScale', type=float, default=0.2,
+ help='Scale sufficient statistics collected using UBM')
+ parser.add_argument('--llScale', type=float, default=1.0,
+ help='Scale UBM likelihood (i.e. llScale < 1.0 makes attribution of \
+ frames to UBM components more uncertain)')
+ parser.add_argument('--channel', type=int, default=0,
+ help='Channel information in the rttm file')
+ parser.add_argument('--initialize', type=int, default=1,
+ help='Whether to initialize the speaker posterior')
+
+ args = parser.parse_args()
+ print(args)
+ data_dir = args.data_dir
+ init_rttm_filename = args.init_rttm_filename
+
+ # The data directory should contain wav.scp, spk2utt, utt2spk and feats.scp
+ utt2spk_filename = "{}/utt2spk".format(data_dir)
+ utt2num_frames_filename = "{}/utt2num_frames".format(data_dir)
+ feats_scp_filename = "{}/feats.scp".format(data_dir)
+ temp_dir = "{}/tmp".format(args.output_dir)
+ rttm_dir = "{}/rttm".format(args.output_dir)
+
+ utt_list = get_utt_list(utt2spk_filename)
+ utt2num_frames = utt_num_frames_mapping(utt2num_frames_filename)
+ print("------------------------------------------------------------------------")
+ print("")
+ sys.stdout.flush()
+
+ # Load the diagonal UBM and i-vector extractor
+ with open(args.dubm_model, 'rb') as fh:
+ dubm_para = pickle.load(fh)
+ with open(args.ie_model, 'rb') as fh:
+ ie_para = pickle.load(fh)
+
+ DUBM_WEIGHTS = None
+ DUBM_MEANS_INVVARS = None
+ DUBM_INV_VARS = None
+ IE_M = None
+
+ for key in dubm_para.keys():
+ if key == "<WEIGHTS>":
+ DUBM_WEIGHTS = dubm_para[key]
+ elif key == "<MEANS_INVVARS>":
+ DUBM_MEANS_INVVARS = dubm_para[key]
+ elif key == "<INV_VARS>":
+ DUBM_INV_VARS = dubm_para[key]
+ else:
+ continue
+
+ for key in ie_para.keys():
+ if key == "M":
+ IE_M = np.transpose(ie_para[key], (2, 0, 1))
+ m = DUBM_MEANS_INVVARS / DUBM_INV_VARS
+ iE = DUBM_INV_VARS
+ w = DUBM_WEIGHTS
+ V = IE_M
+
+ # Load the MFCC features
+ feats_dict = {}
+ for key,mat in kaldi_io.read_mat_scp(feats_scp_filename):
+ feats_dict[key] = mat
+
+ for utt in utt_list:
+ # Get the alignments from the clustering result.
+ # In init_ref, 0 denotes the silence frames and
+ # 1 denotes the overlapping speech frames; the speaker
+ # labels start from 2.
+ init_ref = create_ref_file(utt, utt2num_frames, init_rttm_filename, temp_dir, "{}.rttm".format(utt))
+ # Ground truth of the diarization.
+
+ X = feats_dict[utt]
+ X = X.astype(np.float64)
+
+ # Keep only the voiced frames (0 denotes the silence
+ # frames, 1 denotes the overlapping speech frames). Since
+ # our method predicts single speaker label for each frame
+ # the init_ref doesn't contain 1.
+ mask = (init_ref >= 2)
+ X_voiced = X[mask]
+ init_ref_voiced = init_ref[mask] - 2
+
+ if X_voiced.shape[0] == 0:
+ print("Warning: {} has no voiced frames in the initialization file".format(utt))
+ continue
+
+ # Initialize the posterior of each speaker based on the clustering result.
+ if args.initialize:
+ q = VB_diarization.frame_labels2posterior_mx(init_ref_voiced, args.max_speakers)
+ else:
+ q = None
+ print("RANDOM INITIALIZATION\n")
+
+ # VB resegmentation
+
+ # q - S x T matrix of posteriors attributing each frame to one of S possible
+ # speakers, where S is given by opts.maxSpeakers
+ # sp - S dimensional column vector of ML learned speaker priors. Ideally, these
+ # should allow estimating the number of speakers in the utterance, as the
+ # probabilities of the redundant speakers should converge to zero.
+ # Li - values of auxiliary function (and DER and frame cross-entropy between q
+ # and reference if 'ref' is provided) over iterations.
+ q_out, sp_out, L_out = VB_diarization.VB_diarization(X_voiced, m, iE, w, V, sp=None, q=q, maxSpeakers=args.max_speakers, maxIters=args.max_iters, VtiEV=None,
+ downsample=args.downsample, alphaQInit=args.alphaQInit, sparsityThr=args.sparsityThr, epsilon=args.epsilon, minDur=args.minDur,
+ loopProb=args.loopProb, statScale=args.statScale, llScale=args.llScale, ref=None, plot=False)
+
+ predicted_label_voiced = np.argmax(q_out, 1) + 2
+ predicted_label = (np.zeros(len(mask))).astype(int)
+ predicted_label[mask] = predicted_label_voiced
+
+ duration_list = []
+ for i in range(args.max_speakers):
+ num_frames = np.sum(predicted_label == (i + 2))
+ if num_frames == 0:
+ continue
+ else:
+ duration_list.append(1.0 * num_frames / len(predicted_label))
+ duration_list.sort()
+ duration_list = map(lambda x: '{0:.2f}'.format(x), duration_list)
+ print("PREDICTED {} SPEAKERS".format(len(duration_list)))
+ print("DISTRIBUTION {}".format(" ".join(duration_list)))
+ print("sp_out", sp_out)
+ print("L_out", L_out)
+
+ # Create the output rttm file and compute the DER after re-segmentation
+ create_rttm_output(utt, predicted_label, rttm_dir, args.channel)
+ print("")
+ print("------------------------------------------------------------------------")
+ print("")
+ sys.stdout.flush()
+ return 0
+
+if __name__ == "__main__":
+ main()
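
One step worth illustrating from the script above is `create_rttm_output()`, which run-length encodes the per-frame labels (10 ms frames, label 0 = silence) into RTTM-style segments. A compact sketch of the same idea with a hypothetical helper name:

```python
# Sketch of the frame-label -> segment conversion performed by create_rttm_output().
# Frames are 10 ms; label 0 means silence and is skipped.
def labels_to_segments(labels):
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != 0:
                segments.append((start / 100.0, (i - start) / 100.0, labels[start]))
            start = i
    return segments

print(labels_to_segments([0, 2, 2, 2, 0, 3, 3]))  # [(0.01, 0.03, 2), (0.05, 0.02, 3)]
```
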
diff --git a/egs/callhome_diarization/v1/diarization/VB_resegmentation.sh b/egs/callhome_diarization/v1/diarization/VB_resegmentation.sh
new file mode 100755
index 00000000000..a677f178ee5
--- /dev/null
+++ b/egs/callhome_diarization/v1/diarization/VB_resegmentation.sh
@@ -0,0 +1,103 @@
+#!/bin/bash
+
+# Begin configuration section.
+nj=20
+cmd=run.pl
+stage=0
+true_rttm_filename=None
+max_speakers=10
+max_iters=10
+downsample=25
+alphaQInit=100.0
+sparsityThr=0.001
+epsilon=1e-6
+minDur=1
+loopProb=0.9
+statScale=0.2
+llScale=1.0
+channel=0
+initialize=1
+# End configuration section.
+
+echo "$0 $@" # Print the command line for logging
+
+if [ -f path.sh ]; then . ./path.sh; fi
+. parse_options.sh || exit 1;
+
+if [ -f $KALDI_ROOT/tools/VB_diarization/VB_diarization.py ]; then
+ echo "VB_diarization is installed so will use the script"
+else
+ echo "VB_diarization is not installed, Please install
+ it using extras/install_diarization_VBHMM.sh in tools/"
+ exit 1;
+fi
+
+
+if [ $# != 5 ]; then
+ echo "Usage: local/VB_resegmentation.sh "
+ echo "Variational Bayes Re-segmenatation"
+ echo "Options: "
+ echo " --cmd (utils/run.pl|utils/queue.pl ) # How to run jobs."
+ echo " --nj # Number of parallel jobs to run."
+ echo " --true-rttm-filename # The true rttm label file"
+ echo " --max-speakers # Maximum number of speakers"
+ echo " # expected in the utterance"
+ echo " # (default: 10)"
+ echo " --max-iters # Maximum number of algorithm"
+ echo " # iterations (default: 10)"
+ echo " --downsample # Perform diarization on input"
+ echo " # downsampled by this factor"
+ echo " # (default: 25)"
+ echo " --alphaQInit # Dirichlet concentraion"
+ echo " # parameter for initializing q"
+ echo " --sparsityThr # Set occupations smaller that"
+ echo " # this threshold to 0.0 (saves"
+ echo " # memory as the posteriors are"
+ echo " # represented by sparse matrix)"
+ echo " --epsilon # Stop iterating, if obj. fun."
+ echo " # improvement is less than"
+ echo " # epsilon"
+ echo " --minDur # Minimum number of frames"
+ echo " # between speaker turns imposed"
+ echo " # by linear chains of HMM"
+ echo " # state corresponding to each"
+ echo " # speaker. All the states in"
+ echo " # a chain share the same output"
+ echo " # distribution"
+ echo " --loopProb # Probability of not switching"
+ echo " # speakers between frames"
+ echo " --statScale # Scale sufficient statistics"
+ echo " # collected using UBM"
+ echo " --llScale # Scale UBM likelihood (i.e."
+ echo " # llScale < 1.0 make"
+ echo " # attribution of frames to UBM"
+ echo " # componets more uncertain)"
+ echo " --channel # Channel information in the rttm file"
+ echo " --initialize # Whether to initalize the"
+ echo " # speaker posterior (if not)"
+ echo " # the speaker posterior will be"
+ echo " # randomly initilized"
+
+ exit 1;
+fi
+
+data_dir=$1
+init_rttm_filename=$2
+output_dir=$3
+dubm_model=$4
+ie_model=$5
+
+mkdir -p $output_dir/rttm
+
+sdata=$data_dir/split$nj;
+utils/split_data.sh $data_dir $nj || exit 1;
+
+if [ $stage -le 0 ]; then
+ $cmd JOB=1:$nj $output_dir/log/VB_resegmentation.JOB.log \
+ diarization/VB_resegmentation.py --true-rttm-filename $true_rttm_filename --max-speakers $max_speakers \
+ --max-iters $max_iters --downsample $downsample --alphaQInit $alphaQInit \
+ --sparsityThr $sparsityThr --epsilon $epsilon --minDur $minDur \
+ --loopProb $loopProb --statScale $statScale --llScale $llScale \
+ --channel $channel --initialize $initialize \
+ $sdata/JOB $init_rttm_filename $output_dir $dubm_model $ie_model || exit 1;
+fi
diff --git a/egs/callhome_diarization/v1/diarization/cluster.sh b/egs/callhome_diarization/v1/diarization/cluster.sh
index fa5ead5b6b9..5e5c6e9dbe5 100755
--- a/egs/callhome_diarization/v1/diarization/cluster.sh
+++ b/egs/callhome_diarization/v1/diarization/cluster.sh
@@ -14,6 +14,8 @@ stage=0
nj=10
cleanup=true
threshold=0.5
+max_spk_fraction=1.0
+first_pass_max_utterances=32767
rttm_channel=0
read_costs=false
reco2num_spk=
@@ -36,6 +38,15 @@ if [ $# != 2 ]; then
echo " --threshold # Cluster stopping criterion. Clusters with scores greater"
echo " # than this value will be merged until all clusters"
echo " # exceed this value."
+ echo " --max-spk-fraction # Clusters with total fraction of utterances greater than"
+ echo " # this value will not be merged. This is active only when"
+ echo " # reco2num-spk is supplied and"
+ echo " # 1.0 / num-spk <= max-spk-fraction <= 1.0."
+ echo " --first-pass-max-utterances # If the number of utterances is larger than first-pass-max-utterances,"
+ echo " # then clustering is done in two passes. In the first pass, input points"
+ echo " # are divided into contiguous subsets of size first-pass-max-utterances"
+ echo " # and each subset is clustered separately. In the second pass, the first"
+ echo " # pass clusters are merged into the final set of clusters."
echo " --rttm-channel # The value passed into the RTTM channel field. Only affects"
echo " # the format of the RTTM file."
echo " --read-costs # If true, interpret input scores as costs, i.e. similarity"
@@ -78,8 +89,10 @@ if [ $stage -le 0 ]; then
echo "$0: clustering scores"
$cmd JOB=1:$nj $dir/log/agglomerative_cluster.JOB.log \
agglomerative-cluster --threshold=$threshold --read-costs=$read_costs \
- --reco2num-spk-rspecifier=$reco2num_spk scp:"$feats" \
- ark,t:$sdata/JOB/spk2utt ark,t:$dir/labels.JOB || exit 1;
+ --reco2num-spk-rspecifier=$reco2num_spk \
+ --max-spk-fraction=$max_spk_fraction \
+ --first-pass-max-utterances=$first_pass_max_utterances \
+ scp:"$feats" ark,t:$sdata/JOB/spk2utt ark,t:$dir/labels.JOB || exit 1;
fi
if [ $stage -le 1 ]; then
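
The new `--first-pass-max-utterances` option documented above describes a two-pass agglomerative clustering scheme. The real work happens inside the `agglomerative-cluster` binary; the Python sketch below only illustrates the chunk-then-merge idea, and `cluster_fn` is a hypothetical callable that maps a list of points to a list of cluster representatives:

```python
# Illustration only: chunk-then-merge clustering as described for
# --first-pass-max-utterances; cluster_fn is a stand-in for the real clustering.
def two_pass_cluster(points, first_pass_max_utterances, cluster_fn):
    if len(points) <= first_pass_max_utterances:
        return cluster_fn(points)
    chunks = [points[i:i + first_pass_max_utterances]
              for i in range(0, len(points), first_pass_max_utterances)]
    first_pass = [cluster_fn(chunk) for chunk in chunks]        # cluster each contiguous subset
    merged_input = [rep for reps in first_pass for rep in reps]
    return cluster_fn(merged_input)                             # merge in a second pass
```
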
diff --git a/egs/callhome_diarization/v1/diarization/dump_model.py b/egs/callhome_diarization/v1/diarization/dump_model.py
new file mode 100755
index 00000000000..47a85b114d3
--- /dev/null
+++ b/egs/callhome_diarization/v1/diarization/dump_model.py
@@ -0,0 +1,127 @@
+#!/usr/bin/env python
+
+import numpy as np
+import pickle
+import sys
+
+def load_dubm(dubm_text):
+ para_dict = {}
+ with open(dubm_text, 'r') as fh:
+ content = fh.readlines()
+ state = 0
+ data_array = []
+
+ for line in content:
+ line = line.strip('\n')
+ line_split = line.split()
+ if state == 0:
+ if len(line_split) == 1:
+ continue
+ elif len(line_split) == 2 and line_split[1] == "[":
+ para_name = line_split[0]
+ state = 1
+ data_array = []
+ elif len(line_split) >= 3 and line_split[1] == "[" and line_split[-1] == "]": # One line vector
+ para_name = line_split[0]
+ data_list = []
+ for i in range(2, len(line_split) - 1):
+ data_list.append(float(line_split[i]))
+ data_list = np.array(data_list)
+ para_dict[para_name] = data_list
+ else:
+ raise ValueError("Condition not defined.")
+ elif state == 1:
+ if line_split[-1] == "]":
+ data_list = []
+ for i in range(len(line_split) - 1):
+ data_list.append(float(line_split[i]))
+ data_list = np.array(data_list)
+ data_array.append(data_list)
+ data_array = np.array(data_array)
+ para_dict[para_name] = data_array
+ state = 0
+ else:
+ data_list = []
+ for i in range(len(line_split)):
+ data_list.append(float(line_split[i]))
+ data_list = np.array(data_list)
+ data_array.append(data_list)
+ else:
+ raise ValueError("Condition not defined.")
+ return para_dict
+
+def load_ivector_extractor(ie_text):
+ para_dict = {}
+ with open(ie_text, 'r') as fh:
+ content = fh.readlines()
+ state = 0
+ data_3dmatrix = []
+ data_matrix = []
+ data_array = []
+
+ for line in content:
+ line = line.strip('\n')
+ if line == " [":
+ break
+ if state == 0:
+ if line != " 1024 [":
+ continue
+ else:
+ state = 1
+ elif state == 1:
+ line_split = line.split()
+ if line_split[0] == "[":
+ continue
+ elif line_split[-1] == "]":
+ data_array = []
+ for i in range(len(line_split)-1):
+ data_array.append(float(line_split[i]))
+ data_matrix.append(data_array)
+ data_3dmatrix.append(data_matrix)
+ data_matrix = []
+ else:
+ data_array = []
+ for i in range(len(line_split)):
+ data_array.append(float(line_split[i]))
+ data_matrix.append(data_array)
+ else:
+ raise ValueError("Condition not defined.")
+ para_dict['M'] = np.array(data_3dmatrix)
+ return para_dict
+
+def save_dict(para_dict, output_filename):
+ with open(output_filename, 'wb') as fh:
+ pickle.dump(para_dict, fh)
+ return 0
+
+def judge_case(txt_model):
+ with open(txt_model, 'r') as fh:
+ first_line = fh.readline()
+ model_type = first_line.split()[0]
+ if model_type == "<DiagGMM>":
+ return 1
+ elif model_type == "<IvectorExtractor>":
+ return 2
+ else:
+ return 0
+
+def main():
+ # The txt version of diagonal UBM and i-vector extractor. See gmm-global-copy
+ # and ivector-extractor-copy for details. (ivector-extractor-copy is not
+ # supported in the official kaldi, so you have to use my kaldi)
+ txt_model = sys.argv[1]
+ output_dir = sys.argv[2]
+ model_type = judge_case(txt_model)
+
+ if model_type == 1: # DiagGMM
+ dubm_para = load_dubm(txt_model)
+ save_dict(dubm_para, "{}/diag_ubm.pkl".format(output_dir))
+ elif model_type == 2: # IvectorExtractor
+ ie_para = load_ivector_extractor(txt_model)
+ save_dict(ie_para, "{}/ie.pkl".format(output_dir))
+ else:
+ raise ValueError("Condition not defined.")
+ return 0
+
+if __name__ == "__main__":
+ main()
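
For reference, the pickles written by `dump_model.py` above are plain dictionaries of NumPy arrays, so downstream code (such as `VB_resegmentation.py`) can reload them directly. A short sketch; the file name matches what the script writes, but the output directory is an arbitrary example:

```python
# Sketch: inspecting the pickled model parameters written by dump_model.py.
import pickle

with open("exp/models/diag_ubm.pkl", "rb") as fh:   # example path only
    dubm = pickle.load(fh)

for name, value in dubm.items():
    print(name, getattr(value, "shape", None))      # tag name and array shape
```
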
diff --git a/egs/callhome_diarization/v1/diarization/kaldi_io.py b/egs/callhome_diarization/v1/diarization/kaldi_io.py
new file mode 100755
index 00000000000..dae5599b8f1
--- /dev/null
+++ b/egs/callhome_diarization/v1/diarization/kaldi_io.py
@@ -0,0 +1,627 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+# Copyright 2014-2016 Brno University of Technology (author: Karel Vesely)
+# Licensed under the Apache License, Version 2.0 (the "License")
+
+import numpy as np
+import sys, os, re, gzip, struct
+
+#################################################
+# Adding kaldi tools to shell path,
+
+# Select kaldi,
+if not 'KALDI_ROOT' in os.environ:
+ # Default! To change run python with 'export KALDI_ROOT=/some_dir python'
+ os.environ['KALDI_ROOT']='/mnt/matylda5/iveselyk/Tools/kaldi-trunk'
+
+# Add kaldi tools to path,
+os.environ['PATH'] = os.popen('echo $KALDI_ROOT/src/bin:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/src/fstbin/:$KALDI_ROOT/src/gmmbin/:$KALDI_ROOT/src/featbin/:$KALDI_ROOT/src/lm/:$KALDI_ROOT/src/sgmmbin/:$KALDI_ROOT/src/sgmm2bin/:$KALDI_ROOT/src/fgmmbin/:$KALDI_ROOT/src/latbin/:$KALDI_ROOT/src/nnetbin:$KALDI_ROOT/src/nnet2bin:$KALDI_ROOT/src/nnet3bin:$KALDI_ROOT/src/online2bin/:$KALDI_ROOT/src/ivectorbin/:$KALDI_ROOT/src/lmbin/').readline().strip() + ':' + os.environ['PATH']
+
+
+#################################################
+# Define all custom exceptions,
+class UnsupportedDataType(Exception): pass
+class UnknownVectorHeader(Exception): pass
+class UnknownMatrixHeader(Exception): pass
+
+class BadSampleSize(Exception): pass
+class BadInputFormat(Exception): pass
+
+class SubprocessFailed(Exception): pass
+
+#################################################
+# Data-type independent helper functions,
+
+def open_or_fd(file, mode='rb'):
+ """ fd = open_or_fd(file)
+ Open file, gzipped file, pipe, or forward the file-descriptor.
+ Seeks to the offset if the 'file' argument contains a ':offset' suffix.
+ """
+ offset = None
+ try:
+ # strip 'ark:' prefix from r{x,w}filename (optional),
+ if re.search('^(ark|scp)(,scp|,b|,t|,n?f|,n?p|,b?o|,n?s|,n?cs)*:', file):
+ (prefix,file) = file.split(':',1)
+ # separate offset from filename (optional),
+ if re.search(':[0-9]+$', file):
+ (file,offset) = file.rsplit(':',1)
+ # input pipe?
+ if file[-1] == '|':
+ fd = popen(file[:-1], 'rb') # custom,
+ # output pipe?
+ elif file[0] == '|':
+ fd = popen(file[1:], 'wb') # custom,
+ # is it gzipped?
+ elif file.split('.')[-1] == 'gz':
+ fd = gzip.open(file, mode)
+ # a normal file...
+ else:
+ fd = open(file, mode)
+ except TypeError:
+ # 'file' is opened file descriptor,
+ fd = file
+ # Eventually seek to offset,
+ if offset != None: fd.seek(int(offset))
+ return fd
+
+# based on '/usr/local/lib/python3.4/os.py'
+def popen(cmd, mode="rb"):
+ if not isinstance(cmd, str):
+ raise TypeError("invalid cmd type (%s, expected string)" % type(cmd))
+
+ import subprocess, io, threading
+
+ # cleanup function for subprocesses,
+ def cleanup(proc, cmd):
+ ret = proc.wait()
+ if ret > 0:
+ raise SubprocessFailed('cmd %s returned %d !' % (cmd,ret))
+ return
+
+ # text-mode,
+ if mode == "r":
+ proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
+ threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
+ return io.TextIOWrapper(proc.stdout)
+ elif mode == "w":
+ proc = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE)
+ threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
+ return io.TextIOWrapper(proc.stdin)
+ # binary,
+ elif mode == "rb":
+ proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
+ threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
+ return proc.stdout
+ elif mode == "wb":
+ proc = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE)
+ threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
+ return proc.stdin
+ # sanity,
+ else:
+ raise ValueError("invalid mode %s" % mode)
+
+
+def read_key(fd):
+ """ [key] = read_key(fd)
+ Read the utterance-key from the opened ark/stream descriptor 'fd'.
+ """
+ key = ''
+ while 1:
+ char = fd.read(1).decode("latin1")
+ if char == '' : break
+ if char == ' ' : break
+ key += char
+ key = key.strip()
+ if key == '': return None # end of file,
+ assert(re.match('^\S+$',key) != None) # check format (no whitespace!)
+ return key
+
+
+#################################################
+# Integer vectors (alignments, ...),
+
+def read_ali_ark(file_or_fd):
+ """ Alias to 'read_vec_int_ark()' """
+ return read_vec_int_ark(file_or_fd)
+
+def read_vec_int_ark(file_or_fd):
+ """ generator(key,vec) = read_vec_int_ark(file_or_fd)
+ Create generator of (key,vector) tuples, which reads from the ark file/stream.
+ file_or_fd : ark, gzipped ark, pipe or opened file descriptor.
+
+ Read ark to a 'dictionary':
+ d = { u:d for u,d in kaldi_io.read_vec_int_ark(file) }
+ """
+ fd = open_or_fd(file_or_fd)
+ try:
+ key = read_key(fd)
+ while key:
+ ali = read_vec_int(fd)
+ yield key, ali
+ key = read_key(fd)
+ finally:
+ if fd is not file_or_fd: fd.close()
+
+def read_vec_int(file_or_fd):
+ """ [int-vec] = read_vec_int(file_or_fd)
+ Read kaldi integer vector, ascii or binary input,
+ """
+ fd = open_or_fd(file_or_fd)
+ binary = fd.read(2).decode()
+ if binary == '\0B': # binary flag
+ assert(fd.read(1).decode() == '\4'); # int-size
+ vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # vector dim
+ # Elements from int32 vector are stored in tuples: (sizeof(int32), value),
+ vec = np.frombuffer(fd.read(vec_size*5), dtype=[('size','int8'),('value','int32')], count=vec_size)
+ assert(vec[0]['size'] == 4) # int32 size,
+ ans = vec[:]['value'] # values are in 2nd column,
+ else: # ascii,
+ arr = (binary + fd.readline().decode()).strip().split()
+ try:
+ arr.remove('['); arr.remove(']') # optionally
+ except ValueError:
+ pass
+ ans = np.array(arr, dtype=int)
+ if fd is not file_or_fd : fd.close() # cleanup
+ return ans
+
+# Writing,
+def write_vec_int(file_or_fd, v, key=''):
+ """ write_vec_int(f, v, key='')
+ Write a binary kaldi integer vector to filename or stream.
+ Arguments:
+ file_or_fd : filename or opened file descriptor for writing,
+ v : the vector to be stored,
+ key (optional) : used for writing ark-file, the utterance-id gets written before the vector.
+
+ Example of writing single vector:
+ kaldi_io.write_vec_int(filename, vec)
+
+ Example of writing arkfile:
+ with open(ark_file,'w') as f:
+ for key,vec in dict.iteritems():
+ kaldi_io.write_vec_flt(f, vec, key=key)
+ """
+ fd = open_or_fd(file_or_fd, mode='wb')
+ if sys.version_info[0] == 3: assert(fd.mode == 'wb')
+ try:
+ if key != '' : fd.write((key+' ').encode("latin1")) # ark-files have keys (utterance-id),
+ fd.write('\0B'.encode()) # we write binary!
+ # dim,
+ fd.write('\4'.encode()) # int32 type,
+ fd.write(struct.pack(np.dtype('int32').char, v.shape[0]))
+ # data,
+ for i in range(len(v)):
+ fd.write('\4'.encode()) # int32 type,
+ fd.write(struct.pack(np.dtype('int32').char, v[i])) # binary,
+ finally:
+ if fd is not file_or_fd : fd.close()
+
+
+#################################################
+# Float vectors (confidences, ivectors, ...),
+
+# Reading,
+def read_vec_flt_scp(file_or_fd):
+ """ generator(key,mat) = read_vec_flt_scp(file_or_fd)
+ Returns generator of (key,vector) tuples, read according to kaldi scp.
+ file_or_fd : scp, gzipped scp, pipe or opened file descriptor.
+
+ Iterate the scp:
+ for key,vec in kaldi_io.read_vec_flt_scp(file):
+ ...
+
+ Read scp to a 'dictionary':
+ d = { key:mat for key,mat in kaldi_io.read_mat_scp(file) }
+ """
+ fd = open_or_fd(file_or_fd)
+ try:
+ for line in fd:
+ (key,rxfile) = line.decode().split(' ')
+ vec = read_vec_flt(rxfile)
+ yield key, vec
+ finally:
+ if fd is not file_or_fd : fd.close()
+
+def read_vec_flt_ark(file_or_fd):
+ """ generator(key,vec) = read_vec_flt_ark(file_or_fd)
+ Create generator of (key,vector) tuples, reading from an ark file/stream.
+ file_or_fd : ark, gzipped ark, pipe or opened file descriptor.
+
+ Read ark to a 'dictionary':
+ d = { u:d for u,d in kaldi_io.read_vec_flt_ark(file) }
+ """
+ fd = open_or_fd(file_or_fd)
+ try:
+ key = read_key(fd)
+ while key:
+ ali = read_vec_flt(fd)
+ yield key, ali
+ key = read_key(fd)
+ finally:
+ if fd is not file_or_fd: fd.close()
+
+def read_vec_flt(file_or_fd):
+ """ [flt-vec] = read_vec_flt(file_or_fd)
+ Read kaldi float vector, ascii or binary input,
+ """
+ fd = open_or_fd(file_or_fd)
+ binary = fd.read(2).decode()
+ if binary == '\0B': # binary flag
+ # Data type,
+ header = fd.read(3).decode()
+ if header == 'FV ': sample_size = 4 # floats
+ elif header == 'DV ': sample_size = 8 # doubles
+ else: raise UnknownVectorHeader("The header contained '%s'" % header)
+ assert(sample_size > 0)
+ # Dimension,
+ assert(fd.read(1).decode() == '\4'); # int-size
+ vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # vector dim
+ # Read whole vector,
+ buf = fd.read(vec_size * sample_size)
+ if sample_size == 4 : ans = np.frombuffer(buf, dtype='float32')
+ elif sample_size == 8 : ans = np.frombuffer(buf, dtype='float64')
+ else : raise BadSampleSize
+ return ans
+ else: # ascii,
+ arr = (binary + fd.readline().decode()).strip().split()
+ try:
+ arr.remove('['); arr.remove(']') # optionally
+ except ValueError:
+ pass
+ ans = np.array(arr, dtype=float)
+ if fd is not file_or_fd : fd.close() # cleanup
+ return ans
+
+# Writing,
+def write_vec_flt(file_or_fd, v, key=''):
+ """ write_vec_flt(f, v, key='')
+ Write a binary kaldi vector to filename or stream. Supports 32bit and 64bit floats.
+ Arguments:
+ file_or_fd : filename or opened file descriptor for writing,
+ v : the vector to be stored,
+ key (optional) : used for writing ark-file, the utterance-id gets written before the vector.
+
+ Example of writing single vector:
+ kaldi_io.write_vec_flt(filename, vec)
+
+ Example of writing arkfile:
+ with open(ark_file,'w') as f:
+ for key,vec in dict.iteritems():
+ kaldi_io.write_vec_flt(f, vec, key=key)
+ """
+ fd = open_or_fd(file_or_fd, mode='wb')
+ if sys.version_info[0] == 3: assert(fd.mode == 'wb')
+ try:
+ if key != '' : fd.write((key+' ').encode("latin1")) # ark-files have keys (utterance-id),
+ fd.write('\0B'.encode()) # we write binary!
+ # Data-type,
+ if v.dtype == 'float32': fd.write('FV '.encode())
+ elif v.dtype == 'float64': fd.write('DV '.encode())
+ else: raise UnsupportedDataType("'%s', please use 'float32' or 'float64'" % v.dtype)
+ # Dim,
+ fd.write('\04'.encode())
+ fd.write(struct.pack(np.dtype('uint32').char, v.shape[0])) # dim
+ # Data,
+ fd.write(v.tobytes())
+ finally:
+ if fd is not file_or_fd : fd.close()
+
+
+#################################################
+# Float matrices (features, transformations, ...),
+
+# Reading,
+def read_mat_scp(file_or_fd):
+ """ generator(key,mat) = read_mat_scp(file_or_fd)
+ Returns generator of (key,matrix) tuples, read according to kaldi scp.
+ file_or_fd : scp, gzipped scp, pipe or opened file descriptor.
+
+ Iterate the scp:
+ for key,mat in kaldi_io.read_mat_scp(file):
+ ...
+
+ Read scp to a 'dictionary':
+ d = { key:mat for key,mat in kaldi_io.read_mat_scp(file) }
+ """
+ fd = open_or_fd(file_or_fd)
+ try:
+ for line in fd:
+ (key,rxfile) = line.decode().split(' ')
+ mat = read_mat(rxfile)
+ yield key, mat
+ finally:
+ if fd is not file_or_fd : fd.close()
+
+def read_mat_ark(file_or_fd):
+ """ generator(key,mat) = read_mat_ark(file_or_fd)
+ Returns generator of (key,matrix) tuples, read from ark file/stream.
+ file_or_fd : scp, gzipped scp, pipe or opened file descriptor.
+
+ Iterate the ark:
+ for key,mat in kaldi_io.read_mat_ark(file):
+ ...
+
+ Read ark to a 'dictionary':
+ d = { key:mat for key,mat in kaldi_io.read_mat_ark(file) }
+ """
+ fd = open_or_fd(file_or_fd)
+ try:
+ key = read_key(fd)
+ while key:
+ mat = read_mat(fd)
+ yield key, mat
+ key = read_key(fd)
+ finally:
+ if fd is not file_or_fd : fd.close()
+
+def read_mat(file_or_fd):
+ """ [mat] = read_mat(file_or_fd)
+ Reads single kaldi matrix, supports ascii and binary.
+ file_or_fd : file, gzipped file, pipe or opened file descriptor.
+ """
+ fd = open_or_fd(file_or_fd)
+ try:
+ binary = fd.read(2).decode()
+ if binary == '\0B' :
+ mat = _read_mat_binary(fd)
+ else:
+ assert(binary == ' [')
+ mat = _read_mat_ascii(fd)
+ finally:
+ if fd is not file_or_fd: fd.close()
+ return mat
+
+def _read_mat_binary(fd):
+ # Data type
+ header = fd.read(3).decode()
+ # 'CM', 'CM2', 'CM3' are possible values,
+ if header.startswith('CM'): return _read_compressed_mat(fd, header)
+ elif header == 'FM ': sample_size = 4 # floats
+ elif header == 'DM ': sample_size = 8 # doubles
+ else: raise UnknownMatrixHeader("The header contained '%s'" % header)
+ assert(sample_size > 0)
+ # Dimensions
+ s1, rows, s2, cols = np.frombuffer(fd.read(10), dtype='int8,int32,int8,int32', count=1)[0]
+ # Read whole matrix
+ buf = fd.read(rows * cols * sample_size)
+ if sample_size == 4 : vec = np.frombuffer(buf, dtype='float32')
+ elif sample_size == 8 : vec = np.frombuffer(buf, dtype='float64')
+ else : raise BadSampleSize
+ mat = np.reshape(vec,(rows,cols))
+ return mat
+
+def _read_mat_ascii(fd):
+ rows = []
+ while 1:
+ line = fd.readline().decode()
+ if (len(line) == 0) : raise BadInputFormat # eof, should not happen!
+ if len(line.strip()) == 0 : continue # skip empty line
+ arr = line.strip().split()
+ if arr[-1] != ']':
+ rows.append(np.array(arr,dtype='float32')) # not last line
+ else:
+ rows.append(np.array(arr[:-1],dtype='float32')) # last line
+ mat = np.vstack(rows)
+ return mat
+
+
+def _read_compressed_mat(fd, format):
+ """ Read a compressed matrix,
+ see: https://github.com/kaldi-asr/kaldi/blob/master/src/matrix/compressed-matrix.h
+ methods: CompressedMatrix::Read(...), CompressedMatrix::CopyToMat(...),
+ """
+ assert(format == 'CM ') # The formats CM2, CM3 are not supported...
+
+ # Format of header 'struct',
+ global_header = np.dtype([('minvalue','float32'),('range','float32'),('num_rows','int32'),('num_cols','int32')]) # member '.format' is not written,
+ per_col_header = np.dtype([('percentile_0','uint16'),('percentile_25','uint16'),('percentile_75','uint16'),('percentile_100','uint16')])
+
+ # Read global header,
+ globmin, globrange, rows, cols = np.frombuffer(fd.read(16), dtype=global_header, count=1)[0]
+
+ # The data is structured as [Colheader, ... , Colheader, Data, Data , .... ]
+ # { cols }{ size }
+ col_headers = np.frombuffer(fd.read(cols*8), dtype=per_col_header, count=cols)
+ col_headers = np.array([np.array([x for x in y]) * globrange * 1.52590218966964e-05 + globmin for y in col_headers], dtype=np.float32)
+ data = np.reshape(np.frombuffer(fd.read(cols*rows), dtype='uint8', count=cols*rows), newshape=(cols,rows)) # stored as col-major,
+
+ mat = np.zeros((cols,rows), dtype='float32')
+ p0 = col_headers[:, 0].reshape(-1, 1)
+ p25 = col_headers[:, 1].reshape(-1, 1)
+ p75 = col_headers[:, 2].reshape(-1, 1)
+ p100 = col_headers[:, 3].reshape(-1, 1)
+ mask_0_64 = (data <= 64)
+ mask_193_255 = (data > 192)
+ mask_65_192 = (~(mask_0_64 | mask_193_255))
+
+ mat += (p0 + (p25 - p0) / 64. * data) * mask_0_64.astype(np.float32)
+ mat += (p25 + (p75 - p25) / 128. * (data - 64)) * mask_65_192.astype(np.float32)
+ mat += (p75 + (p100 - p75) / 63. * (data - 192)) * mask_193_255.astype(np.float32)
+
+ return mat.T # transpose! col-major -> row-major,
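+# Decompression sketch for one stored byte (numbers are made up): with column
+# percentiles p0=0.0, p25=1.0, p75=3.0, p100=4.0, a stored byte of 96 falls in
+# the 65..192 band, so it decodes to p25 + (p75-p25)/128.*(96-64) = 1.5.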
+
+
+# Writing,
+def write_mat(file_or_fd, m, key=''):
+ """ write_mat(f, m, key='')
+ Write a binary kaldi matrix to filename or stream. Supports 32bit and 64bit floats.
+ Arguments:
+ file_or_fd : filename or opened file descriptor for writing,
+ m : the matrix to be stored,
+ key (optional) : used for writing ark-file, the utterance-id gets written before the matrix.
+
+ Example of writing single matrix:
+ kaldi_io.write_mat(filename, mat)
+
+ Example of writing arkfile:
+ with open(ark_file,'wb') as f:
+ for key,mat in d.items():
+ kaldi_io.write_mat(f, mat, key=key)
+ """
+ fd = open_or_fd(file_or_fd, mode='wb')
+ if sys.version_info[0] == 3: assert(fd.mode == 'wb')
+ try:
+ if key != '' : fd.write((key+' ').encode("latin1")) # ark-files have keys (utterance-id),
+ fd.write('\0B'.encode()) # we write binary!
+ # Data-type,
+ if m.dtype == 'float32': fd.write('FM '.encode())
+ elif m.dtype == 'float64': fd.write('DM '.encode())
+ else: raise UnsupportedDataType("'%s', please use 'float32' or 'float64'" % m.dtype)
+ # Dims,
+ fd.write('\04'.encode())
+ fd.write(struct.pack(np.dtype('uint32').char, m.shape[0])) # rows
+ fd.write('\04'.encode())
+ fd.write(struct.pack(np.dtype('uint32').char, m.shape[1])) # cols
+ # Data,
+ fd.write(m.tobytes())
+ finally:
+ if fd is not file_or_fd : fd.close()
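+# Byte layout written above for a float32 matrix of shape (2,3) and key 'utt1'
+# (a sketch; the uint32 dims use the native, typically little-endian, order):
+#   b'utt1 ' + b'\0B' + b'FM ' + b'\04' + uint32(2) + b'\04' + uint32(3) + 24 bytes of data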
+
+
+#################################################
+# 'Posterior' kaldi type (posteriors, confusion network, nnet1 training targets, ...)
+# Corresponds to: vector<vector<tuple<int,float> > >
+# - outer vector: time axis
+# - inner vector: records at the time
+# - tuple: int = index, float = value
+#
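+# A decoded 'Posterior' is a list of per-frame lists of (index, value) tuples,
+# e.g. (a hypothetical 2-frame example):
+#   [ [(0, 0.7), (3, 0.3)],    # frame 0: two records
+#     [(1, 1.0)] ]             # frame 1: one record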
+
+def read_cnet_ark(file_or_fd):
+ """ Alias of function 'read_post_ark()', 'cnet' = confusion network """
+ return read_post_ark(file_or_fd)
+
+def read_post_ark(file_or_fd):
+ """ generator(key,vec>) = read_post_ark(file)
+ Returns generator of (key,posterior) tuples, read from ark file.
+ file_or_fd : ark, gzipped ark, pipe or opened file descriptor.
+
+ Iterate the ark:
+ for key,post in kaldi_io.read_post_ark(file):
+ ...
+
+ Read ark to a 'dictionary':
+ d = { key:post for key,post in kaldi_io.read_post_ark(file) }
+ """
+ fd = open_or_fd(file_or_fd)
+ try:
+ key = read_key(fd)
+ while key:
+ post = read_post(fd)
+ yield key, post
+ key = read_key(fd)
+ finally:
+ if fd is not file_or_fd: fd.close()
+
+def read_post(file_or_fd):
+ """ [post] = read_post(file_or_fd)
+ Reads single kaldi 'Posterior' in binary format.
+
+ The 'Posterior' is C++ type 'vector<vector<tuple<int,float> > >',
+ the outer-vector is usually time axis, inner-vector are the records
+ at given time, and the tuple is composed of an 'index' (integer)
+ and a 'float-value'. The 'float-value' can represent a probability
+ or any other numeric value.
+
+ Returns vector of vectors of tuples.
+ """
+ fd = open_or_fd(file_or_fd)
+ ans=[]
+ binary = fd.read(2).decode(); assert(binary == '\0B'); # binary flag
+ assert(fd.read(1).decode() == '\4'); # int-size
+ outer_vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # number of frames (or bins)
+
+ # Loop over 'outer-vector',
+ for i in range(outer_vec_size):
+ assert(fd.read(1).decode() == '\4'); # int-size
+ inner_vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # number of records for frame (or bin)
+ data = np.frombuffer(fd.read(inner_vec_size*10), dtype=[('size_idx','int8'),('idx','int32'),('size_post','int8'),('post','float32')], count=inner_vec_size)
+ assert(data[0]['size_idx'] == 4)
+ assert(data[0]['size_post'] == 4)
+ ans.append(data[['idx','post']].tolist())
+
+ if fd is not file_or_fd: fd.close()
+ return ans
+
+
+#################################################
+# Kaldi Confusion Network bin begin/end times,
+# (kaldi stores CNs time info separately from the Posterior).
+#
+
+def read_cntime_ark(file_or_fd):
+ """ generator(key,vec>) = read_cntime_ark(file_or_fd)
+ Returns generator of (key,cntime) tuples, read from ark file.
+ file_or_fd : file, gzipped file, pipe or opened file descriptor.
+
+ Iterate the ark:
+ for key,time in kaldi_io.read_cntime_ark(file):
+ ...
+
+ Read ark to a 'dictionary':
+ d = { key:time for key,time in kaldi_io.read_post_ark(file) }
+ """
+ fd = open_or_fd(file_or_fd)
+ try:
+ key = read_key(fd)
+ while key:
+ cntime = read_cntime(fd)
+ yield key, cntime
+ key = read_key(fd)
+ finally:
+ if fd is not file_or_fd : fd.close()
+
+def read_cntime(file_or_fd):
+ """ [cntime] = read_cntime(file_or_fd)
+ Reads single kaldi 'Confusion Network time info', in binary format:
+ C++ type: vector<tuple<float,float> >.
+ (begin/end times of bins at the confusion network).
+
+ Binary layout is '<num-bins> <beg1> <end1> <beg2> <end2> ...'
+
+ file_or_fd : file, gzipped file, pipe or opened file descriptor.
+
+ Returns vector of tuples.
+ """
+ fd = open_or_fd(file_or_fd)
+ binary = fd.read(2).decode(); assert(binary == '\0B'); # assuming it's binary
+
+ assert(fd.read(1).decode() == '\4'); # int-size
+ vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # number of frames (or bins)
+
+ data = np.frombuffer(fd.read(vec_size*10), dtype=[('size_beg','int8'),('t_beg','float32'),('size_end','int8'),('t_end','float32')], count=vec_size)
+ assert(data[0]['size_beg'] == 4)
+ assert(data[0]['size_end'] == 4)
+ ans = data[['t_beg','t_end']].tolist() # Return vector of tuples (t_beg,t_end),
+
+ if fd is not file_or_fd : fd.close()
+ return ans
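+# Example of a returned value (hypothetical 3-bin confusion network),
+# one (t_beg, t_end) tuple per bin, in seconds:
+#   [(0.00, 0.24), (0.24, 0.51), (0.51, 0.80)]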
+
+
+#################################################
+# Segments related,
+#
+
+# Segments as 'Bool vectors' can be handy,
+# - for 'superposing' the segmentations,
+# - for frame-selection in Speaker-ID experiments,
+def read_segments_as_bool_vec(segments_file):
+ """ [ bool_vec ] = read_segments_as_bool_vec(segments_file)
+ using kaldi 'segments' file for 1 wav, format : '<utt> <rec> <t-beg> <t-end>'
+ - t-beg, t-end is in seconds,
+ - assumed 100 frames/second,
+ """
+ segs = np.loadtxt(segments_file, dtype='object,object,f,f', ndmin=1)
+ # Sanity checks,
+ assert(len(segs) > 0) # empty segmentation is an error,
+ assert(len(np.unique([rec[1] for rec in segs ])) == 1) # segments with only 1 wav-file,
+ # Convert time to frame-indexes,
+ start = np.rint([100 * rec[2] for rec in segs]).astype(int)
+ end = np.rint([100 * rec[3] for rec in segs]).astype(int)
+ # Taken from 'read_lab_to_bool_vec', htk.py,
+ frms = np.repeat(np.r_[np.tile([False,True], len(end)), False],
+ np.r_[np.c_[start - np.r_[0, end[:-1]], end-start].flat, 0])
+ assert np.sum(end-start) == np.sum(frms)
+ return frms
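+# Worked example (a sketch, with a made-up 'segments' file for one recording):
+#   utt1 rec1 0.00 0.03
+#   utt2 rec1 0.05 0.07
+# At 100 frames/second this marks frames 0-2 and 5-6 as speech:
+#   [ True True True False False True True ]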
+
diff --git a/egs/callhome_diarization/v1/diarization/nnet3/xvector/extract_xvectors.sh b/egs/callhome_diarization/v1/diarization/nnet3/xvector/extract_xvectors.sh
index d7591a6a3a8..8d579138c73 100755
--- a/egs/callhome_diarization/v1/diarization/nnet3/xvector/extract_xvectors.sh
+++ b/egs/callhome_diarization/v1/diarization/nnet3/xvector/extract_xvectors.sh
@@ -102,7 +102,7 @@ if [ $stage -le 0 ]; then
fi
utils/data/get_uniform_subsegments.py \
--max-segment-duration=$window \
- --overlap-duration=$(echo "$window-$period" | bc) \
+ --overlap-duration=$(perl -e "print ($window-$period);") \
--max-remaining-duration=$min_segment \
--constant-duration=True \
$segments > $dir/subsegments
diff --git a/egs/callhome_diarization/v1/diarization/train_ivector_extractor_diag.sh b/egs/callhome_diarization/v1/diarization/train_ivector_extractor_diag.sh
new file mode 100755
index 00000000000..6751fb7dd22
--- /dev/null
+++ b/egs/callhome_diarization/v1/diarization/train_ivector_extractor_diag.sh
@@ -0,0 +1,166 @@
+#!/bin/bash
+
+# Copyright 2013 Daniel Povey
+# 2014 David Snyder
+# Apache 2.0.
+
+# This script trains the i-vector extractor. Note: there are 3 separate levels
+# of parallelization: num_threads, num_processes, and num_jobs. This may seem a
+# bit excessive. It has to do with minimizing memory usage and disk I/O,
+# subject to various constraints. The "num_threads" is how many threads a
+# program uses; the "num_processes" is the number of separate processes a single
+# job spawns, and then sums the accumulators in memory. Our recommendation:
+# - Set num_threads to the minimum of 4 and the number of virtual cores your machine has.
+# (because of needing to lock various global quantities, the program can't
+# use many more than 4 threads with good CPU utilization).
+# - Set num_processes to the number of virtual cores on each machine you have, divided by
+# num_threads. E.g. 4, if you have 16 virtual cores. If you're on a shared queue
+# that's busy with other people's jobs, it may be wise to set it to rather less
+# than this maximum though, or your jobs won't get scheduled. And if memory is
+# tight you need to be careful; in our normal setup, each process uses about 5G.
+# - Set num_jobs to as many of the jobs (each using $num_threads * $num_processes CPUs)
+# your queue will let you run at one time, but don't go much more than 10 or 20, or
+# summing the accumulators will possibly get slow. If you have a lot of data, you
+# may want more jobs, though.
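+# Concrete example (illustrative numbers only): with nj=10, num_processes=4 and
+# num_threads=4, up to 10*4*4 = 160 threads run at once and the data is split
+# into 10*4 = 40 pieces.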
+
+# Begin configuration section.
+nj=10 # this is the number of separate queue jobs we run, but each one
+ # contains num_processes sub-jobs. The real number of threads we
+ # run is nj * num_processes * num_threads, and the number of
+ # separate pieces of data is nj * num_processes.
+num_threads=4
+num_processes=4 # each job runs this many processes, each with --num-threads threads
+cmd="queue.pl"
+stage=-4
+num_gselect=20 # Gaussian-selection using diagonal model: number of Gaussians to select
+ivector_dim=400 # dimension of the extracted i-vector
+use_weights=false # set to true to turn on the regression of log-weights on the ivector.
+num_iters=10
+min_post=0.025 # Minimum posterior to use (posteriors below this are pruned out)
+num_samples_for_weights=3 # smaller than the default for speed (relates to a sampling method)
+cleanup=true
+apply_cmn=true # If true, apply sliding window cepstral mean normalization
+posterior_scale=1.0 # This scale helps to control for successive features being highly
+ # correlated. E.g. try 0.1 or 0.3
+sum_accs_opt=
+# End configuration section.
+
+echo "$0 $@" # Print the command line for logging
+
+if [ -f path.sh ]; then . ./path.sh; fi
+. parse_options.sh || exit 1;
+
+
+if [ $# != 3 ]; then
+ echo "Usage: $0 "
+ echo " e.g.: $0 exp/ubm_2048_male/final.dubm data/train_male exp/extractor_male"
+ echo "main options (for others, see top of script file)"
+ echo " --config # config containing options"
+ echo " --cmd (utils/run.pl|utils/queue.pl ) # how to run jobs."
+ echo " --num-iters <#iters|10> # Number of iterations of E-M"
+ echo " --nj # Number of jobs (also see num-processes and num-threads)"
+ echo " --num-processes # Number of processes for each queue job (relates"
+ echo " # to summing accs in memory)"
+ echo " --num-threads # Number of threads for each process (can't be usefully"
+ echo " # increased much above 4)"
+ echo " --stage # To control partial reruns"
+ echo " --num-gselect # Number of Gaussians to select using"
+ echo " # diagonal model."
+ echo " --sum-accs-opt