Committing initial version of Kaldi

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@2 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
Dan Povey 2011-05-14 21:48:08 +00:00
Commit 10e9002c88
643 changed files: 114670 additions and 0 deletions

270
COPYING Normal file

@ -0,0 +1,270 @@
Legal Notices
Each of the files comprising Kaldi v1.0 have been separately licensed by
their respective author(s) under the terms of the Apache License v 2.0 (set
forth below). The source code headers for each file specifies the individual
authors and source material for that file as well the corresponding copyright
notice. For reference purposes only: A cumulative list of all individual
contributors and original source material as well as the full text of the Apache
License v 2.0 are set forth below.
Individual Contributors (in alphabetical order)
Mohit Agarwal
Gilles Boulianne
Lukas Burget
Ondrej Glembek
Arnab Ghoshal
Go Vivace Inc.
Mirko Hannemann
Microsoft Corporation
Petr Motlicek
Ariya Rastrow
Petr Schwarz
Georg Stemmer
Jan Silovsky
Phonexia s.r.o.
Yanmin Qian
Karel Vesely
Haihua Xu
Other Source Material
This project includes a port and modification of materials from JAMA: A Java
Matrix Package under the following notice: "This software is a cooperative
product of The MathWorks and the National Institute of Standards and Technology
(NIST) which has been released to the public domain." This notice and the
original code is available at http://math.nist.gov/javanumerics/jama/
This project includes a modified version of code published in Malvar, H.,
"Signal processing with lapped transforms," Artech House, Inc., 1992. The
current copyright holder, Henrique S. Malvar, has given his permission for the
release of this modified version under the Apache License 2.0.
This file includes material from the OpenFST Library v1.2.7 available at
http://www.openfst.org/twiki/bin/view/FST/WebHome and released under the
Apache License v. 2.0.
[OpenFst COPYING file begins here]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use these files except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Copyright 2005-2010 Google, Inc.
[OpenFst COPYING file ends here]
-------------------------------------------------------------------------
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

9
INSTALL Normal file

@ -0,0 +1,9 @@
[for native Windows install, see windows/INSTALL]
(1)
go to tools/ and follow INSTALL instructions there.
(2)
go to src/ and follow INSTALL instructions there.

31
README.txt Normal file

@ -0,0 +1,31 @@
This README has been created for those with whom we share the
"pre-release" version of Kaldi. Although the toolkit has not
been "officially" released, I have been given the OK to share
it privately for "non-commercial purposes" (whatever that means).
The official release is scheduled for mid-March.
The current version is not as polished as we would like, and contains
some files that should eventually be deleted.
See http://merlin.fit.vutbr.cz/kaldi/ for documentation
(may not always be fully up to date). This documentation
is generated by running "doxygen" from the src/ directory,
and appears in src/html/
I assume that the reader would like to (1) build the toolkit
and (2) run the example system builds.
To build the toolkit: see ./INSTALL. These instructions are valid for UNIX
systems including various flavors of Linux; Darwin; and Cygwin (has not been
tested on more "exotic" varieties of UNIX). For Windows installation
instructions (excluding Cygwin), see windows/INSTALL.
To run the example system builds, see egs/README.txt
If you encounter problems (and you probably will), your first point of contact
should be Dan Povey (dpovey@microsoft.com). In addition to specific questions,
please let me know if there are specific aspects of the project that you feel
could be improved, that you find confusing, etc., and which missing features you
most wish it had.

21
egs/README.txt Normal file

@ -0,0 +1,21 @@
This directory contains example scripts that demonstrate how to
use Kaldi. Each subdirectory corresponds to a corpus that we have
example scripts for. Currently both corpora are available from
the Linguistic Data Consortium (LDC).
Explanations of the corpora are below:
wsj: The Wall Street Journal corpus. This is a corpus of read
sentences from the Wall Street Journal, recorded under clean conditions.
The vocabulary is quite large.
Available from the LDC as either: [ catalog numbers LDC93S6A (WSJ0) and LDC94S13A (WSJ1) ]
or: [ catalog numbers LDC93S6B (WSJ0) and LDC94S13B (WSJ1) ]
The latter option is cheaper and includes only the Sennheiser
microphone data (which is all we use in the example scripts).
rm: Resource Management. Clean speech in a medium-vocabulary task consisting
of commands to a (presumably imaginary) computer system.
Available from the LDC as catalog number LDC93S3A (it may be possible to
get the same data using combinations of other catalog numbers, but this
is the one we used).

8
egs/rm/README.txt Normal file

@ -0,0 +1,8 @@
Each subdirectory of this directory contains the
scripts for a sequence of experiments.
s1: This setup contains experiments with GMM-based systems using various
Maximum Likelihood
techniques, including global and speaker-specific transforms.
See a parallel setup in ../wsj/s1

11
egs/rm/s1/NOTES Normal file

@ -0,0 +1,11 @@
Note RE decoding beams:
WER
Beam 20 25 30
monophone 18.28 28.24
triphone 6.767 6.724 6.724 [tri1]
Time [on svatava, xRT]
triphone 0.13 0.27 0.43 [tri1]

1
egs/rm/s1/conf/mfcc.conf Normal file

@ -0,0 +1 @@
--use-energy=false # only non-default option.

22
egs/rm/s1/conf/topo.proto Normal file

@ -0,0 +1,22 @@
<Topology>
<TopologyEntry>
<ForPhones>
NONSILENCEPHONES
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
</TopologyEntry>
<TopologyEntry>
<ForPhones>
SILENCEPHONES
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 4 <PdfClass> 4 <Transition> 4 0.25 <Transition> 5 0.75 </State>
<State> 5 </State>
</TopologyEntry>
</Topology>
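
A quick sanity check on the topology above (not part of the commit): in each emitting state the outgoing transition probabilities should add up to 1. The sketch below uses a hypothetical helper, `sum_transitions`, and assumes each `<State>` entry sits on a single line, as it does in topo.proto.

```shell
# Hypothetical helper, not part of Kaldi: for every state line that has
# transitions, print "<state-number> <sum of transition probabilities>".
sum_transitions() {
  awk '/<Transition>/ {
         total = 0
         for (i = 1; i <= NF; i++)
           if ($i == "<Transition>") total += $(i + 2)   # tag, target, prob
         printf "%s %.2f\n", $2, total
       }'
}

# Usage: the first, fifth and final states of the silence entry.
sum_transitions <<'EOF'
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
<State> 4 <PdfClass> 4 <Transition> 4 0.25 <Transition> 5 0.75 </State>
<State> 5 </State>
EOF
# prints:
# 0 1.00
# 4 1.00
```

The final state of each entry has no transitions, so it is skipped.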


@ -0,0 +1,69 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# usage: make_trans.pl prefix in.flist input.snr out.txt out.scp
# prefix is first letters of the database "key" (rest are numeric)
# in.flist is just a list of filenames, probably of .sph files.
# input.snr is an snr format file from the RM dataset.
# out.txt is the output transcriptions in format "key word1 word2 ...\n"
# out.scp is the output scp file, which is as in.flist but has the
# database-key first on each line.
# Reads transcripts from input.snr, e.g. $rootdir/rm1_audio1/rm1/doc/al_sents.snr,
# and file names from in.flist, e.g. train_sph.flist.
# Writes transcriptions to out.txt and the keyed file list to out.scp.
if(@ARGV != 5) {
die "usage: make_trans.pl prefix in.flist input.snr out.txt out.scp\n";
}
($prefix, $in_flist, $input_snr, $out_txt, $out_scp) = @ARGV;
open(F, "<$input_snr") || die "Opening SNOR file $input_snr";
while(<F>) {
if(m/^;/) { next; }
m/(.+) \((.+)\)/ || die "bad line $_";
$T{$2} = $1;
}
close(F);
open(G, "<$in_flist") || die "Opening file list $in_flist";
open(O, ">$out_txt") || die "Open output transcription file $out_txt";
open(P, ">$out_scp") || die "Open output scp file $out_scp";
while(<G>) {
$_ =~ m:/(\w+)/(\w+)\.sph\s+$:i || die "bad scp line $_";
$spkname = $1;
$uttname = $2;
$uttname =~ tr/a-z/A-Z/;
defined $T{$uttname} || die "no trans for sent $uttname";
$spkname =~ s/_//g; # remove underscore from spk name to make key nicer.
$key = $prefix . "_" . $spkname . "_" . $uttname;
$key =~ tr/A-Z/a-z/; # Make it all lower case.
# to make the numerical and string-sorted orders the same.
print O "$key $T{$uttname}\n";
print P "$key $_";
$n++;
}
close(O) || die "Closing output.";
close(P) || die "Closing output.";
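
For illustration only (not part of the commit), the key-building rule used by the script above can be restated as a small shell function. `make_key` and the example path are hypothetical. The sketch assumes a lower-case `.sph` suffix, whereas the Perl regex matches the suffix case-insensitively, and it skips the transcript lookup.

```shell
# Hypothetical helper: build "<prefix>_<speaker>_<utterance>" in lower case,
# with underscores removed from the speaker directory name.
make_key() {
  key_utt=$(basename "$2" .sph)                          # utterance name
  key_spk=$(basename "$(dirname "$2")" | tr -d '_')      # speaker directory
  printf '%s_%s_%s\n' "$1" "$key_spk" "$key_utt" | tr 'A-Z' 'a-z'
}

# Usage with a made-up path:
make_key trn /path/to/RM/ind_trn/adg0_4/SR009.sph
# prints: trn_adg04_sr009
```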

92
egs/rm/s1/data_prep/run.sh Executable file

@ -0,0 +1,92 @@
# This script should be run from the directory where it is located (i.e. data_prep)
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# The input is the 3 CDs from the LDC distribution of Resource Management.
# The script's argument is a directory which has three subdirectories:
# rm1_audio1 rm1_audio2 rm2_audio
if [ $# != 1 ]; then
echo "Usage: ./run.sh /path/to/RM"
exit 1;
fi
RMROOT=$1
if [ ! -d $RMROOT/rm1_audio1 -o ! -d $RMROOT/rm1_audio2 ]; then
echo "Error: run.sh requires a directory argument that contains rm1_audio1 and rm1_audio2"
exit 1;
fi
if [ ! -d $RMROOT/rm2_audio ]; then
echo "**Warning: $RMROOT/rm2_audio does not exist; won't create spk2gender.map file correctly***"
sleep 1
fi
(
find $RMROOT/rm1_audio1/rm1/ind_trn -iname '*.sph';
find $RMROOT/rm1_audio2/2_4_2/rm1/ind/dev_aug -iname '*.sph';
) | perl -ane ' m:/sa\d.sph:i || m:/sb\d\d.sph:i || print; ' > train_sph.flist
# make_trans.pl also creates the utterance id's and the kaldi-format scp file.
./make_trans.pl trn train_sph.flist $RMROOT/rm1_audio1/rm1/doc/al_sents.snr train_trans.txt train_sph.scp
mv train_trans.txt tmp; sort -k 1 tmp > train_trans.txt
mv train_sph.scp tmp; sort -k 1 tmp > train_sph.scp
sph2pipe=`cd ../../../..; echo $PWD/tools/sph2pipe_v2.5/sph2pipe`
if [ ! -f $sph2pipe ]; then
echo "Could not find the sph2pipe program at $sph2pipe";
exit 1;
fi
awk '{printf("%s '$sph2pipe' -f wav %s |\n", $1, $2);}' < train_sph.scp > train_wav.scp
cat train_wav.scp | perl -ane 'm/^(\w+_(\w+)\w_\w+) / || die; print "$1 $2\n"' > train.utt2spk
cat train.utt2spk | sort -k 2 | ../scripts/utt2spk_to_spk2utt.pl > train.spk2utt
for ntest in 1_mar87 2_oct87 4_feb89 5_oct89 6_feb91 7_sep92; do
n=`echo $ntest | cut -d_ -f 1`
test=`echo $ntest | cut -d_ -f 2`
root=$RMROOT/rm1_audio2/2_4_2
for x in `grep -v ';' $root/rm1/doc/tests/$ntest/${n}_indtst.ndx`; do
echo "$root/$x ";
done > test_${test}_sph.flist
done
# make_trans.pl also creates the utterance id's and the kaldi-format scp file.
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
./make_trans.pl ${test} test_${test}_sph.flist $RMROOT/rm1_audio1/rm1/doc/al_sents.snr test_${test}_trans.txt test_${test}_sph.scp
mv test_${test}_trans.txt tmp; sort -k 1 tmp > test_${test}_trans.txt
mv test_${test}_sph.scp tmp; sort -k 1 tmp > test_${test}_sph.scp
awk '{printf("%s '$sph2pipe' -f wav %s |\n", $1, $2);}' < test_${test}_sph.scp > test_${test}_wav.scp
cat test_${test}_wav.scp | perl -ane 'm/^(\w+_(\w+)\w_\w+) / || die; print "$1 $2\n"' > test_${test}.utt2spk
cat test_${test}.utt2spk | sort -k 2 | ../scripts/utt2spk_to_spk2utt.pl > test_${test}.spk2utt
done
cat $RMROOT/rm1_audio2/2_5_1/rm1/doc/al_spkrs.txt \
$RMROOT/rm2_audio/3-1.2/rm2/doc/al_spkrs.txt | \
perl -ane 'tr/A-Z/a-z/;print;' | grep -v ';' | \
awk '{print $1, $2}' > spk2gender.map
../scripts/make_rm_lm.pl $RMROOT/rm1_audio1/rm1/doc/wp_gram.txt > G.txt
# Getting lexicon
../scripts/make_rm_dict.pl $RMROOT/rm1_audio2/2_4_2/score/src/rdev/pcdsril.txt > lexicon.txt
echo Succeeded.
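
The script above pipes each .utt2spk file through ../scripts/utt2spk_to_spk2utt.pl, which is not shown in this part of the diff. As a rough sketch, assuming that script writes one line per speaker with the speaker id followed by its utterance ids, the transformation could look like this (the function body is an illustration, not the Perl script's actual code):

```shell
# Sketch under the stated assumption: group "utt spk" lines by speaker,
# keeping speakers in order of first appearance.
utt2spk_to_spk2utt() {
  awk '{ if (!($2 in utts)) order[++n] = $2
         utts[$2] = utts[$2] " " $1 }
       END { for (i = 1; i <= n; i++) print order[i] utts[order[i]] }'
}

# Usage with made-up utterance ids:
printf 'trn_adg04_sr009 adg0\ntrn_adg04_sr049 adg0\ntrn_ahh05_sr001 ahh0\n' |
  utt2spk_to_spk2utt
# prints:
# adg0 trn_adg04_sr009 trn_adg04_sr049
# ahh0 trn_ahh05_sr001
```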


@ -0,0 +1,39 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
fake=false
if [ "$1" == "--fake" ]; then
fake=true
shift
fi
sphdir=$1 # e.g. /mnt/matylda2/data/RM
wavdir=$2 # e.g. /mnt/matylda6/jhu09/qpovey/kaldi_rm_wav
flistin=$3 # e.g. train_sph.flist, contains sph files in sphdir
flistout=$4 # e.g. train_wav.flist, contains wav files in wavdir
if [ $fake == false ]; then
for x in `cat $flistin`; do
y=`echo $x | sed s:$sphdir:$wavdir: | sed s:.sph:.wav:`;
mkdir -p `dirname $y`
../../tools/sph2pipe_v2.5/sph2pipe -f wav $x $y || exit 1;
done
fi
cat $flistin | sed s:$sphdir:$wavdir: | sed s:.sph:.wav: > $flistout || exit 1;
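
The path rewriting in the script above relies on unquoted sed patterns, where `.` matches any character and `$sphdir` is treated as a regular expression. A more literal variant, shown only as a sketch with a hypothetical function name, strips the source prefix and swaps the suffix with parameter expansion:

```shell
# Hypothetical helper: map an .sph path under sphdir to a .wav path under wavdir.
# usage: sph_to_wav_path SPHDIR WAVDIR FILE
sph_to_wav_path() {
  rel=${3#"$1"}                    # drop the literal sphdir prefix
  printf '%s\n' "$2${rel%.sph}.wav"
}

# Usage with made-up directories:
sph_to_wav_path /data/RM /tmp/kaldi_rm_wav /data/RM/rm1/ind_trn/adg0_4/sr009.sph
# prints: /tmp/kaldi_rm_wav/rm1/ind_trn/adg0_4/sr009.wav
```

Unlike the sed version, this only rewrites a prefix and an exact lower-case `.sph` suffix.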

1
egs/rm/s1/path.sh Executable file

@ -0,0 +1 @@
export PATH=$PATH:../../../src/bin:../../../tools/openfst/bin:../../../src/fstbin/:../../../src/gmmbin/:../../../src/featbin/:../../../src/fgmmbin:../../../src/sgmmbin

96
egs/rm/s1/run.sh Normal file

@ -0,0 +1,96 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
exit 1 # Don't run this... it's to be run line by line from the shell.
# This script file cannot be run as-is; some paths in it need to be changed
# before you can run it.
# Search for /path/to.
# It is recommended that you do not invoke this file from the shell, but
# run the commands one by one, by hand.
# the step in data_prep/ will need to be modified for your system.
# First step is to do data preparation:
# This just creates some text files, it is fast.
# If not on the BUT system, you would have to change run.sh to reflect
# your own paths.
#
#Example arguments to run.sh: /mnt/matylda2/data/RM, /ais/gobi2/speech/RM, /cygdrive/e/data/RM
# RM is a directory with subdirectories rm1_audio1, rm1_audio2, rm2_audio
cd data_prep
#*** You have to change the pathname below.***
./run.sh /path/to/RM
cd ..
mkdir -p data
( cd data; cp ../data_prep/{train,test*}.{spk2utt,utt2spk} . ; cp ../data_prep/spk2gender.map . )
# This next step converts the lexicon, grammar, etc., into FST format.
steps/prepare_graphs.sh
# Next, make sure that "exp/" is someplace you can write a significant amount of
# data to (e.g. make it a link to a file on some reasonably large file system).
# If it doesn't exist, the scripts below will make the directory "exp".
# mfccdir should be set to some place to put training mfcc's
# where you have space.
#e.g.: mfccdir=/mnt/matylda6/jhu09/qpovey/kaldi_rm_mfccb
mfccdir=/path/to/mfccdir
steps/make_mfcc_train.sh $mfccdir
steps/make_mfcc_test.sh $mfccdir
steps/train_mono.sh
steps/decode_mono.sh &
steps/train_tri1.sh
steps/decode_tri1.sh &
steps/train_tri2a.sh
steps/decode_tri2a.sh &
# Then do the same for 2b, 2c, and so on
# 2a = basic triphone (all features double-deltas unless stated).
# 2b = exponential transform
# 2c = mean normalization (cmn)
# 2d = MLLT
# 2e = splice-9-frames + LDA
# 2f = splice-9-frames + LDA + MLLT
# 2g = linear VTLN (+ regular VTLN); various decode scripts available.
# 2h = splice-9-frames + HLDA
# 2i = triple-deltas + HLDA
# 2j = triple-deltas + LDA + MLLT
# 2k = LDA + ET (equiv to LDA+MLLT+ET)
# To train and test SGMM systems:
steps/train_ubma.sh
# train and test unadapted system
steps/train_sgmma.sh
steps/decode_sgmma.sh
# train and test system with speaker vectors.
steps/train_sgmmb.sh
steps/decode_sgmmb.sh


@ -0,0 +1,58 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Adds some specified number of disambig symbols to a symbol table.
# Adds these as #1, #2, etc.
# If the --include-zero option is specified, includes an extra one
# #0.
if(!(@ARGV == 2 || (@ARGV ==3 && $ARGV[0] eq "--include-zero"))) {
die "Usage: add_disambig.pl [--include-zero] symtab.txt num_extra > symtab_out.txt ";
}
if(@ARGV == 3) {
$include_zero = 1;
$ARGV[0] eq "--include-zero" || die "Bad option/first argument $ARGV[0]";
shift @ARGV;
} else {
$include_zero = 0;
}
$input = $ARGV[0];
$nsyms = $ARGV[1];
open(F, "<$input") || die "Opening file $input";
while(<F>) {
@A = split(" ", $_);
@A == 2 || die "Bad line $_";
$lastsym = $A[1];
print;
}
if(!defined($lastsym)){
die "Empty symbol file?";
}
if($include_zero) {
$lastsym++;
print "#0 $lastsym\n";
}
for($n = 1; $n <= $nsyms; $n++) {
$y = $n + $lastsym;
print "#$n $y\n";
}
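
To make the numbering in the script above concrete, here is an awk re-statement of the same rule (an illustration, not part of the commit; the function name and argument order are hypothetical). It copies the symbol table through, then appends `#0` if requested and `#1` through `#N`, numbered after the last symbol id seen.

```shell
# Hypothetical helper. usage: add_disambig NUM_EXTRA [--include-zero] < symtab.txt
add_disambig() {
  awk -v n="$1" -v zero="${2:-}" '
    { print; last = $2 }                  # remember the last symbol id
    END {
      if (zero == "--include-zero") { last++; print "#0 " last }
      for (i = 1; i <= n; i++) print "#" i " " (last + i)
    }'
}

# Usage with a toy symbol table:
printf '<eps> 0\na 1\nb 2\n' | add_disambig 2 --include-zero
# prints:
# <eps> 0
# a 1
# b 2
# #0 3
# #1 4
# #2 5
```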


@ -0,0 +1,101 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Adds disambiguation symbols to a lexicon.
# Outputs still in the normal lexicon format.
# Disambig syms are numbered #1, #2, #3, etc. (#0
# reserved for symbol in grammar).
# Outputs the number of disambig syms to the standard output.
if(@ARGV != 2) {
die "Usage: add_lex_disambig.pl lexicon.txt lexicon_disambig.txt "
}
$lexfn = shift @ARGV;
$lexoutfn = shift @ARGV;
open(L, "<$lexfn") || die "Error opening lexicon $lexfn";
# (1) Read in the lexicon.
@L = ( );
while(<L>) {
@A = split(" ", $_);
push @L, join(" ", @A);
}
# (2) Work out the count of each phone-sequence in the
# lexicon.
foreach $l (@L) {
@A = split(" ", $l);
shift @A; # Remove word.
$count{join(" ",@A)}++;
}
# (3) For each left sub-sequence of each phone-sequence, note down
# that it exists (for identifying prefixes of longer strings).
foreach $l (@L) {
@A = split(" ", $l);
shift @A; # Remove word.
while(@A > 0) {
pop @A; # Remove last phone
$issubseq{join(" ",@A)} = 1;
}
}
# (4) For each entry in the lexicon:
# if the phone sequence is unique and is not a
# prefix of another word, no disambig symbol.
# Else output #1, or #2, #3, ... if the same phone-seq
# has already been assigned a disambig symbol.
open(O, ">$lexoutfn") || die "Opening lexicon file $lexoutfn for writing.\n";
$max_disambig = 0;
foreach $l (@L) {
@A = split(" ", $l);
$word = shift @A;
$phnseq = join(" ",@A);
if(!defined $issubseq{$phnseq}
&& $count{$phnseq}==1) {
; # Do nothing.
} else {
if($phnseq eq "") { # need disambig symbols for the empty string
# that are not used anywhere else.
$max_disambig++;
$reserved{$max_disambig} = 1;
$phnseq = "#$max_disambig";
} else {
$curnumber = $disambig_of{$phnseq};
if(!defined $curnumber) { $curnumber = 0; }
$curnumber++; # now 1 or 2, ...
while(defined $reserved{$curnumber} ) { $curnumber++; } # skip over reserved symbols
if($curnumber > $max_disambig) {
$max_disambig = $curnumber;
}
$disambig_of{$phnseq} = $curnumber;
$phnseq = $phnseq . " #" . $curnumber;
}
}
print O "$word\t$phnseq\n";
}
print $max_disambig . "\n";

40
egs/rm/s1/scripts/filter_scp.pl Executable file

@ -0,0 +1,40 @@
#!/usr/bin/perl -w
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This script takes a list of utterance-ids and filters an scp
# file (or any file whose first field is an utterance id), printing
# out only those lines whose first field is in id_list.
if(@ARGV < 1 || @ARGV > 2) {
die "Usage: filter_scp.pl id_list [in.scp] > out.scp ";
}
$idlist = shift @ARGV;
open(F, "<$idlist") || die "Could not open id-list file $idlist";
while(<F>) {
@A = split;
@A>=1 || die "Invalid id-list file line $_";
$seen{$A[0]} = 1;
}
while(<>) {
@A = split;
@A > 0 || die "Invalid scp file line $_";
if($seen{$A[0]}) {
print $_;
}
}
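# Illustrative sketch, not part of the committed script: the filtering that
# filter_scp.pl performs can be approximated in shell with awk. The function
# name filter_scp_sketch and the sample ids and paths are assumptions made for
# this example. Unlike the Perl script, the sketch does not validate its input
# and would misbehave on an empty id list.

```shell
# Hypothetical helper: keep the lines of an scp-style file whose first field
# also appears as the first field of some line in the id list.
filter_scp_sketch() {
  # $1 = id list, $2 = scp file (or any file keyed on its first field)
  awk 'NR == FNR { seen[$1] = 1; next } ($1 in seen)' "$1" "$2"
}

tmpdir=$(mktemp -d)
printf 'utt1\nutt3\n' > "$tmpdir/ids"
printf 'utt1 a.wav\nutt2 b.wav\nutt3 c.wav\n' > "$tmpdir/in.scp"
filtered=$(filter_scp_sketch "$tmpdir/ids" "$tmpdir/in.scp")
echo "$filtered"
rm -r "$tmpdir"
```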

69
egs/rm/s1/scripts/int2sym.pl Executable file

@ -0,0 +1,69 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
$ignore_noninteger = 0;
$ignore_first_field = 0;
for($x = 0; $x < 2; $x++) {
if($ARGV[0] eq "--ignore-noninteger") { $ignore_noninteger = 1; shift @ARGV; }
if($ARGV[0] eq "--ignore-first-field") { $ignore_first_field = 1; shift @ARGV; }
}
$symtab = shift @ARGV;
if(!defined $symtab) {
die "Usage: int2sym.pl symtab [input transcriptions] > output transcriptions\n";
}
open(F, "<$symtab") || die "Error opening symbol table file $symtab";
while(<F>) {
@A = split(" ", $_);
@A == 2 || die "bad line in symbol table file: $_";
$int2sym{$A[1]} = $A[0];
}
$error = 0;
while(<>) {
@A = split(" ", $_);
if(@A == 0) {
die "Empty line in transcriptions input.";
}
if($ignore_first_field) {
$key = shift @A;
print $key . " ";
}
foreach $a (@A) {
if($a !~ m:^\d+$:) { # not all digits..
if($ignore_noninteger) {
print $a . " ";
next;
} else {
if($a eq $A[0]) {
die "int2sym.pl: found noninteger token $a (try --ignore-first-field)\n";
} else {
die "int2sym.pl: found noninteger token $a (try --ignore-noninteger if valid input)\n";
}
}
}
$s = $int2sym{$a};
if(!defined ($s)) {
die "int2sym.pl: integer $a not in symbol table $symtab.";
}
print $s . " ";
}
print "\n";
}
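# Illustrative sketch, not part of the committed script: the core lookup done
# by int2sym.pl (map each integer back to its symbol through a table of
# "symbol integer" lines) can be approximated with awk. The function name
# int2sym_sketch, the file names and the sample table are assumptions made for
# this example. The sketch omits the --ignore-noninteger and
# --ignore-first-field options.

```shell
# Hypothetical helper: translate integer transcripts back into symbols.
int2sym_sketch() {
  # $1 = symbol table ("symbol integer" per line), $2 = integer transcripts
  awk 'NR == FNR { sym[$2] = $1; next }
       {
         line = ""
         for (i = 1; i <= NF; i++) {
           if (!($i in sym)) {
             print "integer " $i " not in symbol table" > "/dev/stderr"
             exit 1
           }
           # Join symbols with single spaces.
           line = (i == 1 ? "" : line " ") sym[$i]
         }
         print line
       }' "$1" "$2"
}

tmpdir=$(mktemp -d)
printf '<eps> 0\nthe 1\ncat 2\n' > "$tmpdir/words.txt"
printf '1 2\n2 1 1\n' > "$tmpdir/trans.int"
symbols=$(int2sym_sketch "$tmpdir/words.txt" "$tmpdir/trans.int")
echo "$symbols"
rm -r "$tmpdir"
```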

45
egs/rm/s1/scripts/is_sorted.sh Executable file

@ -0,0 +1,45 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Usage: is_sorted.sh [script-file]
# This script returns 0 (success) if the script file argument [or standard input]
# is sorted and 1 otherwise.
export LC_ALL=C
if [ $# == 0 ]; then
scp=-
fi
if [ $# == 1 ]; then
scp=$1
fi
if [ $# -gt 1 -o "$1" == "--help" -o "$1" == "-h" ]; then
echo "Usage: is_sorted.sh [script-file]"
exit 1
fi
cat $scp > /tmp/tmp1.$$
sort /tmp/tmp1.$$ > /tmp/tmp2.$$
cmp /tmp/tmp1.$$ /tmp/tmp2.$$ >/dev/null
ret=$?
rm /tmp/tmp1.$$ /tmp/tmp2.$$
if [ $ret == 0 ]; then
exit 0;
else
echo "is_sorted.sh: script file $scp is not sorted";
exit 1;
fi
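# Illustrative sketch, not part of the committed script: the same check can be
# made without temporary files by using the -c option of sort, which only
# verifies the order. The function name is_sorted_sketch and the sample files
# are assumptions made for this example.

```shell
# Hypothetical helper: succeed (status 0) if the input is sorted in the C locale.
is_sorted_sketch() {
  # Reads the named file, or standard input ("-") when no argument is given.
  LC_ALL=C sort -c "${1:--}" 2>/dev/null
}

tmpdir=$(mktemp -d)
printf 'a 1\nb 2\nc 3\n' > "$tmpdir/sorted"
printf 'b 2\na 1\n' > "$tmpdir/unsorted"
if is_sorted_sketch "$tmpdir/sorted"; then sorted_status=0; else sorted_status=1; fi
if is_sorted_sketch "$tmpdir/unsorted"; then unsorted_status=0; else unsorted_status=1; fi
echo "$sorted_status $unsorted_status"
rm -r "$tmpdir"
```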


@ -0,0 +1,112 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# makes lexicon FST (no pron-probs involved).
if(@ARGV != 1 && @ARGV != 3) {
die "Usage: make_lexicon_fst.pl lexicon.txt [silprob silphone] > lexiconfst.txt"
}
$lexfn = shift @ARGV;
if(@ARGV == 0) {
$silprob = 0.0;
} else {
($silprob,$silphone) = @ARGV;
}
if($silprob != 0.0) {
$silprob < 1.0 || die "Sil prob cannot be >= 1.0";
$silcost = -log($silprob);
$nosilcost = -log(1.0 - $silprob);
}
open(L, "<$lexfn") || die "Error opening lexicon $lexfn";
if( $silprob == 0.0 ) { # No optional silences: just have one (loop+final) state which is numbered zero.
$loopstate = 0;
$nextstate = 1; # next unallocated state.
while(<L>) {
@A = split(" ", $_);
$w = shift @A;
if(@A == 0) { # For empty words (<s> and </s>) insert no optional
# silence (not needed as adjacent words supply it)....
# actually we only hit this case for the lexicon without disambig
# symbols but doesn't ever matter as training transcripts don't have <s> or </s>.
print "$loopstate\t$loopstate\t<eps>\t$w\n";
} else {
$s = $loopstate;
$word_or_eps = $w;
while (@A > 0) {
$p = shift @A;
if(@A > 0) {
$ns = $nextstate++;
} else {
$ns = $loopstate;
}
print "$s\t$ns\t$p\t$word_or_eps\n";
$word_or_eps = "<eps>";
$s = $ns;
}
}
}
print "$loopstate\t0\n"; # final-cost.
} else { # have silence probs.
$startstate = 0;
$loopstate = 1;
$silstate = 2; # state from where we go to loopstate after emitting silence.
$nextstate = 3;
print "$startstate\t$loopstate\t<eps>\t<eps>\t$nosilcost\n"; # no silence.
print "$startstate\t$loopstate\t$silphone\t<eps>\t$silcost\n"; # silence.
print "$silstate\t$loopstate\t$silphone\t<eps>\n"; # no cost.
while(<L>) {
@A = split(" ", $_);
$w = shift @A;
if(@A == 0) { # For empty words (<s> and </s>) insert no optional
# silence (not needed as adjacent words supply it)....
# actually we only hit this case for the lexicon without disambig
# symbols but doesn't ever matter as training transcripts don't have <s> or </s>.
print "$loopstate\t$loopstate\t<eps>\t$w\n";
} else {
$is_silence_word = (@A == 1 && $A[0] eq $silphone); # boolean.
$s = $loopstate;
$word_or_eps = $w;
while (@A > 0) {
$p = shift @A;
if(@A > 0) {
$ns = $nextstate++;
print "$s\t$ns\t$p\t$word_or_eps\n";
$word_or_eps = "<eps>";
$s = $ns;
} else {
if(! $is_silence_word) {
# This is non-deterministic but relatively compact,
# and avoids epsilons.
print "$s\t$loopstate\t$p\t$word_or_eps\t$nosilcost\n";
print "$s\t$silstate\t$p\t$word_or_eps\t$silcost\n";
} else {
# no point putting opt-sil after silence word.
print "$s\t$loopstate\t$p\t$word_or_eps\n";
}
$word_or_eps = "<eps>";
}
}
}
}
print "$loopstate\t0\n"; # final-cost.
}
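# Illustrative sketch, not part of the committed script: the branch without
# optional silence (silprob == 0) builds one loop state numbered 0 and a chain
# of fresh states per pronunciation, emitting the word on the first arc. The
# awk version below mirrors that logic on a two-word toy lexicon. The function
# name lexicon_fst_sketch and the toy lexicon are assumptions made for this
# example, and blank input lines are skipped rather than reproduced.

```shell
# Hypothetical helper: print a text-format lexicon FST without optional silence.
lexicon_fst_sketch() {
  awk 'BEGIN { loop = 0; next_state = 1 }
       NF == 0 { next }
       # Word with an empty pronunciation: epsilon-input self-loop on the loop state.
       NF == 1 { print loop "\t" loop "\t<eps>\t" $1; next }
       {
         s = loop; out = $1
         for (i = 2; i <= NF; i++) {
           # The last phone returns to the loop state; others get a new state.
           ns = (i < NF) ? next_state++ : loop
           print s "\t" ns "\t" $i "\t" out
           out = "<eps>"
           s = ns
         }
       }
       END { print loop "\t0" }' "$@"
}

fst=$(printf 'A ax\nBE b iy\n' | lexicon_fst_sketch)
echo "$fst"
```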


@ -0,0 +1,37 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# make_phones_symtab.pl < lexicon.txt > phones.txt
while(<>) {
@A = split(" ", $_);
for ($i=2; $i<@A; $i++) {
$P{$A[$i]} = 1; # seen it.
}
}
print "<eps>\t0\n";
$n = 1;
foreach $p (sort keys %P) {
if($p ne "<eps>") {
print "$p\t$n\n";
$n++;
}
}
print "sil\t$n\n";

130
egs/rm/s1/scripts/make_rm_dict.pl Executable file

@ -0,0 +1,130 @@
#!/usr/bin/perl
# Copyright 2010-2011 Yanmin Qian Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This file takes as input the file pcdsril.txt that comes with the RM
# distribution, and creates the dictionary used in RM training.
# make_rm_dict.pl pcdsril.txt > dct.txt
if (@ARGV != 1) {
die "usage: make_rm_dict.pl pcdsril.txt > dct.txt\n";
}
unless (open(IN_FILE, "<$ARGV[0]")) {
die ("can't open $ARGV[0]");
}
while ($line = <IN_FILE>)
{
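chomp($line); # remove the trailing newline only (chop would eat a real character on an unterminated last line).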
if (($line =~ /^[a-z]/))
{
$line =~ s/\+1//g;
@LineArray = split(/\s+/,$line);
@LineArray[0] = uc(@LineArray[0]);
printf "%-16s", @LineArray[0];
for ($i = 1; $i < @LineArray; $i ++)
{
if (@LineArray[$i] eq 'q')
{}
elsif (@LineArray[$i] eq 'zh')
{
printf "sh ";
}
elsif (@LineArray[$i] eq 'eng')
{
printf "ng ";
}
elsif (@LineArray[$i] eq 'hv')
{
printf "hh ";
}
elsif (@LineArray[$i] eq 'em')
{
printf "m ";
}
elsif (@LineArray[$i] eq 'axr')
{
printf "er ";
}
elsif (@LineArray[$i] eq 'tcl')
{
if (@LineArray[$i+1] ne 't')
{
printf "td ";
}
}
elsif (@LineArray[$i] eq 'dcl')
{
if (@LineArray[$i+1] ne 'd')
{
printf "dd ";
}
}
elsif (@LineArray[$i] eq 'kcl')
{
if (@LineArray[$i+1] ne 'k')
{
printf "kd ";
}
}
elsif (@LineArray[$i] eq 'pcl')
{
if (@LineArray[$i+1] ne 'p')
{
printf "pd ";
}
}
elsif (@LineArray[$i] eq 'bcl')
{
if (@LineArray[$i+1] ne 'b')
{
printf "b ";
}
}
elsif (@LineArray[$i] eq 'gcl')
{
if (@LineArray[$i+1] ne 'g')
{
printf "g ";
}
}
elsif (@LineArray[$i] eq 't')
{
if (@LineArray[$i+1] ne 's')
{
printf "@LineArray[$i] ";
}
else
{
printf "ts ";
$i++;
}
}
else
{
printf "@LineArray[$i] ";
}
}
printf "\n";
}
}
printf "!SIL sil\n";
close(IN_FILE);

119
egs/rm/s1/scripts/make_rm_lm.pl Executable file

@ -0,0 +1,119 @@
#!/usr/bin/perl
# Copyright 2010-2011 Yanmin Qian Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This file takes as input the file wp_gram.txt that comes with the RM
# distribution, and creates the language model as an acceptor in FST form.
# make_rm_lm.pl wp_gram.txt > G.txt
if (@ARGV != 1) {
die "usage: make_rm_lm.pl wp_gram.txt > G.txt\n";
}
unless (open(IN_FILE, "<$ARGV[0]")) {
die ("can't open $ARGV[0]");
}
$flag = 0;
$count_wrd = 0;
$cnt_ends = 0;
$init = "";
while ($line = <IN_FILE>)
{
chomp($line); # remove the trailing newline only.
$line =~ s/ //g;
if(($line =~ /^>/))
{
if($flag == 0)
{
$flag = 1;
}
$line =~ s/>//g;
$hashcnt{$init} = $i;
$init = $line;
$i = 0;
$count_wrd++;
@LineArray[$count_wrd - 1] = $init;
$hashwrd{$init} = 0;
}
elsif($flag != 0)
{
$hash{$init}[$i] = $line;
$i++;
if($line =~ /SENTENCE-END/)
{
$cnt_ends++;
}
}
else
{}
}
$hashcnt{$init} = $i;
$num = 0;
$weight = 0;
$init_wrd = "SENTENCE-END";
$hashwrd{$init_wrd} = @LineArray;
for($i = 0; $i < $hashcnt{$init_wrd}; $i++)
{
$weight = -log(1/$hashcnt{$init_wrd});
$hashwrd{$hash{$init_wrd}[$i]} = $i + 1;
print "0 $hashwrd{$hash{$init_wrd}[$i]} $hash{$init_wrd}[$i] $hash{$init_wrd}[$i] $weight\n";
}
$num = $i;
for($i = 0; $i < @LineArray; $i++)
{
if(@LineArray[$i] eq 'SENTENCE-END')
{}
else
{
if($hashwrd{@LineArray[$i]} == 0)
{
$num++;
$hashwrd{@LineArray[$i]} = $num;
}
for($j = 0; $j < $hashcnt{@LineArray[$i]}; $j++)
{
$weight = -log(1/$hashcnt{@LineArray[$i]});
if($hashwrd{$hash{@LineArray[$i]}[$j]} == 0)
{
$num++;
$hashwrd{$hash{@LineArray[$i]}[$j]} = $num;
}
if($hash{@LineArray[$i]}[$j] eq 'SENTENCE-END')
{
print "$hashwrd{@LineArray[$i]} $hashwrd{$hash{@LineArray[$i]}[$j]} <eps> <eps> $weight\n"
}
else
{
print "$hashwrd{@LineArray[$i]} $hashwrd{$hash{@LineArray[$i]}[$j]} $hash{@LineArray[$i]}[$j] $hash{@LineArray[$i]}[$j] $weight\n";
}
}
}
}
print "$hashwrd{$init_wrd} 0\n";
close(IN_FILE);

102
egs/rm/s1/scripts/make_roots.pl Executable file

@ -0,0 +1,102 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Written by Dan Povey 9/21/2010. Apache 2.0 License.
# This version of make_roots.pl is specialized for RM.
# This script creates the file roots.txt which is an input to train-tree.cc. It
# specifies how the trees are built. The input file phone-sets.txt is a partial
# version of roots.txt in which phones are represented by their spelled form, not
# their symbol id's. E.g. at input, phone-sets.txt might contain:
# shared not-split sil
# Any phones not specified in phone-sets.txt but present in phones.txt will
# be given a default treatment. If the --separate option is given, we create
# a separate tree root for each of them, otherwise they are all lumped in one set.
# The arguments shared|not-shared and split|not-split are needed if any
# phones are not specified in phone-sets.txt. What they mean is as follows:
# if shared=="shared" then we share the tree-root between different HMM-positions
# (0,1,2). If split=="split" then we actually do decision tree splitting on
# that root, otherwise we forbid decision-tree splitting. (The main reason we might
# set this to "not-split" is for silence, when
# we want to ensure that the HMM-positions will remain with a single PDF id.)
$separate = 0;
if($ARGV[0] eq "--separate") {
$separate = 1;
shift @ARGV;
}
if(@ARGV != 4) {
die "Usage: make_roots.pl [--separate] phones.txt silence-phone-list[integer,colon-separated] shared|not-shared split|not-split > roots.txt\n";
}
($phonesfile, $silphones, $shared, $split) = @ARGV;
if($shared ne "shared" && $shared ne "not-shared") {
die "Third argument must be \"shared\" or \"not-shared\"\n";
}
if($split ne "split" && $split ne "not-split") {
die "Fourth argument must be \"split\" or \"not-split\"\n";
}
open(F, "<$phonesfile") || die "Opening file $phonesfile";
while(<F>) {
@A = split(" ", $_);
if(@A != 2) {
die "Bad line in phones symbol file: ".$_;
}
if($A[1] != 0) {
$symbol2id{$A[0]} = $A[1];
$id2symbol{$A[1]} = $A[0];
}
}
if($silphones eq ""){ # string comparison; == would compare numerically.
die "Empty silence phone list in make_roots.pl";
}
foreach $silphoneid (split(":", $silphones)) {
defined $id2symbol{$silphoneid} || die "No such silence phone id $silphoneid";
# Give each silence phone its own separate pdfs in each state, but
# no sharing (in this recipe; WSJ is different... in this recipe there
# is only one silence phone anyway.)
$issil{$silphoneid} = 1;
print "not-shared not-split $silphoneid\n";
}
$idlist = "";
$remaining_phones = "";
if($separate){
foreach $a (keys %id2symbol) {
if(!defined $issil{$a}) {
print "$shared $split $a\n";
}
}
} else {
print "$shared $split ";
foreach $a (keys %id2symbol) {
if(!defined $issil{$a}) {
print "$a ";
}
}
print "\n";
}


@ -0,0 +1,39 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# make_words_symtab.pl < G.txt > words.txt
while(<>) {
@A = split(" ", $_);
if(@A >= 3) {
$W{$A[2]} = 1;
}
}
print "<eps>\t0\n";
$n = 1;
foreach $w (sort keys %W) {
if($w ne "<eps>") {
print "$w\t$n\n";
$n++;
}
}
print "!SIL\t$n\n";

107
egs/rm/s1/scripts/mkgraph.sh Executable file

@ -0,0 +1,107 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
reorder=true # Dan-style, make false for Mirko+Lukas's decoder.
for x in 1 2 3; do
if [ "$1" == "--mono" ]; then
monophone_opts="--context-size=1 --central-position=0"
shift;
fi
if [ "$1" == "--noreorder" ]; then
reorder=false # we set this for the Kaldi decoder.
shift;
fi
done
if [ $# != 3 ]; then
echo "Usage: scripts/mkgraph.sh <tree> <model> <graphdir>"
exit 1;
fi
if [ -f path.sh ]; then . path.sh; fi
tree=$1
model=$2
dir=$3
mkdir -p $dir
tscale=1.0
loopscale=0.1
fsttablecompose data/L_disambig.fst data/G.fst | fstdeterminizestar --use-log=true | \
fstminimizeencoded > $dir/LG.fst
fstisstochastic $dir/LG.fst || echo "warning: LG not stochastic."
echo "Example string from LG.fst: "
echo
fstrandgen --select=log_prob $dir/LG.fst | fstprint --isymbols=data/phones_disambig.txt --osymbols=data/words.txt -
grep '#' data/phones_disambig.txt | awk '{print $2}' > $dir/disambig_phones.list
fstcomposecontext $monophone_opts \
--read-disambig-syms=$dir/disambig_phones.list \
--write-disambig-syms=$dir/disambig_ilabels.list \
$dir/ilabels < $dir/LG.fst >$dir/CLG.fst
# for debugging:
fstmakecontextsyms data/phones.txt $dir/ilabels > $dir/context_syms.txt
echo "Example string from CLG.fst: "
echo
fstrandgen --select=log_prob $dir/CLG.fst | fstprint --isymbols=$dir/context_syms.txt --osymbols=data/words.txt -
fstisstochastic $dir/CLG.fst || echo "warning: CLG not stochastic."
make-ilabel-transducer --write-disambig-syms=$dir/disambig_ilabels_remapped.list $dir/ilabels $tree $model $dir/ilabels.remapped > $dir/ilabel_map.fst
# Reduce size of CLG by remapping symbols...
fsttablecompose $dir/ilabel_map.fst $dir/CLG.fst | fstdeterminizestar --use-log=true \
| fstminimizeencoded > $dir/CLG2.fst
cat $dir/CLG2.fst | fstisstochastic || echo "warning: CLG2 is not stochastic."
make-h-transducer --disambig-syms-out=$dir/disambig_tstate.list \
--transition-scale=$tscale $dir/ilabels.remapped $tree $model > $dir/Ha.fst
fsttablecompose $dir/Ha.fst $dir/CLG2.fst | fstdeterminizestar --use-log=true \
| fstrmsymbols $dir/disambig_tstate.list | fstrmepslocal | fstminimizeencoded > $dir/HCLGa.fst
fstisstochastic $dir/HCLGa.fst || echo "HCLGa is not stochastic"
add-self-loops --self-loop-scale=$loopscale --reorder=$reorder $model < $dir/HCLGa.fst > $dir/HCLG.fst
if [ $tscale == 1.0 -a $loopscale == 1.0 ]; then
# No point doing this test if transition-scale not 1, as it is bound to fail.
fstisstochastic $dir/HCLG.fst || echo "Final HCLG is not stochastic."
fi
# The next four lines are for debugging.
# The last two lines of this block print out some alignment info.
fstrandgen --select=log_prob $dir/HCLG.fst | fstprint --osymbols=data/words.txt > $dir/rand.txt
cat $dir/rand.txt | awk 'BEGIN{printf("0 ");} {if(NF>=3 && $3 != 0){ printf ("%d ",$3); }} END {print ""; }' > $dir/rand_align.txt
show-alignments data/phones.txt $model ark:$dir/rand_align.txt
cat $dir/rand.txt | awk ' {if(NF>=4 && $4 != "<eps>"){ printf ("%s ",$4); }} END {print ""; }'

115
egs/rm/s1/scripts/mkgraph_alt.sh Executable file

@ -0,0 +1,115 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This version of mkgraph.sh creates the C fst explicitly.
reorder=true # Dan-style, make false for Mirko+Lukas's decoder.
for x in 1 2 3; do
if [ "$1" == "--mono" ]; then
monophone_opts="--context-size=1 --central-position=0"
shift;
fi
if [ "$1" == "--noreorder" ]; then
reorder=false # we set this for the Kaldi decoder.
shift;
fi
done
if [ $# != 3 ]; then
echo "Usage: scripts/mkgraph_alt.sh <tree> <model> <graphdir>"
exit 1;
fi
if [ -f path.sh ]; then . path.sh; fi
tree=$1
model=$2
dir=$3
mkdir -p $dir
tscale=1.0
loopscale=0.1
fsttablecompose data/L_disambig.fst data/G.fst | fstdeterminizestar --use-log=true | \
fstminimizeencoded > $dir/LG.fst
fstisstochastic $dir/LG.fst || echo "warning: LG not stochastic."
echo "Example string from LG.fst: "
echo
fstrandgen --select=log_prob $dir/LG.fst | fstprint --isymbols=data/phones_disambig.txt --osymbols=data/words.txt -
grep '#' data/phones_disambig.txt | awk '{print $2}' > $dir/disambig_phones.list
subseq_sym=`tail -1 data/phones_disambig.txt | awk '{print $2+1;}'`
cp data/phones_disambig.txt $dir/phones_disambig_subseq.txt
echo '$' $subseq_sym >> $dir/phones_disambig_subseq.txt
fstmakecontextfst --read-disambig-syms=$dir/disambig_phones.list \
--write-disambig-syms=$dir/disambig_ilabels.list data/phones.txt $subseq_sym \
$dir/ilabels | fstarcsort --sort_type=olabel > $dir/C.fst
fstaddsubsequentialloop $subseq_sym $dir/LG.fst | \
fsttablecompose $dir/C.fst - > $dir/CLG.fst
# for debugging:
fstmakecontextsyms data/phones.txt $dir/ilabels > $dir/context_syms.txt
echo "Example string from CLG.fst: "
echo
fstrandgen --select=log_prob $dir/CLG.fst | fstprint --isymbols=$dir/context_syms.txt --osymbols=data/words.txt -
fstisstochastic $dir/CLG.fst || echo "warning: CLG not stochastic."
make-ilabel-transducer --write-disambig-syms=$dir/disambig_ilabels_remapped.list $dir/ilabels $tree $model $dir/ilabels.remapped > $dir/ilabel_map.fst
# Reduce size of CLG by remapping symbols...
fstcompose $dir/ilabel_map.fst $dir/CLG.fst | fstdeterminizestar --use-log=true \
| fstminimizeencoded > $dir/CLG2.fst
cat $dir/CLG2.fst | fstisstochastic || echo "warning: CLG2 is not stochastic."
make-h-transducer --disambig-syms-out=$dir/disambig_tstate.list \
--transition-scale=$tscale $dir/ilabels.remapped $tree $model > $dir/Ha.fst
fsttablecompose $dir/Ha.fst $dir/CLG2.fst | fstdeterminizestar --use-log=true \
| fstrmsymbols $dir/disambig_tstate.list | fstrmepslocal | fstminimizeencoded > $dir/HCLGa.fst
fstisstochastic $dir/HCLGa.fst || echo "HCLGa is not stochastic"
add-self-loops --self-loop-scale=$loopscale --reorder=$reorder $model < $dir/HCLGa.fst > $dir/HCLG.fst
if [ $tscale == 1.0 -a $loopscale == 1.0 ]; then
# No point doing this test if transition-scale not 1, as it is bound to fail.
fstisstochastic $dir/HCLG.fst || echo "Final HCLG is not stochastic."
fi
# The next four lines are for debugging.
# The last two lines of this block print out some alignment info.
fstrandgen --select=log_prob $dir/HCLG.fst | fstprint --osymbols=data/words.txt > $dir/rand.txt
cat $dir/rand.txt | awk 'BEGIN{printf("0 ");} {if(NF>=3 && $3 != 0){ printf ("%d ",$3); }} END {print ""; }' > $dir/rand_align.txt
show-alignments data/phones.txt $model ark:$dir/rand_align.txt
cat $dir/rand.txt | awk ' {if(NF>=4 && $4 != "<eps>"){ printf ("%s ",$4); }} END {print ""; }'


@ -0,0 +1,47 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This script is part of a diagnostic step when using exponential transforms.
$map=$ARGV[0]; open(M,"<$map")||die "opening map file $map";
while(<M>){ @A=split(" ",$_); $map{$A[0]} = $A[1]; }
while(<STDIN>){
($spk,$warp)=split(" ",$_);
$class = int($class/2);
defined $map{$spk} || die "No gender info for speaker $spk";
$warps{$map{$spk}} = $warps{$map{$spk}} . "$warp ";
}
@K = sort keys %warps;
@K==2||die "wrong number of keys [empty warps file?]";
foreach $k ( @K ) {
$s = join(" ", sort { $a <=> $b } ( split(" ", $warps{$k}) )) ;
print "$k = [ $s ];\n";
}
# f,m may be reversed below; doesn't matter.
foreach $w ( split(" ", $warps{$K[0]}) ) {
$nf += 1; $sumf += $w; $sumf2 += $w*$w;
}
foreach $w ( split(" ", $warps{$K[1]}) ) {
$nm += 1; $summ += $w; $summ2 += $w*$w;
}
$sumf /= $nf; $sumf2 /= $nf;
$summ /= $nm; $summ2 /= $nm;
$sumf2 -= $sumf*$sumf;
$summ2 -= $summ*$summ;
$avgwithin = 0.5*($sumf2+$summ2 );
$diff = abs($sumf - $summ) / sqrt($avgwithin);
print "% class separation is $diff\n";

57
egs/rm/s1/scripts/silphones.pl Executable file

@ -0,0 +1,57 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# creates integer lists of silence and non-silence phones in files,
# e.g. silphones.csl="1:2:3 \n"
# and nonsilphones.csl="4:5:6:7:...:24\n";
if(@ARGV != 4) {
die "Usage: silphones.pl phones.txt \"sil1 sil2 sil3\" silphones.csl nonsilphones.csl";
}
($symtab, $sillist, $silphones, $nonsilphones) = @ARGV;
open(S,"<$symtab") || die "Opening symbol table $symtab";
foreach $s (split(" ", $sillist)) {
$issil{$s} = 1;
}
@sil = ();
@nonsil = ();
while(<S>){
@A = split(" ", $_);
@A == 2 || die "Bad line $_ in phone-symbol-table file $symtab";
($sym, $int) = @A;
if($int != 0) {
if($issil{$sym}) { push @sil, $int; $seensil{$sym}=1; }
else { push @nonsil, $int; }
}
}
foreach $k(keys %issil) {
if(!$seensil{$k}) { die "No such silence phone $k"; }
}
open(F, ">$silphones") || die "opening silphones file $silphones";
open(G, ">$nonsilphones") || die "opening nonsilphones file $nonsilphones";
print F join(":", @sil) . "\n";
print G join(":", @nonsil) . "\n";
close(F);
close(G);
if(@sil == 0) { print STDERR "Warning: silphones.pl no silence phones.\n" }
if(@nonsil == 0) { print STDERR "Warning: silphones.pl no non-silence phones.\n" }


@ -0,0 +1,27 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
while(<>){
@A = split(" ", $_);
@A > 1 || die "Invalid line in spk2utt file: $_";
$s = shift @A;
foreach $u ( @A ) {
print "$u $s\n";
}
}
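# Illustrative sketch, not part of the committed script: the inversion above
# (one "speaker utt1 utt2 ..." line becomes one "utt speaker" line per
# utterance) is a one-liner in awk. The function name
# spk2utt_to_utt2spk_sketch and the sample speakers are assumptions made for
# this example.

```shell
# Hypothetical helper: convert spk2utt lines into utt2spk lines.
spk2utt_to_utt2spk_sketch() {
  # Reads the named files, or standard input when no argument is given.
  awk 'NF < 2 { print "invalid spk2utt line: " $0 > "/dev/stderr"; exit 1 }
       { for (i = 2; i <= NF; i++) print $i, $1 }' "$@"
}

utt2spk=$(printf 'spk1 utt1 utt2\nspk2 utt3\n' | spk2utt_to_utt2spk_sketch)
echo "$utt2spk"
```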

181
egs/rm/s1/scripts/split_scp.pl Executable file

@ -0,0 +1,181 @@
#!/usr/bin/perl -w
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This program splits up any kind of .scp or archive-type file.
# If there is no utt2spk option it will work on any text file and
# will split it up with an approximately equal number of lines in
# each output file.
# With the --utt2spk option it will work on anything that has the
# utterance-id as the first entry on each line; the utt2spk file is
# of the form "utterance speaker" (on each line).
# It splits it into equal size chunks as far as it can. If you use
# the utt2spk option it will make sure these chunks coincide with
# speaker boundaries. In this case, if there are more chunks
# than speakers (and in some other circumstances), some of the
# resulting chunks will be empty and it
# will print a warning.
# You will normally call this like:
# split_scp.pl scp scp.1 scp.2 scp.3 ...
# or
# split_scp.pl --utt2spk=utt2spk scp scp.1 scp.2 scp.3 ...
# Note that you can use this script to split the utt2spk file itself,
# e.g. split_scp.pl --utt2spk=utt2spk utt2spk utt2spk.1 utt2spk.2 ...
if(@ARGV < 2 ) {
die "Usage: split_scp.pl [--utt2spk=<utt2spk_file>] in.scp out1.scp out2.scp ... ";
}
if($ARGV[0] =~ m:^-:) {
# Everything inside this block
# corresponds to what we do when the --utt2spk option is used.
$opt = shift @ARGV;
@A = split("=", $opt);
if(@A != 2 || $A[0] ne "--utt2spk") {
die "split_scp.pl: invalid option $opt";
}
$utt2spk_file = $A[1];
open(U, "<$utt2spk_file") || die "Failed to open utt2spk file $utt2spk_file";
while(<U>) {
@A = split;
@A == 2 || die "Bad line $_ in utt2spk file $utt2spk_file";
($u,$s) = @A;
$utt2spk{$u} = $s;
}
$inscp = shift @ARGV;
open(I, "<$inscp") || die "Opening input scp file $inscp";
@spkrs = ();
while(<I>) {
@A = split;
if(@A == 0) { die "Empty or space-only line in scp file $inscp"; }
$u = $A[0];
$s = $utt2spk{$u};
if(!defined $s) { die "No such utterance $u in utt2spk file $utt2spk_file"; }
if(!defined $spk_count{$s}) {
push @spkrs, $s;
$spk_count{$s} = 0;
$spk_data{$s} = "";
}
$spk_count{$s}++;
$spk_data{$s} = $spk_data{$s} . $_;
}
# Now split as equally as possible ..
# First allocate spks to files by giving each file an approximately
# equal #spks.
$numspks = @spkrs; # number of speakers.
$numscps = @ARGV; # number of output files.
$spksperscp = int( ($numspks+($numscps-1)) / $numscps); # the +($numscps-1) forces rounding up.
for($scpidx = 0; $scpidx < $numscps; $scpidx++) {
$scparray[$scpidx] = []; # [] is array reference.
for($n = $spksperscp * $scpidx;
$n < $numspks && $n < $spksperscp*($scpidx+1);
$n++) {
$spk = $spkrs[$n];
push @{$scparray[$scpidx]}, $spk;
$scpcount[$scpidx] += $spk_count{$spk};
}
}
# Now will try to reassign beginning + ending speakers
# to different scp's and see if it gets more balanced.
# Suppose the objf we're minimizing is sum_i (num utts in scp[i] - average)^2.
# We can show that if we consider changing just 2 scp's, we minimize
# this by minimizing the squared difference in their sizes. This is
# equivalent to minimizing the absolute difference in sizes. Each accepted
# move therefore strictly reduces the objf, so this method is bound to converge.
$changed = 1;
while($changed) {
$changed = 0;
for($scpidx = 0; $scpidx < $numscps; $scpidx++) {
# First try to reassign ending spk of this scp.
if($scpidx < $numscps-1) {
$sz = @{$scparray[$scpidx]};
if($sz > 0) {
$spk = $scparray[$scpidx]->[$sz-1];
$count = $spk_count{$spk};
$nutt1 = $scpcount[$scpidx];
$nutt2 = $scpcount[$scpidx+1];
if( abs( ($nutt2+$count) - ($nutt1-$count))
< abs($nutt2 - $nutt1)) { # Would decrease
# size-diff by reassigning spk...
$scpcount[$scpidx+1] += $count;
$scpcount[$scpidx] -= $count;
pop @{$scparray[$scpidx]};
unshift @{$scparray[$scpidx+1]}, $spk;
$changed = 1;
}
}
}
if($scpidx > 0 && @{$scparray[$scpidx]} > 0) {
$spk = $scparray[$scpidx]->[0];
$count = $spk_count{$spk};
$nutt1 = $scpcount[$scpidx-1];
$nutt2 = $scpcount[$scpidx];
if( abs( ($nutt2-$count) - ($nutt1+$count))
< abs($nutt2 - $nutt1)) { # Would decrease
# size-diff by reassigning spk...
$scpcount[$scpidx-1] += $count;
$scpcount[$scpidx] -= $count;
shift @{$scparray[$scpidx]};
push @{$scparray[$scpidx-1]}, $spk;
$changed = 1;
}
}
}
}
# Now print out the files...
for($scpidx = 0; $scpidx < $numscps; $scpidx++) {
$scpfn = $ARGV[$scpidx];
open(F, ">$scpfn") || die "Could not open scp file $scpfn for writing.";
$count = 0;
if(@{$scparray[$scpidx]} == 0) {
print STDERR "Warning: split_scp.pl producing empty .scp file $scpfn (too many splits and too few speakers?)\n";
}
foreach $spk ( @{$scparray[$scpidx]} ) {
print F $spk_data{$spk};
$count += $spk_count{$spk};
}
if($count != $scpcount[$scpidx]) { die "Count mismatch [code error]"; }
close(F);
}
} else {
# This block is the "normal" case where there is no --utt2spk
# option and we just break into equal size chunks.
$inscp = shift @ARGV;
open(I, "<$inscp") || die "Opening input scp file $inscp";
$numscps = @ARGV; # size of array.
@F = ();
while(<I>) {
push @F, $_;
}
$numlines = @F;
if($numlines == 0) {
print STDERR "split_scp.pl: warning: empty input scp file $inscp\n";
}
$linesperscp = int( ($numlines+($numscps-1)) / $numscps); # the +($numscps-1) forces rounding up
# [just doing int() rounds down].
for($scpidx = 0; $scpidx < @ARGV; $scpidx++) {
$scpfile = $ARGV[$scpidx];
open(O, ">$scpfile") || die "Opening output scp file $scpfile";
for($n = $linesperscp * $scpidx; $n < $numlines && $n < $linesperscp*($scpidx+1); $n++) {
print O $F[$n];
}
close(O) || die "Closing scp file $scpfile";
}
}
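
The chunking arithmetic used by split_scp.pl can be tried in isolation. The following is a minimal shell sketch, not part of the Kaldi scripts: chunk_range and move_helps are hypothetical helper names. The first reproduces the round-up division of the branch without --utt2spk; the second reproduces the test that decides whether moving a boundary speaker to the neighbouring chunk makes the two chunks more balanced.

```shell
# Sketch only; chunk_range and move_helps are hypothetical helpers,
# not functions from split_scp.pl.

# Print the half-open range "start end" of input lines (0-based) that
# chunk idx receives when numlines lines are split over numchunks files.
chunk_range() {
  numlines=$1; numchunks=$2; idx=$3
  # Adding (numchunks - 1) before the integer division rounds up.
  per=$(( (numlines + numchunks - 1) / numchunks ))
  start=$(( per * idx ))
  end=$(( per * (idx + 1) ))
  if [ "$start" -gt "$numlines" ]; then start=$numlines; fi
  if [ "$end" -gt "$numlines" ]; then end=$numlines; fi
  echo "$start $end"
}

# Succeed if moving a speaker with count utterances out of a chunk holding
# n1 utterances into a neighbour holding n2 shrinks their size difference.
move_helps() {
  n1=$1; n2=$2; count=$3
  before=$(( n2 - n1 ))
  after=$(( (n2 + count) - (n1 - count) ))
  if [ "$before" -lt 0 ]; then before=$(( -before )); fi
  if [ "$after" -lt 0 ]; then after=$(( -after )); fi
  [ "$after" -lt "$before" ]
}
```

For example, chunk_range 10 3 2 prints "8 10": with ten lines and three outputs each chunk holds up to four lines, so the last chunk receives only two. A chunk whose range is empty (start equal to end) corresponds to an empty output file.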

59
egs/rm/s1/scripts/subset_scp.pl Executable file
View file

@ -0,0 +1,59 @@
#!/usr/bin/perl -w
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This program selects a subset of N elements in the scp.
# It selects them evenly from throughout the scp, in order to
# avoid selecting too many from the same speaker.
# It prints them on the standard output.
if(@ARGV < 2 ) {
die "Usage: subset_scp.pl N in.scp ";
}
$N = shift @ARGV;
if($N == 0) {
die "First command-line parameter to subset_scp.pl must be an integer, got \"$N\"";
}
$inscp = shift @ARGV;
open(I, "<$inscp") || die "Opening input scp file $inscp";
@F = ();
while(<I>) {
push @F, $_;
}
$numlines = @F;
if($N > $numlines) {
die "You requested from subset_scp.pl more elements than available: $N > $numlines";
}
sub select_n {
my ($start,$end,$num_needed) = @_;
my $diff = $end - $start;
if($num_needed > $diff) { die "select_n: code error"; }
if($diff == 1 ) {
if($num_needed > 0) {
print $F[$start];
}
} else {
my $halfdiff = int($diff/2);
my $halfneeded = int($num_needed/2);
select_n($start, $start+$halfdiff, $halfneeded);
select_n($start+$halfdiff, $end, $num_needed - $halfneeded);
}
}
select_n(0, $numlines, $N);
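
The recursive halving in select_n can be sketched outside Perl. This is an illustrative shell wrapper around awk, assuming the input arrives on standard input; select_evenly and pick are hypothetical names. The interval is split in half, the number of lines still needed is split in half (the second half gets the remainder), and a line is printed once an interval of length one still needs a line.

```shell
# Sketch only; select_evenly is a hypothetical helper, not part of Kaldi.
# Usage: select_evenly N < in.scp
select_evenly() {
  awk -v n="$1" '
    # Recursively pick "need" lines from the half-open interval [start, end).
    function pick(start, end, need,    diff, half, halfneed) {
      diff = end - start
      if (diff == 1) {
        if (need > 0) print line[start]
        return
      }
      half = int(diff / 2)
      halfneed = int(need / 2)
      pick(start, start + half, halfneed)
      pick(start + half, end, need - halfneed)
    }
    { line[NR - 1] = $0 }   # store every input line, 0-based
    END {
      if (n > NR) {
        print "requested more elements than available" > "/dev/stderr"
        exit 1
      }
      if (NR > 0) pick(0, NR, n)
    }'
}
```

Because the needed count is halved together with the interval, the selected lines are spread across the whole input rather than taken from its start.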

59
egs/rm/s1/scripts/sym2int.pl Executable file
View file

@ -0,0 +1,59 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
$ignore_oov = 0;
$ignore_first_field = 0;
for($x = 0; $x < 2; $x++) {
if($ARGV[0] eq "--ignore-oov") { $ignore_oov = 1; shift @ARGV; }
if($ARGV[0] eq "--ignore-first-field") { $ignore_first_field = 1; shift @ARGV; }
}
$symtab = shift @ARGV;
if(!defined $symtab) {
die "Usage: sym2int.pl [--ignore-oov] [--ignore-first-field] symtab [input transcriptions] > output transcriptions\n";
}
open(F, "<$symtab") || die "Error opening symbol table file $symtab";
while(<F>) {
@A = split(" ", $_);
@A == 2 || die "bad line in symbol table file: $_";
$sym2int{$A[0]} = $A[1] + 0;
}
while(<>) {
@A = split(" ", $_);
if(@A == 0) {
die "Empty line in transcriptions input.";
}
if($ignore_first_field) {
$key = shift @A;
print $key . " ";
}
foreach $a (@A) {
$i = $sym2int{$a};
if(!defined ($i)) {
if($ignore_oov) {
print $a . " " ;
} else {
die "sym2int.pl: undefined symbol $a\n";
}
} else {
print $i . " ";
}
}
print "\n";
}
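
The mapping performed by sym2int.pl with --ignore-first-field can be illustrated with a short awk-based sketch. It is an approximation under stated assumptions, not a replacement: map_symbols is a hypothetical name, the symbol table is assumed to hold "symbol integer" pairs, unknown symbols are always an error, and, unlike the Perl script, no trailing space is written.

```shell
# Sketch only; map_symbols is a hypothetical helper, not part of Kaldi.
# Usage: map_symbols symtab < transcriptions
map_symbols() {
  awk -v symtab="$1" '
    BEGIN {
      # Load the symbol table: one "symbol integer" pair per line.
      while ((getline entry < symtab) > 0) {
        split(entry, f, " ")
        id[f[1]] = f[2] + 0
      }
    }
    NF == 0 { print "empty line in transcriptions input" > "/dev/stderr"; exit 1 }
    {
      out = $1                      # keep the utterance id unchanged
      for (i = 2; i <= NF; i++) {
        if (!($i in id)) {
          print "undefined symbol " $i > "/dev/stderr"
          exit 1
        }
        out = out " " id[$i]
      }
      print out
    }'
}
```

With a table that maps hello to 1 and world to 2, the input line "utt1 hello world" becomes "utt1 1 2".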

View file

@ -0,0 +1,33 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
while(<>){
@A = split(" ", $_);
@A == 2 || die "Invalid line in utt2spk file: $_";
($u,$s) = @A;
if(!$seen_spk{$s}) {
$seen_spk{$s} = 1;
push @spklist, $s;
}
$uttlist{$s} = $uttlist{$s} . "$u ";
}
foreach $s (@spklist) {
$l = $uttlist{$s};
$l =~ s: $::; # remove trailing space.
print "$s $l\n";
}
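
The utt2spk-to-spk2utt conversion above can be written compactly in awk. The sketch below assumes "utterance speaker" pairs on standard input and keeps speakers in order of first appearance, as the Perl script does; utt2spk_to_spk2utt is used here only as an illustrative function name.

```shell
# Sketch only; an awk rendering of the grouping logic, not the Kaldi script.
# Usage: utt2spk_to_spk2utt < utt2spk > spk2utt
utt2spk_to_spk2utt() {
  awk '
    NF != 2 {
      print "Invalid line in utt2spk input: " $0 > "/dev/stderr"
      bad = 1
      exit 1
    }
    {
      if (!($2 in utts)) {
        order[++n] = $2            # remember first-seen order of speakers
        utts[$2] = $1
      } else {
        utts[$2] = utts[$2] " " $1
      }
    }
    END {
      if (bad) exit 1              # do not print partial output after an error
      for (i = 1; i <= n; i++) print order[i], utts[order[i]]
    }'
}
```

The bad flag is needed because an exit inside a main rule still runs the END rule in awk.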

45
egs/rm/s1/steps/decode_mono.sh Executable file
View file

@ -0,0 +1,45 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Monophone decoding script.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_mono
tree=exp/mono/tree
mkdir -p $dir
model=exp/mono/final.mdl
graphdir=exp/graph_mono
scripts/mkgraph.sh --mono $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
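
The final awk command pools errors and word counts over all test sets, so the reported figure is a count-weighted average rather than a mean of per-set percentages. The sketch below isolates that step. It assumes, as the command above implies, that each input line carries the error count in field 4 and the word count in field 6; average_wer is a hypothetical name, and the guard against a zero word count is an addition.

```shell
# Sketch only; average_wer is a hypothetical helper, not part of Kaldi.
# Reads per-test-set scoring lines on stdin and prints the pooled WER.
average_wer() {
  awk '
    { n += $4; d += $6 }           # field 4: errors, field 6: reference words
    END {
      if (d == 0) {
        print "no words scored" > "/dev/stderr"
        exit 1
      }
      printf("Average WER is %f (%d / %d)\n", 100.0 * n / d, n, d)
    }'
}
```

Given two hypothetical lines reporting 10 errors in 100 words and 15 errors in 300 words, the pooled result is 25 / 400, i.e. 6.25%, whereas averaging the two percentages would give 7.5%.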

45
egs/rm/s1/steps/decode_sgmm.sh Executable file
View file

@ -0,0 +1,45 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_sgmm
tree=exp/sgmm/tree
model=exp/sgmm/final.mdl
graphdir=exp/graph_sgmm
mkdir -p $dir
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
sgmm-decode-faster --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
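
These decode scripts start one subshell per test set with ( ... ) & and then call wait. A bare wait returns success even when a background job failed, so a failing decoder goes unnoticed unless the logs are inspected. One possible way to surface failures, shown here as a sketch with a hypothetical helper name run_all, is to record each job's process id and wait on them individually.

```shell
# Sketch only; run_all is a hypothetical helper, not part of Kaldi.
# Runs each argument as a shell command in the background and returns
# non-zero if any of them failed.
run_all() {
  pids=""
  for job in "$@"; do
    sh -c "$job" &
    pids="$pids $!"                # remember the background job's pid
  done
  status=0
  for pid in $pids; do
    wait "$pid" || status=1        # wait <pid> returns that job's exit status
  done
  return $status
}
```

For instance, run_all "true" "false" returns 1, while run_all "true" "true" returns 0.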

54
egs/rm/s1/steps/decode_sgmm2.sh Executable file
View file

@ -0,0 +1,54 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_sgmm2
tree=exp/sgmm/tree
model=exp/sgmm2/final.mdl
graphdir=exp/graph_sgmm2
mkdir -p $dir
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
sgmm-gselect $model "$feats" ark,t:- 2>$dir/gselect_${test}.log | gzip -c > $dir/gselect_${test}.gz || exit 1;
gselect_opt="--gselect-read=ark:gunzip -c $dir/gselect_${test}.gz|"
sgmm-decode-faster-spkvecs --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt "$gselect_opt" $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log || exit 1;
ali-to-post $dir/test_${test}.ali $dir/test_${test}.post 2> $dir/post_${test}.log || exit 1;
gselect_opt="--gselect=ark:gunzip -c $dir/gselect_${test}.gz|"
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
sgmm-est-spkvecs "$gselect_opt" "$spk2utt_opt" $model "$feats" $dir/test_${test}.post $dir/vecs_${test} 2> $dir/est_spkvecs_${test}.log || exit 1;
sgmm-decode-faster-spkvecs --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt "$gselect_opt" --spkvecs-read=$dir/vecs_${test} $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_vecs_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

45
egs/rm/s1/steps/decode_sgmma.sh Executable file
View file

@ -0,0 +1,45 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_sgmma
tree=exp/sgmma/tree
model=exp/sgmma/final.mdl
graphdir=exp/graph_sgmma
mkdir -p $dir
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
sgmm-decode-faster --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

69
egs/rm/s1/steps/decode_sgmmb.sh Executable file
View file

@ -0,0 +1,69 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# SGMM decoding with adaptation.
#
# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
# (1) decode with "alignment model"
# (2) get GMM posteriors with "alignment model" and estimate speaker
# vectors with final model
# (3) decode with final model.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_sgmmb
tree=exp/sgmmb/tree
model=exp/sgmmb/final.mdl
alimodel=exp/sgmmb/final.alimdl
graphdir=exp/graph_sgmmb
silphonelist=`cat data/silphones.csl`
mkdir -p $dir
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
spk2utt_opt="--spk2utt=ark:data/test_${test}.spk2utt"
utt2spk_opt="--utt2spk=ark:data/test_${test}.utt2spk"
sgmm-gselect $model "$feats" ark,t:- 2>$dir/gselect.log | \
gzip -c > $dir/${test}_gselect.gz || exit 1;
gselect_opt="--gselect=ark:gunzip -c $dir/${test}_gselect.gz|"
# Use smaller beam first time.
sgmm-decode-faster "$gselect_opt" --beam=15.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $alimodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.pre_tra ark,t:$dir/test_${test}.pre_ali 2> $dir/predecode_${test}.log
( ali-to-post ark:$dir/test_${test}.pre_ali ark:- | \
weight-silence-post 0.01 $silphonelist $alimodel ark:- ark:- | \
sgmm-post-to-gpost "$gselect_opt" $alimodel "$feats" ark,s,cs:- ark:- | \
sgmm-est-spkvecs-gpost "$spk2utt_opt" $model "$feats" ark,s,cs:- \
ark:$dir/test_${test}.vecs ) 2>$dir/vecs_${test}.log
sgmm-decode-faster $utt2spk_opt --spk-vecs=ark:$dir/test_${test}.vecs --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

44
egs/rm/s1/steps/decode_tri1.sh Executable file
View file

@ -0,0 +1,44 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri1
tree=exp/tri1/tree
model=exp/tri1/final.mdl
graphdir=exp/graph_tri1
mkdir -p $dir
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

View file

@ -0,0 +1,65 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# decode_tri_fmllr.sh is as decode_tri.sh but estimating fMLLR in test,
# per speaker. There is no SAT.
# To be run from ..
if [ -f path.sh ]; then . path.sh; fi
srcdir=exp/decode_tri1
dir=exp/decode_tri1_fmllr
mkdir -p $dir
model=exp/tri1/final.mdl
tree=exp/tri1/tree
graphdir=exp/graph_tri1
silphones=`cat data/silphones.csl`
mincount=500 # mincount before we estimate a transform.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
# Comment the two lines below to make this per-utterance.
# This would only work if $srcdir was also per-utterance [otherwise
# you'd have to mess with the script a bit].
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
weight-silence-post 0.01 $silphones $model ark:- ark:- | \
gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
"$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

View file

@ -0,0 +1,69 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# decode_tri_regtree_fmllr.sh is as ../decode_tri.sh but estimating fMLLR in test,
# per speaker. There is no SAT. Use a regression-tree with top-level speech/sil
# split (no silence weighting).
if [ -f path.sh ]; then . path.sh; fi
srcdir=exp/decode_tri1
dir=exp/decode_tri1_regtree_fmllr
mkdir -p $dir
model=exp/tri1/final.mdl
occs=exp/tri1/final.occs
tree=exp/tri1/tree
graphdir=exp/graph_tri1
silphones=`cat data/silphones.csl`
regtree=$dir/regtree
maxleaves=8 # max # of regression-tree leaves.
mincount=5000 # mincount before we add new transform.
gmm-make-regtree --sil-phones=$silphones --state-occs=$occs --max-leaves=$maxleaves $model $regtree 2>$dir/make_regtree.out
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
# Comment the two lines below to make this per-utterance.
# This would only work if $srcdir was also per-utterance [otherwise
# you'd have to mess with the script a bit].
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
# To deweight silence, would add the line
# weight-silence-post 0.0 $silphones $model ark:- ark:- | \
# after the line with ali-to-post
# This is useful if we don't treat silence specially when building regression tree.
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
gmm-est-regtree-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model "$feats" ark:- $regtree ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
gmm-decode-faster-regtree-fmllr $utt2spk_opt --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst $regtree "$feats" ark:$dir/${test}.fmllr ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

44
egs/rm/s1/steps/decode_tri2a.sh Executable file
View file

@ -0,0 +1,44 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2a
mkdir -p $dir
model=exp/tri2a/final.mdl
tree=exp/tri2a/tree
graphdir=exp/graph_tri2a
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

View file

@ -0,0 +1,65 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# decode_tri_fmllr.sh is as decode_tri.sh but estimating fMLLR in test,
# per speaker. There is no SAT.
# To be run from ..
if [ -f path.sh ]; then . path.sh; fi
srcdir=exp/decode_tri2a
dir=exp/decode_tri2a_fmllr
mkdir -p $dir
model=exp/tri2a/final.mdl
tree=exp/tri2a/tree
graphdir=exp/graph_tri2a
silphones=`cat data/silphones.csl`
mincount=500 # mincount before we estimate a transform.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
# Comment the two lines below to make this per-utterance.
# This would only work if $srcdir was also per-utterance [otherwise
# you'd have to mess with the script a bit].
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
weight-silence-post 0.01 $silphones $model ark:- ark:- | \
gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
"$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

View file

@ -0,0 +1,65 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# decode_tri_fmllr.sh is as decode_tri.sh but estimating fMLLR in test.
# This variant is per utterance rather than per speaker. There is no SAT.
# To be run from ..
if [ -f path.sh ]; then . path.sh; fi
srcdir=exp/decode_tri2a
dir=exp/decode_tri2a_fmllr_utt
mkdir -p $dir
model=exp/tri2a/final.mdl
tree=exp/tri2a/tree
graphdir=exp/graph_tri2a
silphones=`cat data/silphones.csl`
mincount=500 # mincount before we estimate a transform.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
# The two lines below are commented out to make this per-utterance.
# This would only work if $srcdir was also per-utterance [otherwise
# you'd have to mess with the script a bit].
#spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
#utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
weight-silence-post 0.01 $silphones $model ark:- ark:- | \
gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
"$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
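
The final pipeline in these decode scripts pools the per-test-set scoring files into a single figure. Below is a minimal sketch of that step that runs without Kaldi. The sample lines and the `pool_wer` function name are made up for illustration; the only assumption about the scoring files is that field 4 holds the error count and field 6 the reference word count, as the awk program above implies.

```shell
# Sketch only: pool error and word counts from several scoring lines.
# pool_wer is a hypothetical helper name, not part of the original scripts.
pool_wer() {
  # LC_ALL=C keeps the decimal point independent of the user's locale.
  LC_ALL=C awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }'
}

# Made-up sample lines; only fields 4 and 6 are read.
printf '%s\n' '%WER 4.00 [ 4 / 100 ]' '%WER 2.00 [ 6 / 300 ]' | pool_wer
```

Because the counts are summed before dividing, the result is weighted by the number of words in each test set rather than being a plain mean of the per-set percentages.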

67
egs/rm/s1/steps/decode_tri2b.sh Executable file

@ -0,0 +1,67 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2b
mkdir -p $dir
model=exp/tri2b/final.mdl
alignmodel=exp/tri2b/final.alimdl
et=exp/tri2b/final.et
defaultmat=exp/tri2b/default.mat
tree=exp/tri2b/tree
graphdir=exp/graph_tri2b
silphones=`cat data/silphones.csl`
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
defaultfeats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:-|"
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1 $model $et \
"$sifeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
2>$dir/et_${test}.log || exit 1;
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
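
Each test set above is decoded in its own backgrounded subshell, and `wait` blocks until all of them have finished. One consequence is that `exit 1` inside `( ... ) &` ends only that subshell, so the parent script still proceeds to the scoring step. Below is a minimal sketch of the same pattern with a per-job status check added, assuming a POSIX shell; `run_job` is a made-up stand-in for the per-test-set work.

```shell
# Hypothetical stand-in for the per-test-set work; it fails for one set.
run_job() {
  [ "$1" != "feb89" ]
}

pids=""
for test in mar87 oct87 feb89; do
  ( run_job "$test" || exit 1 ) &
  pids="$pids $!"
done

failed=0
for pid in $pids; do
  # "wait PID" returns the exit status of that background job.
  wait "$pid" || failed=$((failed + 1))
done
echo "failed jobs: $failed"
```

Collecting the PIDs is one possible way to notice a failed test set before averaging; the original scripts rely on the per-job log files instead.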

62
egs/rm/s1/steps/decode_tri2c.sh Executable file

@ -0,0 +1,62 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Decode the testing data.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2c
mkdir -p $dir
model=exp/tri2c/final.mdl
tree=exp/tri2c/tree
graphdir=exp/graph_tri2c
# Note, the following 3 options must match the same options in train_tri2c.sh
norm_vars=false
after_deltas=false
per_spk=true
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
if [ "$per_spk" = "true" ]; then
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
fi # else empty.
echo "Computing cepstral mean and variance stats."
# compute mean and variance stats.
if [ "$after_deltas" = "true" ]; then
add-deltas --print-args=false scp:data/test_${test}.scp ark:- | compute-cmvn-stats $spk2utt_opt ark:- ark:$dir/cmvn_${test}.ark 2>$dir/cmvn_${test}.log
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn_${test}.ark ark:- ark:- |"
else
# Use $spk2utt_opt (not a hard-coded --spk2utt) so that per_spk=false also works.
compute-cmvn-stats $spk2utt_opt scp:data/test_${test}.scp ark:$dir/cmvn_${test} 2>$dir/cmvn_${test}.log
feats="ark:apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn_${test} scp:data/test_${test}.scp ark:- | add-deltas --print-args=false ark:- ark:- |"
fi
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
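
The script above relies on option variables such as `$spk2utt_opt` expanding to nothing when they are empty and left unquoted, so the same command line serves both the per-speaker and the per-utterance case. Below is a small sketch of that shell behaviour; `count_args` is a made-up helper and `model.mdl` is a placeholder argument.

```shell
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { echo $#; }

per_spk=true
spk2utt_opt=   # start empty so nothing is inherited from the environment
if [ "$per_spk" = "true" ]; then
  spk2utt_opt=--spk2utt=ark:data/test_mar87.spk2utt
fi
with_opt=$(count_args $spk2utt_opt model.mdl)      # option set: two arguments

spk2utt_opt=
without_opt=$(count_args $spk2utt_opt model.mdl)   # empty and unquoted: one argument
quoted=$(count_args "$spk2utt_opt" model.mdl)      # empty but quoted: two arguments

echo "$with_opt $without_opt $quoted"
```

The quoted case shows why these option variables are deliberately left unquoted: a quoted empty string would reach the program as an empty argument.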

45
egs/rm/s1/steps/decode_tri2d.sh Executable file

@ -0,0 +1,45 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2d
mkdir -p $dir
model=exp/tri2d/final.mdl
tree=exp/tri2d/tree
graphdir=exp/graph_tri2d
transform=exp/tri2d/final.mat
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
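
Scoring above first converts the reference transcripts to integer word IDs, leaving the utterance ID in the first field untouched, so that they can be compared with the decoder's integer output. The awk sketch below illustrates that mapping. It is not `scripts/sym2int.pl` itself: it assumes a symbol table with one `word id` pair per line, and the table, transcript and `map_words` helper are made up for the example.

```shell
tmp=$(mktemp -d)
# Made-up symbol table and transcript.
printf '%s\n' '<eps> 0' 'SHOW 1' 'ALL 2' 'SHIPS 3' > "$tmp/words.txt"
printf '%s\n' 'utt1 SHOW ALL SHIPS' 'utt2 ALL SHIPS' > "$tmp/trans.txt"

# map_words is a hypothetical helper: $1 = symbol table, $2 = transcript.
map_words() {
  awk 'NR==FNR { id[$1] = $2; next }    # first file: remember word -> id
       { out = $1                       # keep the utterance ID as it is
         for (i = 2; i <= NF; i++) {
           if (!($i in id)) { print "unknown word: " $i > "/dev/stderr"; exit 1 }
           out = out " " id[$i]
         }
         print out }' "$1" "$2"
}

mapped=$(map_words "$tmp/words.txt" "$tmp/trans.txt")
echo "$mapped"
rm -r "$tmp"
```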

45
egs/rm/s1/steps/decode_tri2e.sh Executable file

@ -0,0 +1,45 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2e
mkdir -p $dir
model=exp/tri2e/final.mdl
tree=exp/tri2e/tree
graphdir=exp/graph_tri2e
transform=exp/tri2e/final.mat
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

45
egs/rm/s1/steps/decode_tri2f.sh Executable file

@ -0,0 +1,45 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2f
mkdir -p $dir
model=exp/tri2f/final.mdl
tree=exp/tri2f/tree
graphdir=exp/graph_tri2f
transform=exp/tri2f/final.mat
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

65
egs/rm/s1/steps/decode_tri2g.sh Executable file

@ -0,0 +1,65 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2g
mkdir -p $dir
model=exp/tri2g/final.mdl
alignmodel=exp/tri2g/final.alimdl
lvtln=exp/tri2g/final.lvtln
tree=exp/tri2g/tree
graphdir=exp/graph_tri2g
silphones=`cat data/silphones.csl`
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $model $lvtln \
"$sifeats" ark:- ark:$dir/lvtln_${test}.trans ark,t:$dir/lvtln_${test}.warp ) \
2>$dir/lvtln_${test}.log || exit 1;
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer


@ -0,0 +1,65 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2g_diag
mkdir -p $dir
model=exp/tri2g/final.mdl
alignmodel=exp/tri2g/final.alimdl
lvtln=exp/tri2g/final.lvtln
tree=exp/tri2g/tree
graphdir=exp/graph_tri2g
silphones=`cat data/silphones.csl`
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
gmm-est-lvtln-trans --norm-type=diag --verbose=1 $spk2utt_opt $model $lvtln \
"$sifeats" ark:- ark:$dir/lvtln_${test}.trans ark,t:$dir/lvtln_${test}.warp ) \
2>$dir/lvtln_${test}.log || exit 1;
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer


@ -0,0 +1,79 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# As decode_tri2g, but using the feature-level VTLN when decoding,
# as opposed to the linear VTLN.
# Also computing a maximum-likelihood mean offset,
# for better comparability with LVTLN.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2g_vtln
mkdir -p $dir
vtlnmodel=exp/tri2g/final.vtlnmdl
lvtlnmodel=exp/tri2g/final.mdl
alignmodel=exp/tri2g/final.alimdl
lvtln=exp/tri2g/final.lvtln
tree=exp/tri2g/tree
graphdir=exp/graph_tri2g
silphones=`cat data/silphones.csl`
# Doesn't matter which model we use when making the graph
# (only the transitions and structure are used).
scripts/mkgraph.sh $tree $vtlnmodel $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $lvtlnmodel $lvtln \
"$sifeats" ark:- ark:/dev/null ark,t:$dir/lvtln_${test}.warp ) \
2>$dir/lvtln_${test}.log || exit 1;
cat $dir/lvtln_${test}.warp | awk '{print $1, (0.85+0.01*$2);}' > $dir/${test}.factor
feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- |"
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-est-fmllr --fmllr-update-type=offset $spk2utt_opt $vtlnmodel "$feats" ark,o:- ark:$dir/${test}.trans ) 2>$dir/fmllr_${test}.log || exit 1;
feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $vtlnmodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
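
The awk step in the script above turns the value written by `gmm-est-lvtln-trans` to the `.warp` file into the warp factor passed to `compute-mfcc-feats` through `--vtln-map`, using factor = 0.85 + 0.01 * value. Below is a quick check of that arithmetic on made-up speaker IDs and values. Treating the value as a class index is an assumption, and the real range of values depends on how the LVTLN object was trained, which is not shown here.

```shell
# Made-up (speaker, value) pairs; the awk expression is the one used in the script.
factors=$(printf '%s\n' 'spk_a 0' 'spk_b 15' 'spk_c 30' | \
  LC_ALL=C awk '{print $1, (0.85+0.01*$2);}')
echo "$factors"
```

Under this mapping a value of 15 corresponds to no warping (factor 1), with lower and higher values warping in opposite directions.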


@ -0,0 +1,79 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# As decode_tri2g, but using the feature-level VTLN when decoding,
# as opposed to the linear VTLN.
# Also computing a diagonal fMLLR transform for
# comparison with ET.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2g_vtln_diag
mkdir -p $dir
vtlnmodel=exp/tri2g/final.vtlnmdl
lvtlnmodel=exp/tri2g/final.mdl
alignmodel=exp/tri2g/final.alimdl
lvtln=exp/tri2g/final.lvtln
tree=exp/tri2g/tree
graphdir=exp/graph_tri2g
silphones=`cat data/silphones.csl`
# Doesn't matter which model we use when making the graph
# (only the transitions and structure are used).
scripts/mkgraph.sh $tree $vtlnmodel $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $lvtlnmodel $lvtln \
"$sifeats" ark:- ark:/dev/null ark,t:$dir/lvtln_${test}.warp ) \
2>$dir/lvtln_${test}.log || exit 1;
cat $dir/lvtln_${test}.warp | awk '{print $1, (0.85+0.01*$2);}' > $dir/${test}.factor
feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- |"
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-est-fmllr --fmllr-update-type=diag $spk2utt_opt $vtlnmodel "$feats" ark,o:- ark:$dir/${test}.trans ) 2>$dir/fmllr_${test}.log || exit 1;
feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $vtlnmodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer


@ -0,0 +1,71 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# As decode_tri2g, but using the feature-level VTLN when decoding,
# as opposed to the linear VTLN.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2g_vtln_nofmllr
mkdir -p $dir
vtlnmodel=exp/tri2g/final.vtlnmdl
lvtlnmodel=exp/tri2g/final.mdl
alignmodel=exp/tri2g/final.alimdl
lvtln=exp/tri2g/final.lvtln
tree=exp/tri2g/tree
graphdir=exp/graph_tri2g
silphones=`cat data/silphones.csl`
# Doesn't matter which model we use when making the graph
# (only the transitions and structure are used).
scripts/mkgraph.sh $tree $vtlnmodel $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $lvtlnmodel $lvtln \
"$sifeats" ark:- ark:/dev/null ark,t:$dir/lvtln_${test}.warp ) \
2>$dir/lvtln_${test}.log || exit 1;
cat $dir/lvtln_${test}.warp | awk '{print $1, (0.85+0.01*$2);}' > $dir/${test}.factor
feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $vtlnmodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

45
egs/rm/s1/steps/decode_tri2h.sh Executable file

@ -0,0 +1,45 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2h
mkdir -p $dir
model=exp/tri2h/final.mdl
tree=exp/tri2h/tree
graphdir=exp/graph_tri2h
transform=exp/tri2h/final.mat
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

45
egs/rm/s1/steps/decode_tri2i.sh Executable file

@ -0,0 +1,45 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2i
mkdir -p $dir
model=exp/tri2i/final.mdl
tree=exp/tri2i/tree
graphdir=exp/graph_tri2i
transform=exp/tri2i/final.mat
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:add-deltas --delta-order=3 scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

44
egs/rm/s1/steps/decode_tri2j.sh Executable file

@ -0,0 +1,44 @@
# to be run from ..
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2j
mkdir -p $dir
model=exp/tri2j/final.mdl
tree=exp/tri2j/tree
graphdir=exp/graph_tri2j
transform=exp/tri2j/final.mat
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
feats="ark:add-deltas --delta-order=3 scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer

68
egs/rm/s1/steps/decode_tri2k.sh Executable file

@ -0,0 +1,68 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2k
mkdir -p $dir
model=exp/tri2k/final.mdl
alignmodel=exp/tri2k/final.alimdl
et=exp/tri2k/final.et
tree=exp/tri2k/tree
graphdir=exp/graph_tri2k
ldamat=exp/tri2k/lda.mat
defaultmat=exp/tri2k/default.mat
silphones=`cat data/silphones.csl`
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1 $model $et \
"$sifeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
2>$dir/et_${test}.log || exit 1;
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# The ,p option lets it score partial output (cut off in mid-line) without dying.
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
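The loop above decodes all test sets in parallel: each parenthesised group is a subshell sent to the background with `&`, and `wait` blocks until all of them finish. A small sketch of the pattern, with an `echo` standing in for the real decoding commands (the output directory and file names are hypothetical):

```shell
#!/bin/sh
# Hypothetical stand-in for one decoding job; it only records its name.
outdir=$(mktemp -d)
for test in mar87 oct87 feb89; do
  (
    # Each parenthesised group runs in its own subshell, in the background.
    echo "decoded $test" > "$outdir/result_$test"
  ) &
done
wait   # Block until every background subshell has finished.
nfiles=$(ls "$outdir" | wc -l | tr -d ' ')
echo "$nfiles result files"
rm -r "$outdir"
```

One consequence of this design: an `exit 1` inside the parentheses ends only that subshell, and a bare `wait` returns success regardless, so a failed test set shows up in its log file rather than in the script's exit status.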


@ -0,0 +1,77 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2k_fmllr
mkdir -p $dir
model=exp/tri2k/final.mdl
alignmodel=exp/tri2k/final.alimdl
et=exp/tri2k/final.et
tree=exp/tri2k/tree
graphdir=exp/graph_tri2k
ldamat=exp/tri2k/lda.mat
defaultmat=exp/tri2k/default.mat
silphones=`cat data/silphones.csl`
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
basefeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pass1.tra ark,t:$dir/test_${test}_pass1.ali 2> $dir/pass1decode_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pass1.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1 $model $et \
"$basefeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
2>$dir/et_${test}.log || exit 1;
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}_pass2.tra ark,t:$dir/test_${test}_pass2.ali 2> $dir/pass2decode_${test}.log
( ali-to-post ark:$dir/test_${test}_pass2.ali ark:- | \
weight-silence-post 0.0 $silphones $model ark:- ark:- | \
gmm-est-fmllr $spk2utt_opt $model "$feats" ark:- ark:$dir/fmllr_${test}.trans ) \
2>$dir/fmllr_${test}.log || exit 1;
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/fmllr_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# The ,p option lets it score partial output without dying.
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer


@ -0,0 +1,80 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2k_regtree_fmllr
mkdir -p $dir
model=exp/tri2k/final.mdl
alignmodel=exp/tri2k/final.alimdl
et=exp/tri2k/final.et
tree=exp/tri2k/tree
graphdir=exp/graph_tri2k
ldamat=exp/tri2k/lda.mat
defaultmat=exp/tri2k/default.mat
silphones=`cat data/silphones.csl`
occs=exp/tri2k/final.occs
regtree=$dir/regtree
maxleaves=8 # max # of regression-tree leaves.
mincount=5000 # mincount before we add new transform.
gmm-make-regtree --sil-phones=$silphones --state-occs=$occs --max-leaves=$maxleaves $model $regtree 2>$dir/make_regtree.out
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
basefeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pass1.tra ark,t:$dir/test_${test}_pass1.ali 2> $dir/pass1decode_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pass1.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1 $model $et \
"$basefeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
2>$dir/et_${test}.log || exit 1;
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}_pass2.tra ark,t:$dir/test_${test}_pass2.ali 2> $dir/pass2decode_${test}.log
( ali-to-post ark:$dir/test_${test}_pass2.ali ark:- | \
weight-silence-post 0.0 $silphones $model ark:- ark:- | \
gmm-est-regtree-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model "$feats" ark:- $regtree ark:$dir/${test}.fmllr ) \
2>$dir/fmllr_${test}.log || exit 1;
gmm-decode-faster-regtree-fmllr $utt2spk_opt --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst $regtree "$feats" ark:$dir/${test}.fmllr ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# The ,p option lets it score partial output without dying.
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer


@ -0,0 +1,68 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2k_utt
mkdir -p $dir
model=exp/tri2k/final.mdl
alignmodel=exp/tri2k/final.alimdl
et=exp/tri2k/final.et
tree=exp/tri2k/tree
graphdir=exp/graph_tri2k
ldamat=exp/tri2k/lda.mat
defaultmat=exp/tri2k/default.mat
silphones=`cat data/silphones.csl`
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
# The two lines below are commented out, which makes this per-utterance.
#spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
#utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1 $model $et \
"$sifeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
2>$dir/et_${test}.log || exit 1;
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# The ,p option lets it score partial output (cut off in mid-line) without dying.
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
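The per-utterance variant above works because `$spk2utt_opt` and `$utt2spk_opt` are used unquoted: when the variable is unset or empty, the shell drops it from the command line entirely. The sketch below illustrates this with `count_args`, a hypothetical helper that only reports how many arguments it received; the file name is also illustrative.

```shell
#!/bin/sh
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { echo $#; }

spk2utt_opt=--spk2utt=ark:data/test.spk2utt
with_opt=$(count_args $spk2utt_opt model et)     # 3 arguments

spk2utt_opt=
without_opt=$(count_args $spk2utt_opt model et)  # 2 arguments: the empty variable vanishes
quoted=$(count_args "$spk2utt_opt" model et)     # 3 arguments: quoting keeps an empty string

echo "$with_opt $without_opt $quoted"
```

This is why the option variables must stay unquoted in these scripts. It also assumes the variables are not inherited from the calling environment, since the per-utterance script never clears them explicitly.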

61
egs/rm/s1/steps/decode_tri2l.sh Executable file

@ -0,0 +1,61 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2l
mkdir -p $dir
model=exp/tri2l/final.mdl
alignmodel=exp/tri2l/final.alimdl
tree=exp/tri2l/tree
graphdir=exp/graph_tri2l
transform=exp/tri2l/final.mat
silphones=`cat data/silphones.csl`
mincount=500
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
# Use smaller beam for 1st pass.
gmm-decode-faster --beam=17.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}.pre_tra ark,t:$dir/test_${test}.pre_ali 2> $dir/predecode_${test}.log
( ali-to-post ark:$dir/test_${test}.pre_ali ark:- | \
weight-silence-post 0.0 $silphones $model ark:- ark:- | \
gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
"$sifeats" ark,o:- ark:$dir/${test}.fmllr ) 2>$dir/fmllr_${test}.log
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# The ,p option lets it score partial output without dying.
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer


@ -0,0 +1,61 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2l_utt
mkdir -p $dir
model=exp/tri2l/final.mdl
alignmodel=exp/tri2l/final.alimdl
tree=exp/tri2l/tree
graphdir=exp/graph_tri2l
transform=exp/tri2l/final.mat
silphones=`cat data/silphones.csl`
mincount=300
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
#spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
#utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
# Use smaller beam for 1st pass.
gmm-decode-faster --beam=17.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}.pre_tra ark,t:$dir/test_${test}.pre_ali 2> $dir/predecode_${test}.log
( ali-to-post ark:$dir/test_${test}.pre_ali ark:- | \
weight-silence-post 0.0 $silphones $model ark:- ark:- | \
gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
"$sifeats" ark,o:- ark:$dir/${test}.fmllr ) 2>$dir/fmllr_${test}.log
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# The ,p option lets it score partial output without dying.
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
cat $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer


@ -0,0 +1,32 @@
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Decode the testing data.
# this is the hardest test set, see
# http://www.itl.nist.gov/iad/mig/tests/rt/ASRhistory/pdf/resource_management_92eval.pdf
dir=exp/decode_tri_mixup
mkdir -p $dir
srcdir=exp/tri_mixup
model=$srcdir/25.mdl
graphdir=exp/graph_tri_mixup
../src/bin/faster-decode-gmm --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst data/test_sep92.scp $dir/word_transcripts.txt $dir/alignments.txt > $dir/decode.out
../src/bin/compute-wer --symbol-table=data/words.txt data_prep/test_sep92_trans.txt $dir/word_transcripts.txt > $dir/wer

48
egs/rm/s1/steps/init_sgmm.sh Executable file

@ -0,0 +1,48 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Initialize SGMM from a trained HMM/GMM system.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/sgmm/init
mkdir -p $dir
srcdir=exp/tri1
model=exp/sgmm/0.mdl
init-ubm --intermediate-numcomps=2000 --ubm-numcomps=400 --verbose=2 \
--fullcov-ubm=true $srcdir/final.mdl $srcdir/final.occs \
$dir/ubm0 2> $dir/cluster.log
subset[0]=1000
subset[1]=1500
subset[2]=2000
subset[3]=2500
for x in 0 1 2 3; do
echo "Pass $x"
feats="ark:scripts/subset_scp.pl ${subset[$x]} data/train.scp | add-deltas --print-args=false scp:- ark:- |"
fgmm-global-acc-stats --diag-gmm-nbest=15 --binary=false --verbose=2 $dir/ubm$x "$feats" $dir/$x.acc \
2> $dir/acc.$x.log || exit 1;
fgmm-global-est --verbose=2 $dir/ubm$x $dir/$x.acc \
$dir/ubm$[$x+1] 2> $dir/update.$x.log || exit 1;
rm $dir/$x.acc
done
sgmm-init $srcdir/final.mdl $dir/ubm4 $model 2> $dir/sgmm_init.log
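Each pass of the loop above reads `ubm$x` and writes `ubm$[$x+1]`, which is why `sgmm-init` picks up `ubm4` after four passes. The sketch below traces that naming chain, with an `echo` in place of the real accumulation and update commands. It uses the `$((...))` arithmetic form, which is equivalent here to the older `$[...]` syntax in the script; the variable names `next` and `last` are introduced only for illustration.

```shell
#!/bin/sh
last=
for x in 0 1 2 3; do
  next=$((x+1))   # same value as $[$x+1] in the script above
  echo "pass $x: reads ubm$x, writes ubm$next"
  last=ubm$next
done
echo "final UBM: $last"
```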


@ -0,0 +1,44 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from .. (one directory up from here)
if [ $# != 1 ]; then
echo "usage: make_mfcc_test.sh <abs-path-to-tmpdir>"
exit 1;
fi
if [ -f path.sh ]; then . path.sh; fi
dir=exp/make_mfcc
mkdir -p $dir
root_out=$1
mkdir -p $root_out
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
scpin=data_prep/test_${test}_wav.scp
# Making it like this so it works for others on the BUT filesystem.
# It will generate the correct scp file without running the feature extraction.
log=$dir/make_mfcc_test_${test}.log
(
compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf scp:$scpin ark,scp:$root_out/test_${test}_raw_mfcc.ark,$root_out/test_${test}_raw_mfcc.scp 2> $log || tail $log
cp $root_out/test_${test}_raw_mfcc.scp data/test_${test}.scp
) &
done
wait
echo "If the above produced no output on the screen, it succeeded."


@ -0,0 +1,43 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from .. (one directory up from here)
if [ $# != 1 ]; then
echo "usage: make_mfcc_train.sh <abs-path-to-tmpdir>";
exit 1;
fi
if [ -f path.sh ]; then . path.sh; fi
scpin=data_prep/train_wav.scp
dir=exp/make_mfcc
mkdir -p $dir
root_out=$1
mkdir -p $root_out
scripts/split_scp.pl $scpin $dir/train_wav{1,2,3,4}.scp
for n in 1 2 3 4; do # Use 4 CPUs
log=$dir/make_mfcc_train.$n.log
compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf scp:$dir/train_wav${n}.scp ark,scp:$root_out/train_raw_mfcc${n}.ark,$root_out/train_raw_mfcc${n}.scp 2> $log || tail $log &
done
wait;
cat $root_out/train_raw_mfcc{1,2,3,4}.scp > data/train.scp
echo "If the above produced no output on the screen, it succeeded."
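The closing message relies on the `2> $log || tail $log` idiom used above: stderr always goes to the log file, and the end of the log is printed only when the command fails. A minimal sketch of that behaviour, with `sh -c` commands standing in for `compute-mfcc-feats` (the messages are made up for illustration):

```shell
#!/bin/sh
log=$(mktemp)
# A command that succeeds prints nothing to the screen; its stderr stays in the log.
quiet=$( { sh -c 'echo "progress" >&2' 2> "$log" || tail "$log"; } )
# A command that fails has the end of its log echoed to stdout.
noisy=$( { sh -c 'echo "fatal error" >&2; exit 1' 2> "$log" || tail "$log"; } )
echo "quiet=[$quiet] noisy=[$noisy]"
rm -f "$log"
```

So silence on the terminal means every job exited with status zero, while any failure surfaces the last lines of the corresponding log.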


@ -0,0 +1,66 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# The output of this script is the symbol tables data/{words.txt,phones.txt},
# and the grammars and lexicons data/{L,G}{,_disambig}.fst
# To be run from ..
if [ -f path.sh ]; then . path.sh; fi
cp data_prep/G.txt data/
scripts/make_words_symtab.pl < data/G.txt > data/words.txt
cp data_prep/lexicon.txt data/
scripts/make_phones_symtab.pl < data/lexicon.txt > data/phones.txt
silphones="sil"; # This would in general be a space-separated list of all silence phones. E.g. "sil vn"
# Generate colon-separated lists of silence and non-silence phones.
scripts/silphones.pl data/phones.txt "$silphones" data/silphones.csl data/nonsilphones.csl
ndisambig=`scripts/add_lex_disambig.pl data/lexicon.txt data/lexicon_disambig.txt`
scripts/add_disambig.pl data/phones.txt $ndisambig > data/phones_disambig.txt
# Create train transcripts in integer format:
cat data_prep/train_trans.txt | \
scripts/sym2int.pl --ignore-first-field data/words.txt > data/train.tra
# Get lexicon in FST format.
# silprob = 0.5: same prob as word.
scripts/make_lexicon_fst.pl data/lexicon.txt 0.5 sil | fstcompile --isymbols=data/phones.txt --osymbols=data/words.txt --keep_isymbols=false --keep_osymbols=false | fstarcsort --sort_type=olabel > data/L.fst
scripts/make_lexicon_fst.pl data/lexicon_disambig.txt 0.5 sil | fstcompile --isymbols=data/phones_disambig.txt --osymbols=data/words.txt --keep_isymbols=false --keep_osymbols=false | fstarcsort --sort_type=olabel > data/L_disambig.fst
fstcompile --isymbols=data/words.txt --osymbols=data/words.txt --keep_isymbols=false --keep_osymbols=false data/G.txt > data/G.fst
# Checking that G is stochastic [note: it would not be for an ARPA language model]
fstisstochastic data/G.fst || echo Error
# Checking that disambiguated lexicon times G is determinizable
fsttablecompose data/L_disambig.fst data/G.fst | fstdeterminize >/dev/null || echo Error
# Checking that LG is stochastic:
fsttablecompose data/L.fst data/G.fst | fstisstochastic || echo Error
## Check lexicon.
## just have a look and make sure it seems sane.
fstprint --isymbols=data/phones.txt --osymbols=data/words.txt data/L.fst | head

121
egs/rm/s1/steps/train_et2.sh Executable file

@ -0,0 +1,121 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# train_et2.sh is like train_et.sh, but uses an adapt model with
# fewer Gaussians, to see whether this makes the warp distribution
# more bimodal.
if [ -f path.sh ]; then . path.sh; fi
srcdir=exp/adapt2
dir=exp/et2
srcmodel=$srcdir/20.mdl
normtype=mean # could be mean or none or mean-and-var
spk2utt_opt=--spk2utt=ark:$dir/spk2utt
utt2spk_opt=--utt2spk=ark:$dir/utt2spk
# for per-utterance, uncomment the following [this would make it worse]:
# spk2utt_opt=
# utt2spk_opt=
feats="ark:add-deltas scp:$dir/train.scp ark:- |"
mkdir -p $dir
nspk=109 # Use all 109 RM training speakers.
nutt=15 # Use at most 15 utterances from each speaker.
head -$nspk data/train.spk2utt | \
awk '{ printf("%s ",$1); for(x=2; x<=NF&&x<='$nutt'+1;x++)
{ printf("%s ", $x); } printf("\n"); }' > $dir/spk2utt
scripts/spk2utt_to_utt2spk.pl < $dir/spk2utt > $dir/utt2spk
cat $dir/utt2spk | awk '{print $1}' > $dir/uttlist
scripts/filter_scp.pl $dir/uttlist <data/train.scp >$dir/train.scp
silphonelist=`cat data/silphones.csl`
cp $srcdir/tree $dir
cp $srcdir/phone_map $dir
# We use a subset of the training utterances from srcdir, so we reuse the
# alignments from there: link them.
(
cd $dir
ln -s ../../$srcdir/cur.ali .
ln -s ../../$srcmodel 0.mdl
)
# Init the transform:
gmm-init-et --normalize-type=$normtype --binary=false --dim=39 $dir/0.et 2>$dir/init_et.log || exit 1
for x in 0 1 2 3 4 5 6 7 8 9 10 11; do
x1=$[$x+1];
# Work out current transforms:
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
gmm-post-to-gpost $srcmodel "$feats" ark:- ark:- | \
gmm-est-et $spk2utt_opt --verbose=1 $dir/$x.mdl $dir/$x.et "$feats" ark:- ark:$dir/$x.trans ark,t:$dir/$x.warp ) 2> $dir/trans.$x.log || exit 1;
# Accumulate stats to update model:
( transform-feats $utt2spk_opt ark:$dir/$x.trans "$feats" ark:- 2>$dir/apply_fmllr.$x.log | \
gmm-acc-stats-twofeats $srcmodel "$feats" ark:- "ark:cat $dir/cur.ali | ali-to-post ark:- ark:- |" $dir/$x.acc ) 2>$dir/gmm_acc.$x.log || exit 1;
# Check likelihoods (must add the fMLLR determinants from apply_fmllr.$x.log, to get meaningful
# figures.)
( transform-feats $utt2spk_opt ark:$dir/$x.trans "$feats" ark:- | \
gmm-acc-stats $dir/$x.mdl ark:- "ark:cat $dir/cur.ali | ali-to-post ark:- ark:- |" /dev/null ) 2>$dir/gmm_getlike.$x.log || exit 1;
gmm-est --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc $dir/$x1.mdl 2>$dir/gmm_est.$x.log || exit 1;
# Next estimate either A or B, depending on iteration:
if [ $[$x%2] == 0 ]; then # Estimate A:
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
gmm-post-to-gpost $srcmodel "$feats" ark:- ark:- | \
gmm-et-acc-a $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$feats" ark:- $dir/$x.et_acc_a ) 2> $dir/acc_a.$x.log || exit 1;
gmm-et-est-a --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.et_acc_a 2> $dir/update_a.$x.log || exit 1;
rm $dir/$x.et_acc_a
else # Estimate B:
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
gmm-post-to-gpost $srcmodel "$feats" ark:- ark:- | \
gmm-et-acc-b $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$feats" ark:- ark:$dir/$x.trans ark:$dir/$x.warp $dir/$x.et_acc_b ) 2> $dir/acc_b.$x.log || exit 1;
gmm-et-est-b --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.mat $dir/$x.et_acc_b 2> $dir/update_b.$x.log || exit 1;
rm $dir/$x.et_acc_b
# Careful!: gmm-transform-means here changes $x1.mdl in-place.
gmm-transform-means $dir/$x.mat $dir/$x1.mdl $dir/$x1.mdl 2> $dir/transform_means.$x.log
fi
rm $dir/$x.trans
if [ $x != 0 ]; then
rm $dir/$x.mdl # keep 0.mdl as it's the alignment model.
fi
rm $dir/$x.acc
x=$[$x+1];
done
for n in 0 1 2 3 4 5 6 7 8 9 10 11; do
cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
done
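Near the top of the script above, an awk one-liner trims data/train.spk2utt so that each speaker keeps at most `$nutt` utterances. The sketch below reproduces that truncation on a made-up two-speaker table. As an assumption for readability, it passes the limit with `awk -v` instead of splicing the shell variable into the program text, and it omits the trailing space the original prints after each field.

```shell
#!/bin/sh
nutt=2
# Hypothetical spk2utt table: a speaker id followed by that speaker's utterance ids.
subset=$(printf 'spkA u1 u2 u3 u4\nspkB u5\n' | \
  awk -v nutt="$nutt" '{ printf("%s", $1);
       for (x = 2; x <= NF && x <= nutt + 1; x++) printf(" %s", $x);
       printf("\n"); }')
echo "$subset"
```

The loop bound `nutt + 1` accounts for field 1 being the speaker id, so fields 2 through `nutt + 1` are the retained utterances; speakers with fewer utterances are passed through unchanged.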

92
egs/rm/s1/steps/train_mono.sh Executable file

@ -0,0 +1,92 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
if [ -f path.sh ]; then . path.sh; fi
# Train the monophone on a subset-- no point using all the data.
dir=exp/mono
n=1000
feats="ark:add-deltas --print-args=false scp:$dir/train.scp ark:- |"
# need to quote when passing as an argument, as in "$feats",
# since it has spaces in it.
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numgauss=250 # Initial num-Gauss (must be more than #states=3*phones).
totgauss=1000 # Target #Gaussians.
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
realign_iters="1 2 3 4 5 6 7 8 9 10 12 15 20 25";
mkdir -p $dir
scripts/subset_scp.pl $n data/train.scp > $dir/train.scp
silphones=`cat data/silphones.csl | sed 's/:/ /g'`
nonsilphones=`cat data/nonsilphones.csl | sed 's/:/ /g'`
cat conf/topo.proto | sed "s:NONSILENCEPHONES:$nonsilphones:" | sed "s:SILENCEPHONES:$silphones:" > $dir/topo
gmm-init-mono '--train-feats=ark:head -10 data/train.scp | add-deltas scp:- ark:- |' $dir/topo 39 $dir/0.mdl $dir/tree 2> $dir/init.out || exit 1;
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/0.mdl data/L.fst \
"ark:scripts/subset_scp.pl $n data/train.tra|" \
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
echo Pass 0
align-equal-compiled "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark,t,f:- 2>$dir/align.0.log | \
gmm-acc-stats-ali --binary=true $dir/0.mdl "$feats" ark:- \
$dir/0.acc 2> $dir/acc.0.log || exit 1;
# In the following steps, the --min-gaussian-occupancy=3 option is important; otherwise
# we fail to estimate "rare" phones and, later on, they never align properly.
gmm-est --min-gaussian-occupancy=3 --mix-up=$numgauss \
$dir/0.mdl $dir/0.acc $dir/1.mdl 2> $dir/update.0.log || exit 1;
rm $dir/0.acc
beam=4 # will change to 8 below after 1st pass
x=1
while [ $x -lt $numiters ]; do
echo "Pass $x"
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$[$beam*4] $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" t,ark:$dir/cur.ali \
2> $dir/align.$x.log || exit 1;
fi
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
rm $dir/$x.mdl $dir/$x.acc
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
beam=8
x=$[$x+1]
done
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl )
# example of showing the alignments:
# show-alignments data/phones.txt $dir/30.mdl ark:$dir/cur.ali | head -4
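The mix-up schedule in train_mono.sh relies on integer division, so the Gaussian count can stop slightly short of totgauss. The following is a minimal sketch of that arithmetic, assuming the same loop bounds as the script. `gauss_after_training` is a hypothetical helper, not a Kaldi tool, and it uses the `$(( ))` arithmetic form instead of the older `$[ ]`.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaldi): replays the numgauss update
# from the training loop and prints the value reached on the last pass.
gauss_after_training() {
  local numgauss=$1 totgauss=$2 maxiterinc=$3 numiters=$4
  # Integer division: any remainder is dropped.
  local incgauss=$(( (totgauss - numgauss) / maxiterinc ))
  local x=1
  while [ "$x" -lt "$numiters" ]; do
    if [ "$x" -le "$maxiterinc" ]; then
      numgauss=$(( numgauss + incgauss ))
    fi
    x=$(( x + 1 ))
  done
  echo "$numgauss"
}

# Settings copied from train_mono.sh: 250 -> 1000 over 20 increments, 30 iters.
gauss_after_training 250 1000 20 30
```

With these settings the increment is 37 per pass, so the count ends at 990 instead of 1000.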

87
egs/rm/s1/steps/train_sgmm1.sh Executable file
@ -0,0 +1,87 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
if [ -f path.sh ]; then . path.sh; fi
# To be run from ..
dir=exp/sgmm
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=25 # Total number of iterations
realign_iters="5 10 15";
silphonelist=`cat data/silphones.csl`
numsubstates=1500 # Initial #-substates.
totsubstates=5000 # Target #-substates.
maxiterinc=15 # Last iter to increase #substates on.
incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
randprune=0.1
mkdir -p $dir
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
cp $srcdir/tree $dir
echo "aligning all training data"
if [ ! -f $dir/0.ali ]; then
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
fi
if [ ! -f $dir/0.mdl ]; then
echo "you must run init_sgmm.sh before train_sgmm1.sh"
exit 1
fi
if [ ! -f $dir/gselect.gz ]; then
sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
fi
cp $dir/0.ali $dir/cur.ali || exit 1;
iter=0
while [ $iter -lt $numiters ]; do
echo "Pass $iter ... "
if echo $realign_iters | grep -w $iter >/dev/null; then
echo "Aligning data"
sgmm-align-compiled $scale_opts "$gselect_opt" --beam=8 --retry-beam=40 $dir/$iter.mdl \
"$srcgraphs" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
fi
if [ $iter -gt 0 ]; then
flags=vMwcS
else
flags=vwcS
fi
if [ ! -f $dir/$[$iter+1].mdl ]; then
sgmm-acc-stats-ali --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" ark:$dir/cur.ali $dir/$iter.acc 2> $dir/acc.$iter.log || exit 1;
sgmm-est --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
fi
# rm $dir/$iter.mdl $dir/$iter.acc
# rm $dir/$iter.occs
if [ $iter -lt $maxiterinc ]; then
numsubstates=$[$numsubstates+$incsubstates]
fi
iter=$[$iter+1];
done
( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $iter.mdl final.mdl; ln -s $iter.occs final.occs )
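The training loops decide whether to realign by piping the iteration list through `grep -w`. A minimal sketch of why the whole-word flag matters here; `is_realign_iter` is a hypothetical helper, not part of Kaldi.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaldi).
# $1 = iteration number, $2 = space-separated list of realignment iterations.
is_realign_iter() {
  # -w matches whole words only, so "1" does not match inside "10" or "15".
  echo "$2" | grep -qw -- "$1"
}

realign_iters="5 10 15"
for iter in 1 5 10; do
  if is_realign_iter "$iter" "$realign_iters"; then
    echo "iter $iter: realign"
  else
    echo "iter $iter: reuse cur.ali"
  fi
done
```

Without `-w`, iteration 1 would match the substring in "10" and trigger an unintended realignment.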

103
egs/rm/s1/steps/train_sgmm2.sh Executable file
@ -0,0 +1,103 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This is SGMM training with speaker vectors.
if [ -f path.sh ]; then . path.sh; fi
# To be run from ..
dir=exp/sgmm2
srcdir=exp/sgmm
gmmtridir=exp/tri1
trimodel=$gmmtridir/final.mdl
srcgraphs="ark:gunzip -c $gmmtridir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=25 # Total number of iterations
realign_iters="5 10 15";
silphonelist=`cat data/silphones.csl`
numsubstates=1500 # Initial #-substates.
totsubstates=5000 # Target #-substates.
maxiterinc=15 # Last iter to increase #substates on.
incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
randprune=0.1
spkdim=39
mkdir -p $dir
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
cp $gmmtridir/tree $srcdir/{0.ali,0.mdl,gselect.gz} $dir
if [ ! -f $dir/0.ali ]; then
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $trimodel "$srcgraphs" \
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
fi
if [ ! -f $dir/0.mdl ]; then
echo "you must run init_sgmm.sh before train_sgmm2.sh"
exit 1
fi
if [ ! -f $dir/gselect.gz ]; then
sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
fi
cp $dir/0.ali $dir/cur.ali || exit 1;
iter=0
while [ $iter -lt $numiters ]; do
echo "Pass $iter ... "
if [ $iter -gt 0 ]; then
if [ $iter -le 5 ]; then # only train phonetic subspace
flags=vMwcS
elif [ $(( $iter % 2 )) -eq 1 ]; then # odd iterations
flags=vMwcS
else # even iterations, update N and not M
flags=vwcSN
fi
else
flags=vwcS
fi
if [ ! -f $dir/$[$iter+1].mdl ]; then
if echo $realign_iters | grep -w $iter >/dev/null; then
echo "Aligning data"
sgmm-align-compiled $scale_opts "$gselect_opt" --beam=8 --retry-beam=40 $dir/$iter.mdl \
"$srcgraphs" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
fi
sgmm-acc-stats-ali --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" ark:$dir/cur.ali $dir/$iter.acc 2> $dir/acc.$iter.log || exit 1;
if [ $iter -eq 5 ]; then # increase spk dimension from 0 to 39
sgmm-estimate --update-flags=$flags --increase-spk-dim=$spkdim --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
else
sgmm-estimate --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
fi
fi
rm $dir/$iter.acc # $dir/$iter.mdl
# rm $dir/$iter.occs
if [ $iter -lt $maxiterinc ]; then
numsubstates=$[$numsubstates+$incsubstates]
fi
iter=$[$iter+1];
done
( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $iter.mdl final.mdl; ln -s $iter.occs final.occs )
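The update-flag selection in train_sgmm2.sh can be read as a single function of the iteration number. A sketch under that reading; `sgmm2_update_flags` is a hypothetical helper that only mirrors the if/elif chain in the script and does not call any Kaldi binary.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaldi): prints the --update-flags value
# that the loop in train_sgmm2.sh would pick for a given iteration.
sgmm2_update_flags() {
  local iter=$1
  if [ "$iter" -eq 0 ]; then
    echo vwcS     # iteration 0: neither M nor N in the flag set
  elif [ "$iter" -le 5 ] || [ $(( iter % 2 )) -eq 1 ]; then
    echo vMwcS    # iterations 1-5 and later odd iterations: M, not N
  else
    echo vwcSN    # even iterations after 5: N, not M
  fi
}

for iter in 0 1 5 6 7; do
  echo "$iter $(sgmm2_update_flags "$iter")"
done
```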

@ -0,0 +1,88 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/sgmma
ubm=exp/ubma/4.ubm
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=25 # Total number of iterations
realign_iters="5 10 15";
silphonelist=`cat data/silphones.csl`
numsubstates=1500 # Initial #-substates.
totsubstates=5000 # Target #-substates.
maxiterinc=15 # Last iter to increase #substates on.
incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
randprune=0.1
mkdir -p $dir
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
cp $srcdir/tree $dir
if [ ! -f $ubm ]; then
echo "No UBM in $ubm"
exit 1
fi
sgmm-init $srcdir/final.mdl $ubm $dir/0.mdl 2> $dir/sgmm_init.log || exit 1;
echo "aligning all training data"
if [ ! -f $dir/0.ali ]; then
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
fi
if [ ! -f $dir/gselect.gz ]; then
sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
fi
cp $dir/0.ali $dir/cur.ali || exit 1;
iter=0
while [ $iter -lt $numiters ]; do
echo "Pass $iter ... "
if echo $realign_iters | grep -w $iter >/dev/null; then
echo "Aligning data"
sgmm-align-compiled $spkvecs_opt $scale_opts "$gselect_opt" --beam=8 \
--retry-beam=40 $dir/$iter.mdl "$srcgraphs" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
fi
if [ $iter -gt 0 ]; then
flags=vMwcS
else
flags=vwcS
fi
if [ ! -f $dir/$[$iter+1].mdl ]; then
sgmm-acc-stats-ali --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" ark:$dir/cur.ali $dir/$iter.acc 2> $dir/acc.$iter.log || exit 1;
sgmm-est --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
fi
# TEMP: will restore these statements later.
# rm $dir/$iter.mdl $dir/$iter.acc
# rm $dir/$iter.occs
if [ $iter -lt $maxiterinc ]; then
numsubstates=$[$numsubstates+$incsubstates]
fi
iter=$[$iter+1];
done
( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $iter.mdl final.mdl; ln -s $iter.occs final.occs )

131
egs/rm/s1/steps/train_sgmmb.sh Executable file
@ -0,0 +1,131 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
if [ -f path.sh ]; then . path.sh; fi
# To be run from ..
# You must run init_sgmma.sh first.
# We rely on the initial model exp/sgmma/0.mdl being there
dir=exp/sgmmb
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=25 # Total number of iterations
ubm=exp/ubma/4.ubm
realign_iters="5 10 15";
spkvec_iters="5 8 12 17 22"
silphonelist=`cat data/silphones.csl`
numsubstates=1500 # Initial #-substates.
totsubstates=5000 # Target #-substates.
maxiterinc=15 # Last iter to increase #substates on.
incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
# Initially don't have speaker vectors, but change this after
# we estimate them.
spkvecs_opt=
randprune=0.1
mkdir -p $dir
utt2spk_opt="--utt2spk=ark:data/train.utt2spk"
spk2utt_opt="--spk2utt=ark:data/train.spk2utt"
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
if [ ! -f $ubm ]; then
echo "No UBM in $ubm"
exit 1
fi
sgmm-init --spk-space-dim=39 $srcdir/final.mdl $ubm $dir/0.mdl 2> $dir/sgmm_init.log || exit 1;
cp $srcdir/tree $dir
echo "aligning all training data"
if [ ! -f $dir/0.ali ]; then
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
fi
if [ ! -f $dir/0.mdl ]; then
echo "$dir/0.mdl is missing; sgmm-init above should have created it"
exit 1
fi
if [ ! -f $dir/gselect.gz ]; then
sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
fi
cp $dir/0.ali $dir/cur.ali || exit 1;
iter=0
while [ $iter -lt $numiters ]; do
echo "Pass $iter ... "
if echo $realign_iters | grep -w $iter >/dev/null; then
echo "Aligning data"
sgmm-align-compiled $spkvecs_opt $utt2spk_opt $scale_opts "$gselect_opt" \
--beam=8 --retry-beam=40 $dir/$iter.mdl "$srcgraphs" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
fi
if echo $spkvec_iters | grep -w $iter >/dev/null; then
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.01 $silphonelist $dir/$iter.mdl ark:- ark:- | \
sgmm-est-spkvecs $spk2utt_opt $spkvecs_opt "$gselect_opt" \
--rand-prune=$randprune $dir/$iter.mdl \
"$feats" ark:- ark:$dir/cur.vecs 2>$dir/spkvecs.$iter.log ) || exit 1;
spkvecs_opt="--spk-vecs=ark:$dir/cur.vecs"
fi
if [ $iter -eq 0 ]; then
flags=vwcS
elif [ $[$iter%2] -eq 1 -a $iter -gt 4 ]; then # odd iters after 4...
flags=vNwcS
else
flags=vMwcS
fi
if [ ! -f $dir/$[$iter+1].mdl ]; then
sgmm-acc-stats $spkvecs_opt $utt2spk_opt --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" "ark:ali-to-post ark:$dir/cur.ali ark:-|" $dir/$iter.acc 2> $dir/acc.$iter.log || exit 1;
sgmm-est --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
fi
rm $dir/$iter.mdl $dir/$iter.acc
rm $dir/$iter.occs
if [ $iter -lt $maxiterinc ]; then
numsubstates=$[$numsubstates+$incsubstates]
fi
iter=$[$iter+1];
done
# The point of this last phase of accumulation is to get Gaussian-level
# alignments with the speaker vectors but accumulate stats without
# any speaker vectors; we re-estimate M, w, c and S to get a model
# that's compatible with not having speaker vectors.
flags=MwcS
( ali-to-post ark:$dir/cur.ali ark:- | \
sgmm-post-to-gpost $spkvecs_opt $utt2spk_opt "$gselect_opt" \
$dir/$iter.mdl "$feats" ark,s,cs:- ark:- | \
sgmm-acc-stats-gpost --update-flags=$flags $dir/$iter.mdl "$feats" \
ark,s,cs:- $dir/$iter.aliacc ) 2> $dir/acc_ali.$iter.log || exit 1;
sgmm-est --update-flags=$flags --remove-speaker-space=true $dir/$iter.mdl \
$dir/$iter.aliacc $dir/$iter.alimdl 2>$dir/update_ali.$iter.log || exit 1;
( cd $dir; rm final.mdl final.occs 2>/dev/null;
ln -s $iter.mdl final.mdl; ln -s $iter.alimdl final.alimdl;
ln -s $iter.occs final.occs )
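train_sgmmb.sh passes `$spkvecs_opt` unquoted but `"$gselect_opt"` quoted. A small sketch of the shell behaviour this depends on; `count_args` is a hypothetical stand-in for a Kaldi binary and only reports how many arguments it received, and the exp/sgmmb paths are the ones the script would expand to.

```shell
#!/bin/bash
# Hypothetical stand-in for a Kaldi binary: prints its argument count.
count_args() { echo "$#"; }

spkvecs_opt=                                  # no speaker vectors yet
count_args $spkvecs_opt --beam=8              # empty unquoted variable vanishes

spkvecs_opt="--spk-vecs=ark:exp/sgmmb/cur.vecs"
count_args $spkvecs_opt --beam=8              # now one extra argument

gselect_opt="--gselect=ark:gunzip -c exp/sgmmb/gselect.gz|"
count_args "$gselect_opt"                     # quoted: stays one argument
count_args $gselect_opt                       # unquoted: split on the spaces
```

Leaving `$spkvecs_opt` unquoted lets it disappear while empty; quoting `"$gselect_opt"` keeps the piped command inside one option.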

109
egs/rm/s1/steps/train_tri1.sh Executable file
@ -0,0 +1,109 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri1
srcdir=exp/mono
srcmodel=$srcdir/final.mdl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="5 10 15 20";
silphonelist=`cat data/silphones.csl`
numiters=25 # Number of iterations of training
maxiterinc=15 # Last iter to increase #Gauss on.
numleaves=1500
numgauss=$[$numleaves + $numleaves/2];
# Initially mix up to avg. 1.5 Gauss/state (a bit more
# than this, due to state clustering), then slowly mix
# up to final amount.
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
mkdir -p $dir
cp $srcdir/topo $dir
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
# Align all training data using old model. Since we have more data for this pass,
# we use the version of gmm-align that compiles the graphs itself.
echo "aligning all training data"
gmm-align $scale_opts --beam=8 --retry-beam=40 $srcdir/tree $srcmodel data/L.fst \
"$feats" ark:data/train.tra ark:$dir/0.ali 2> $dir/align.0.log || exit 1;
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
# Have to make silence root not-shared because we will not split it.
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
gmm-mixup --mix-up=$numgauss $dir/1.mdl $dir/1.occs $dir/1.mdl \
2>$dir/mixup.log || exit 1;
rm $dir/treeacc
# Convert alignments generated from monophone model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log || exit 1;
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
rm $dir/0.ali
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1;
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --write-occs=$dir/$[$x+1].occs --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
rm $dir/$x.mdl $dir/$x.acc
rm $dir/$x.occs
if [[ $x -le $maxiterinc ]]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1];
done
( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $x.mdl final.mdl; ln -s $x.occs final.occs )
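train_tri1.sh builds phones.list by taking the last field of each line of data/phones.txt and dropping the id 0. A minimal sketch of that filter on made-up sample data, assuming phones.txt uses a "symbol id" layout; `list_phone_ids` is a hypothetical helper and the symbols below are illustrative only.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaldi): same pipeline as in the script,
# reading the symbol table from stdin.
list_phone_ids() {
  # Print the last field (the integer id), then drop the line that is
  # exactly 0. With -w, ids such as 10 or 20 are kept.
  awk '{print $NF}' | grep -v -w 0
}

# Made-up symbol table for illustration.
printf '<eps> 0\nsil 1\naa 2\nzh 10\n' | list_phone_ids
```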

101
egs/rm/s1/steps/train_tri2a.sh Executable file
@ -0,0 +1,101 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2a) is a basic triphone training starting from tri1/,
# to serve as a baseline for the other train_tri2? scripts.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2a
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=30 # Number of iterations of training
maxiterinc=15 # Last iter to increase #Gauss on.
numleaves=1500
numgauss=$[$numleaves*2]; # Initially mix up to avg. 2 Gauss/state.
# Then slowly mix up to final amount.
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
realign_iters="10 15 20"; # Because last model was reasonable, don't
# realign too soon (i.e., on 5th iter).
silphonelist=`cat data/silphones.csl`
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
mkdir -p $dir
cp $srcdir/topo $dir
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali \
$dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
rm $dir/treeacc
# Convert alignments generated from previous model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log || exit 1;
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
rm $dir/0.ali
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
rm $dir/$x.mdl $dir/$x.acc
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1]
done
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl )

191
egs/rm/s1/steps/train_tri2b.sh Executable file
@ -0,0 +1,191 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2b.sh) is training the exponential transform,
# on top of standard double-delta features.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2b
srcdir=exp/tri1
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
srcmodel=$srcdir/final.mdl
dim=39
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
# The spk2utt_opt uses a subset of utterances that we create; this is only
# needed by programs that use the subset.
spk2utt_opt=--spk2utt=ark:$dir/spk2utt
# the utt2spk opt is used by programs that use all the data so give
# it the original utt2spk file.
utt2spk_opt=--utt2spk=ark:data/train.utt2spk
normtype=mean # et option; could be mean, or none
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numiters_et=15 # Before this, update et.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
realign_iters="10 15 20 25";
silphonelist=`cat data/silphones.csl`
nutt=15 # Use at most 15 utterances from each speaker for
# estimating transforms, and A and B (will use all the data
# for estimating the model though, so be careful: we're
# not always using the lists in $dir).
mkdir -p $dir
cp $srcdir/topo $dir
awk '{ printf("%s ",$1); for(x=2; x<=NF&&x<='$nutt'+1;x++)
{ printf("%s ", $x); } printf("\n"); }' <data/train.spk2utt >$dir/spk2utt
scripts/spk2utt_to_utt2spk.pl < $dir/spk2utt > $dir/utt2spk
cat $dir/utt2spk | awk '{print $1}' > $dir/uttlist
scripts/filter_scp.pl $dir/uttlist <data/train.scp >$dir/train.scp
origfeats="ark,s,cs:add-deltas scp:data/train.scp ark:- |"
# The following variable ($feats) will get changed in the script.
feats="$origfeats"
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali \
$dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
# Convert alignments generated from previous model, to use as initial alignments.
rm $dir/treeacc
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali \
2>$dir/convert.log || exit 1
rm $dir/0.ali
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
gmm-init-et --normalize-type=$normtype --binary=false --dim=$dim $dir/1.et 2>$dir/init_et.log || exit 1
x=1
while [ $x -lt $numiters ]; do
x1=$[$x+1];
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
if [ $x -lt $numiters_et ]; then
# Work out current transforms:
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
gmm-est-et $spk2utt_opt --verbose=1 $dir/$x.mdl $dir/$x.et "$origfeats" \
ark,s,cs:- ark:$dir/$x.trans ark,t:$dir/$x.warp ) 2> $dir/trans.$x.log || exit 1;
# Remove previous transforms, if present.
if [ $x -gt 1 ]; then rm $dir/$[$x-1].trans; fi
# Now change $feats to correspond to the transformed features.
feats="ark:add-deltas scp:data/train.scp ark:- | transform-feats $utt2spk_opt ark:$dir/$x.trans ark:- ark:- |"
fi
# Accumulate stats to update model:
gmm-acc-stats-ali $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2>$dir/gmm_acc.$x.log || exit 1;
# Update model.
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$x1.mdl \
2>$dir/gmm_est.$x.log || exit 1;
rm $dir/$x.acc $dir/$x.mdl
if [ $x -lt $numiters_et ]; then
# Alternately estimate either A or B.
if [ $[$x%2] == 0 ]; then # Estimate A:
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
gmm-et-acc-a $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$origfeats" ark,s,cs:- $dir/$x.et_acc_a ) 2> $dir/acc_a.$x.log || exit 1;
gmm-et-est-a --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.et_acc_a 2> $dir/update_a.$x.log || exit 1;
rm $dir/$x.et_acc_a
else
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
gmm-et-acc-b $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$origfeats" ark,s,cs:- ark:$dir/$x.trans ark:$dir/$x.warp $dir/$x.et_acc_b ) 2> $dir/acc_b.$x.log || exit 1;
gmm-et-est-b --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.mat $dir/$x.et_acc_b 2> $dir/update_b.$x.log || exit 1;
rm $dir/$x.et_acc_b
# Careful!: gmm-transform-means here changes $x1.mdl in-place.
gmm-transform-means $dir/$x.mat $dir/$x1.mdl $dir/$x1.mdl 2> $dir/transform_means.$x.log
fi
fi
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1];
done
# Accumulate stats for the "alignment model", which is the same as the model
# but uses the baseline features (it shares the Gaussian-level alignments).
gmm-et-get-b $dir/$numiters_et.et $dir/default.mat 2>$dir/get_b.log || exit 1
defaultfeats="ark,s,cs:add-deltas scp:data/train.scp ark:- | transform-feats $dir/default.mat ark:- ark:- |"
( ali-to-post ark:$dir/cur.ali ark:- | \
gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$defaultfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
# Update model.
gmm-est --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
2>$dir/est_alimdl.log || exit 1;
rm $dir/$x.acc2
# The following files may be useful for display purposes.
for n in 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
done
( cd $dir; rm final.mdl 2>/dev/null;
ln -s $x.mdl final.mdl; ln -s $x.alimdl final.alimdl;
ln -s $numiters_et.et final.et )
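
The script above alternates between estimating A (even passes) and B (odd passes) while the pass number is below $numiters_et. The following is a minimal sketch of that alternation only; the value numiters_et=15 is an assumption for illustration, and the variables a_iters and b_iters are hypothetical names that do not appear in the original script.

```shell
#!/bin/sh
# Sketch: which passes would estimate A and which would estimate B.
numiters_et=15        # assumed value, for illustration only
x=1
a_iters=""            # hypothetical: passes on which A is estimated
b_iters=""            # hypothetical: passes on which B is estimated
while [ $x -lt $numiters_et ]; do
  if [ $(( x % 2 )) -eq 0 ]; then
    a_iters="$a_iters $x"   # even pass: gmm-et-acc-a / gmm-et-est-a
  else
    b_iters="$b_iters $x"   # odd pass: gmm-et-acc-b / gmm-et-est-b
  fi
  x=$(( x + 1 ))
done
echo "A on:$a_iters"
echo "B on:$b_iters"
```

Only the B update produces a matrix ($x.mat) that is then applied to the model means, so the in-place gmm-transform-means call is tied to the odd passes.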

123
egs/rm/s1/steps/train_tri2c.sh Executable file

@ -0,0 +1,123 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2c) is training with mean normalization (you could
# modify options in this script to do variance normalization).
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2c
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
mkdir -p $dir
cp $srcdir/topo $dir
norm_vars=false
after_deltas=false
per_spk=true
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numiters_et=15 # Before this, update et.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
realign_iters="10 15 20 25";
silphonelist=`cat data/silphones.csl`
if [ $per_spk == "true" ]; then
spk2utt_opt=--spk2utt=ark:data/train.spk2utt
utt2spk_opt=--utt2spk=ark:data/train.utt2spk
else
spk2utt_opt=
utt2spk_opt=
fi
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
echo "Computing cepstral mean and variance stats."
# compute mean and variance stats.
if [ $after_deltas == true ]; then
compute-cmvn-stats $spk2utt_opt "$srcfeats" ark:$dir/cmvn.ark 2>$dir/cmvn.log || exit 1;
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- | apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn.ark ark:- ark:- |"
else
compute-cmvn-stats $spk2utt_opt scp:data/train.scp \
ark:$dir/cmvn.ark 2>$dir/cmvn.log || exit 1;
feats="ark:apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn.ark scp:data/train.scp ark:- | add-deltas --print-args=false ark:- ark:- |"
fi
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
rm $dir/treeacc
# Convert alignments generated from previous model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
rm $dir/0.ali
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra ark:$dir/graphs.fsts \
2>$dir/compile_graphs.log || exit 1
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl ark:$dir/graphs.fsts "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
rm $dir/$x.mdl $dir/$x.acc
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1];
done
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl )
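
The mixture-growth schedule in train_tri2c.sh is plain integer arithmetic, so it can be checked without running any Kaldi binaries. The sketch below copies the constants from the script and replays the loop; the variable schedule is a hypothetical name added only to record what would be passed to --mix-up on each pass.

```shell
#!/bin/sh
# Sketch: replay the #Gaussians schedule from train_tri2c.sh.
numiters=30
maxiterinc=20
numleaves=1500
totgauss=7000
numgauss=$numleaves
incgauss=$(( (totgauss - numgauss) / maxiterinc ))   # integer division
schedule=""          # hypothetical: "pass:numgauss" pairs
x=1
while [ $x -lt $numiters ]; do
  schedule="$schedule $x:$numgauss"   # target passed to gmm-est --mix-up on pass x
  if [ $x -le $maxiterinc ]; then
    numgauss=$(( numgauss + incgauss ))
  fi
  x=$(( x + 1 ))
done
echo "incgauss=$incgauss final_numgauss=$numgauss final_model=$x.mdl"
```

Because the increment happens after gmm-est, the full target is first used on pass maxiterinc+1, and the loop exits with x equal to numiters, which is the model that final.mdl points to.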

120
egs/rm/s1/steps/train_tri2d.sh Executable file

@ -0,0 +1,120 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2d) is training with standard delta+delta-delta features
# plus MLLT.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2d
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
silphonelist=`cat data/silphones.csl`
realign_iters="10 15 20 25";
mllt_iters="2 4 6 12";
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
# Subset of features used to train MLLT transform.
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --print-args=false scp:- ark:- |"
mkdir -p $dir
cp $srcdir/topo $dir
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
"$srcgraphs" "$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
rm $dir/treeacc
# Convert alignments generated from the previous (tri1) model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
rm $dir/0.ali
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
cur_mllt=""
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log || exit 1;
if [ "$cur_mllt" != "" ]; then
est-mllt $dir/$x.mat.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
gmm-transform-means --binary=false $dir/$x.mat.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
compose-transforms --print-args=false $dir/$x.mat.new $cur_mllt $dir/$x.mat || exit 1;
else
est-mllt $dir/$x.mat $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
gmm-transform-means --binary=false $dir/$x.mat $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
fi
cur_mllt=$dir/$x.mat
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- | transform-feats $cur_mllt ark:- ark:- |"
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --print-args=false scp:- ark:- | transform-feats $cur_mllt ark:- ark:- |"
else # do GMM update.
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
fi
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1]
done
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
ln -s `basename $cur_mllt` final.mat )
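
train_tri2d.sh decides what to do on each pass by testing whether the pass number appears as a whole word in a space-separated list (echo ... | grep -w). The sketch below isolates that test; in_list and plan are hypothetical names introduced here for illustration, and the loop bound 30 is the script's numiters.

```shell
#!/bin/sh
# Sketch: per-pass dispatch as done in train_tri2d.sh.
realign_iters="10 15 20 25"
mllt_iters="2 4 6 12"

# Hypothetical helper: succeeds if $1 occurs as a whole word in list $2.
in_list() {
  echo "$2" | grep -w -- "$1" >/dev/null
}

plan=""
x=1
while [ $x -lt 30 ]; do
  if in_list $x "$realign_iters"; then plan="$plan $x:align"; fi
  if in_list $x "$mllt_iters"; then
    plan="$plan $x:mllt"     # MLLT update replaces the GMM update on this pass
  else
    plan="$plan $x:gmm"
  fi
  x=$(( x + 1 ))
done
echo "$plan"
```

The -w flag matters: without it, pass 1 would match "10" and "15" and trigger a realignment on the first pass.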

111
egs/rm/s1/steps/train_tri2e.sh Executable file

@ -0,0 +1,111 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2e) is training with splice-9-frames+LDA features.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2e
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
silphonelist=`cat data/silphones.csl`
realign_iters="10 15 20 25";
# feats corresponding to original model
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/lda.mat ark:- ark:-|"
# Subset of features used to train LDA transforms.
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $dir/lda.mat ark:- ark:-|"
mkdir -p $dir
cp $srcdir/topo $dir
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
(ali-to-post ark:$dir/0.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
est-lda $dir/lda.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
rm $dir/treeacc
# Convert alignments generated from the previous (tri1) model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
rm $dir/0.ali
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
rm $dir/$x.mdl $dir/$x.acc
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1]
done
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
ln -s lda.mat final.mat )
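
In train_tri2e.sh the feats string mentions $dir/lda.mat several lines before est-lda writes that file. This works because the string is only a command line that is run when a program opens it. The sketch below illustrates that deferred evaluation with plain shell tools; cat and sh -c are stand-ins (assumptions for illustration) for the Kaldi feature pipeline and for the program that opens the piped input, and the "ark:" prefix is left out.

```shell
#!/bin/sh
# Sketch: a piped-command string is built first and executed only later.
dir=$(mktemp -d)

# Build the command string now; lda.mat does not exist yet and nothing runs.
feats="cat $dir/lda.mat |"

# Later, the transform is written (stand-in for est-lda).
echo "[ 1 0 ]" > "$dir/lda.mat"

# Stand-in for the reading program: drop the trailing '|' and run the command.
cmd=${feats%|}
out=$(sh -c "$cmd")
echo "$out"

rm -r "$dir"
```

Note that $dir itself is expanded when the string is assigned, so dir must already be set at that point even though the file it names can be created later.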

130
egs/rm/s1/steps/train_tri2f.sh Executable file

@ -0,0 +1,130 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2f) is training with splice-9-frames+LDA features,
# plus MLLT.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2f
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
silphonelist=`cat data/silphones.csl`
realign_iters="10 15 20 25";
mllt_iters="2 4 6 12";
# feats corresponding to original model
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
# Subset of features used to train LDA and MLLT transforms.
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $dir/0.mat ark:- ark:-|"
mkdir -p $dir
cp $srcdir/topo $dir
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
( ali-to-post ark:$dir/0.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
est-lda $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
rm $dir/treeacc
# Convert alignments generated from the previous (tri1) model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
rm $dir/0.ali
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
cur_lda=$dir/0.mat
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log || exit 1;
est-mllt $dir/$x.mat.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
gmm-transform-means --binary=false $dir/$x.mat.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
compose-transforms --print-args=false $dir/$x.mat.new $cur_lda $dir/$x.mat || exit 1;
cur_lda=$dir/$x.mat
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
# Subset of features used to train MLLT transforms.
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $cur_lda ark:- ark:-|"
else # do GMM update.
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
fi
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1]
done
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
ln -s `basename $cur_lda` final.mat )
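
In train_tri2f.sh each MLLT pass composes the new transform with the current one and then advances $cur_lda, so final.mat ends up pointing at the matrix from the last MLLT pass. The sketch below replays only that bookkeeping; no matrices are touched, and chain and final_target are hypothetical names added for illustration.

```shell
#!/bin/sh
# Sketch: track which matrix file is "current" across passes in train_tri2f.sh.
mllt_iters="2 4 6 12"
dir=exp/tri2f
cur_lda=$dir/0.mat          # LDA matrix written by est-lda
chain="0.mat"               # hypothetical: history of current transforms
x=1
while [ $x -lt 30 ]; do
  if echo $mllt_iters | grep -w $x >/dev/null; then
    # Stands in for: compose-transforms $dir/$x.mat.new $cur_lda $dir/$x.mat
    cur_lda=$dir/$x.mat
    chain="$chain -> $x.mat"
  fi
  x=$(( x + 1 ))
done
final_target=$(basename $cur_lda)   # what "ln -s ... final.mat" would link to
echo "$chain; final.mat -> $final_target"
```

Each $x.mat therefore already contains the LDA matrix and all earlier MLLT updates, which is why the feature pipeline applies a single transform-feats step to the spliced features.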

205
egs/rm/s1/steps/train_tri2g.sh Executable file

@ -0,0 +1,205 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2g) is training with linear-VTLN (lvtln),
# which is a linear approximation to VTLN.
# At the end, it also converts this, in a single-pass retraining
# manner, to a normal feature-level VTLN model.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2g
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
silphonelist=`cat data/silphones.csl`
realign_iters="10 15 20 25";
lvtln_iters="2 4 6 8 12"; # Recompute LVTLN transforms on these iters.
per_spk=true
compute_vtlnmdl=true # If true, at the end compute a model with actual feature-space
# VTLN features. You can decode with this as an alternative to
# final.mdl which takes the LVTLN features.
numfiles=40 # Number of feature files for computing LVTLN transforms.
numclass=31; # Can't really change this without changing the script below
defaultclass=15; # Corresponds to no warping.
# RE "vtln_warp"
if [ $per_spk == "true" ]; then
spk2utt_opt=--spk2utt=ark:data/train.spk2utt
utt2spk_opt=--utt2spk=ark:data/train.utt2spk
else
spk2utt_opt=
utt2spk_opt=
fi
mkdir -p $dir
cp $srcdir/topo $dir
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
# Will create cur.trans below...
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- | transform-feats $utt2spk_opt ark:$dir/cur.trans ark:- ark:- |"
gmm-init-lvtln --dim=39 --num-classes=$numclass --default-class=$defaultclass \
$dir/0.lvtln 2>$dir/init_lvtln.log || exit 1
featsub="ark:scripts/subset_scp.pl $numfiles data/train.scp | add-deltas scp:- ark:- |"
echo "Initializing lvtln transforms."
c=0
while [ $c -lt $numclass ]; do
warp=`perl -e 'print 0.85 + 0.01*$ARGV[0];' $c`
featsub_warp="ark:scripts/subset_scp.pl $numfiles data_prep/train_wav.scp | compute-mfcc-feats --vtln-low=100 --vtln-high=-600 --vtln-warp=$warp --config=conf/mfcc.conf scp:- ark:- | add-deltas ark:- ark:- |"
gmm-train-lvtln-special --normalize-var=true $c $dir/0.lvtln $dir/0.lvtln \
"$featsub" "$featsub_warp" 2> $dir/train_special.$c.log || exit 1;
c=$[$c+1]
done
# $silphonelist is a colon-separated integer list of context-independent
# phones (here, just a single element).
# The script below tells it not to cluster these phones, but here we also
# avoid accumulating CD-stats for silence (see --ci-phones further down).
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
echo "Computing LVTLN transforms (iter 0)"
( ali-to-post ark:$dir/0.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
gmm-post-to-gpost $srcmodel "$srcfeats" ark:- ark:- | \
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $srcmodel $dir/0.lvtln \
"$srcfeats" ark:- ark:$dir/cur.trans ark,t:$dir/0.warp ) 2>$dir/lvtln.0.log || exit 1
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
rm $dir/treeacc
# Convert alignments generated from the previous (tri1) model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
rm $dir/0.ali
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:|gzip -c > $dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
cur_lvtln=$dir/0.lvtln
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $lvtln_iters | grep -w $x >/dev/null; then
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $dir/$x.mdl $dir/0.lvtln \
"$srcfeats" ark:- ark:$dir/tmp.trans ark,t:$dir/$x.warp ) 2>$dir/lvtln.$x.log || exit 1
cp $dir/$x.warp $dir/cur.warp
mv $dir/tmp.trans $dir/cur.trans
fi
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --write-occs=$dir/$[$x+1].occs --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
rm $dir/$x.mdl $dir/$x.acc
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1]
done
# Accumulate stats for the "alignment model", which is the same as the model
# but uses the baseline features (it shares the Gaussian-level alignments).
( ali-to-post ark:$dir/cur.ali ark:- | \
gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$srcfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
# Update model.
gmm-est --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
2>$dir/est_alimdl.log || exit 1;
rm $dir/$x.acc2
# The following files contain information that may be useful for display purposes.
for n in 0 $lvtln_iters; do
cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
done
if [ $compute_vtlnmdl == "true" ]; then
cat $dir/cur.warp | awk '{print $1, (0.85+0.01*$2);}' > $dir/cur.factor
compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/cur.factor --config=conf/mfcc.conf scp:data_prep/train_wav.scp ark:$dir/tmp.ark 2>$dir/mfcc.log
vtlnfeats="ark:add-deltas ark:$dir/tmp.ark ark:- |"
# Compute diagonal fMLLR transform to normalize VTLN feats.
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-est-fmllr --fmllr-update-type=diag $spk2utt_opt $dir/$x.mdl "$vtlnfeats" ark,o:- ark:$dir/vtln.trans ) 2>$dir/vtln_fmllr.log || exit 1;
vtlnfeats="ark:add-deltas ark:$dir/tmp.ark ark:- | transform-feats $utt2spk_opt ark:$dir/vtln.trans ark:- ark:- |"
( ali-to-post ark:$dir/cur.ali ark:- | \
gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$vtlnfeats" ark:- $dir/$x.acc3 ) 2>$dir/acc_vtlnmdl.log || exit 1;
# Update model.
gmm-est $dir/$x.mdl $dir/$x.acc3 $dir/$x.vtlnmdl \
2>$dir/est_vtlnmdl.log || exit 1;
rm $dir/$x.acc3
ln -s $x.vtlnmdl $dir/final.vtlnmdl
rm $dir/tmp.ark
fi
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
ln -s $x.alimdl final.alimdl;
ln -s 0.lvtln final.lvtln;
ln -s cur.trans final.trans )
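
train_tri2g.sh maps each LVTLN class index c to a warp factor 0.85 + 0.01*c, both when training the per-class transforms and when converting cur.warp into cur.factor. The sketch below evaluates that mapping at the ends of the range and at the default class; warp_of is a hypothetical helper name, and the two-decimal formatting is a choice made here for readability.

```shell
#!/bin/sh
# Sketch: class index to VTLN warp factor, as used in train_tri2g.sh.
numclass=31
defaultclass=15

# Hypothetical helper: print the warp factor for class index $1.
warp_of() {
  awk -v c="$1" 'BEGIN { printf "%.2f", 0.85 + 0.01 * c }'
}

lowest=$(warp_of 0)
default=$(warp_of $defaultclass)
highest=$(warp_of $(( numclass - 1 )))
echo "class 0 -> $lowest, class $defaultclass -> $default, class $(( numclass - 1 )) -> $highest"
```

This is why the script notes that numclass cannot be changed on its own: the offset 0.85 and step 0.01 are hard-coded so that class 15 corresponds to no warping.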

123
egs/rm/s1/steps/train_tri2h.sh Executable file

@ -0,0 +1,123 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2h) is training with splice-9-frames+HLDA features.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2h
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
realign_iters="10 15 20 25";
hlda_iters="2 4 6 12";
silphonelist=`cat data/silphones.csl`
mkdir -p $dir
cp $srcdir/topo $dir
# feats corresponding to original model
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
rawfeats="ark:splice-feats scp:data/train.scp ark:- |"
# The "speedup" parameter controls how much of the data to use
# in the most intensive part of the HLDA transform computation.
speedup=0.1
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
( ali-to-post ark:$dir/0.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
est-lda --write-full-matrix=$dir/0.fullmat $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
# Convert alignments generated from the previous (tri1) model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:| gzip -c > $dir/graphs.fsts.gz" \
2>$dir/compile_graphs.log || exit 1
cur_mat_iter=0
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
if echo $hlda_iters | grep -w $x >/dev/null; then # Do HLDA update.
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.01 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-acc-hlda --speedup=$speedup --binary=false $dir/$x.mdl $dir/$cur_mat_iter.mat "$rawfeats" ark:- $dir/$x.hacc ) 2> $dir/hacc.$x.log || exit 1;
gmm-est-hlda $dir/$x.mdl $dir/$cur_mat_iter.fullmat $dir/$[$x+1].mdl $dir/$x.fullmat $dir/$x.mat $dir/$x.hacc 2> $dir/hupdate.$x.log || exit 1;
cur_mat_iter=$x
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/$cur_mat_iter.mat ark:- ark:-|"
else # do GMM update.
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
fi
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1]
done
( cd $dir;
rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
rm final.mat 2>/dev/null; ln -s $cur_mat_iter.mat final.mat )
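The Gaussian count handed to `gmm-est --mix-up` in the loop above follows a linear ramp. The sketch below replays only that arithmetic with the values set at the top of the script (`numiters=30`, `maxiterinc=20`, 1500 initial Gaussians, a target of 7000). `gauss_schedule` is a hypothetical helper written for illustration, no Kaldi binaries are run, and `$(( ))` is used as the portable spelling of the script's `$[ ]`.

```shell
# Illustration only: prints "<iteration> <numgauss>" for each training pass.
# gauss_schedule is a hypothetical helper, not part of the recipe.
gauss_schedule() (
  numiters=$1; maxiterinc=$2; numgauss=$3; totgauss=$4
  incgauss=$(( (totgauss - numgauss) / maxiterinc ))   # integer division
  x=1
  while [ $x -lt $numiters ]; do
    echo "$x $numgauss"       # value that would be passed as --mix-up on pass x
    if [ $x -le $maxiterinc ]; then
      numgauss=$(( numgauss + incgauss ))
    fi
    x=$(( x + 1 ))
  done
)

gauss_schedule 30 20 1500 7000
```

With these settings the increment is (7000 - 1500) / 20 = 275, so pass 20 uses 6725 Gaussians and passes 21 to 29 stay at the 7000 target.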

127
egs/rm/s1/steps/train_tri2i.sh Executable file
View file

@ -0,0 +1,127 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2i) is training with triple-deltas+HLDA features.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2i
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
realign_iters="10 15 20 25";
hlda_iters="2 4 6 12";
silphonelist=`cat data/silphones.csl`
mkdir -p $dir
cp $srcdir/topo $dir
# feats corresponding to original model
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
rawfeats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- |"
# The "speedup" parameter controls how much of the data to use
# in the most intensive part of the HLDA transform computation.
speedup=0.1
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
( ali-to-post ark:$dir/0.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | \
add-deltas --delta-order=3 scp:- ark:- |" \
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log
est-lda --write-full-matrix=$dir/0.fullmat $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
# Convert alignments generated from the previous (tri1) model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:| gzip -c > $dir/graphs.fsts.gz" \
2>$dir/compile_graphs.log || exit 1
cur_mat_iter=0
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
if echo $hlda_iters | grep -w $x >/dev/null; then # Do HLDA update.
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.01 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-acc-hlda --speedup=$speedup --binary=false $dir/$x.mdl $dir/$cur_mat_iter.mat "$rawfeats" ark:- $dir/$x.hacc ) 2> $dir/hacc.$x.log || exit 1;
gmm-est-hlda $dir/$x.mdl $dir/$cur_mat_iter.fullmat $dir/$[$x+1].mdl $dir/$x.fullmat $dir/$x.mat $dir/$x.hacc 2> $dir/hupdate.$x.log || exit 1;
cur_mat_iter=$x
feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $dir/$cur_mat_iter.mat ark:- ark:-|"
else # do GMM update.
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
fi
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1]
done
( cd $dir;
rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
rm final.mat 2>/dev/null; ln -s $cur_mat_iter.mat final.mat )
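The training loop above decides whether a pass is an HLDA or a realignment pass by matching `$x` as a whole word against a space-separated list. A minimal sketch of why the `-w` flag matters, using the `hlda_iters` value from this script; both function names are hypothetical and exist only for this illustration.

```shell
hlda_iters="2 4 6 12"

# Whole-word match, as in the training loop: true only for 2, 4, 6 and 12.
is_hlda_iter() { echo $hlda_iters | grep -w $1 >/dev/null; }

# Substring match, for contrast: also true for 1, which occurs inside "12".
is_hlda_iter_substring() { echo $hlda_iters | grep $1 >/dev/null; }

for x in 1 2 3 12; do
  if is_hlda_iter $x; then w=yes; else w=no; fi
  if is_hlda_iter_substring $x; then s=yes; else s=no; fi
  echo "iter $x: whole-word=$w substring=$s"
done
```

Without `-w`, pass 1 would be treated as an HLDA pass because "1" is a substring of "12"; the same applies to pass 5 and the realignment list "10 15 20 25".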

231
egs/rm/s1/steps/train_tri2j.sh Executable file
View file

@ -0,0 +1,231 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2j) is training with triple-deltas+LDA+MLLT.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2j
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
realign_iters="10 15 20 25";
mllt_iters="2 4 6 12";
silphonelist=`cat data/silphones.csl`
mkdir -p $dir
cp $srcdir/topo $dir
# feats corresponding to original model
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
# Subset of features used to train LDA and MLLT transforms.
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --delta-order=3 scp:- ark:- | transform-feats $dir/0.mat ark:- ark:-|"
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
( ali-to-post ark:$dir/0.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --delta-order=3 scp:- ark:- |" \
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log
est-lda $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
# Convert alignments generated from the previous (tri1) model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:| gzip -c > $dir/graphs.fsts.gz" \
2>$dir/compile_graphs.log || exit 1
cur_lda=$dir/0.mat
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log || exit 1;
est-mllt $dir/$x.mllt.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
gmm-transform-means --binary=false $dir/$x.mllt.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
compose-transforms $dir/$x.mllt.new $cur_lda $dir/$x.mat || exit 1;
cur_lda=$dir/$x.mat
feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
# Subset of features used to train MLLT transforms.
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --delta-order=3 scp:- ark:- | transform-feats $cur_lda ark:- ark:-|"
else # do GMM update.
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
fi
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1]
done
( cd $dir;
rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
rm final.mat 2>/dev/null; ln -s `basename $cur_lda` final.mat )
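exit 0
# =======
# The lines below are an older splice-9-frames+HLDA variant of this script,
# apparently left over from a merge; they are never executed.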
#!/bin/bash
# To be run from ..
# This (train_tri2j) is training with splice-9-frames+HLDA features.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2j
srcdir=exp/tri
srcmodel=$srcdir/30.mdl
srcgraphs=ark:$srcdir/graphs.fsts
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
mkdir -p $dir
cp $srcdir/topo $dir
numgauss=1500
incgauss=275 # Inc by 275 per iter for 20 iters; 1500 + 275*20 = 7000 which is
# similar to the HTK baseline.
silphonelist=`cat data/silphones.csl`
realign_iters="10 15 20 25";
hlda_iters="2 4 6 12";
# feats corresponding to original model
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
rawfeats="ark:splice-feats scp:data/train.scp ark:- |"
# The "speedup" parameter controls how much of the data to use
# in the most intensive part of the HLDA transform computation.
speedup=0.1
if false; then
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel $srcgraphs "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
ali-to-post ark:$dir/0.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
acc-lda $srcmodel "ark:head -800 data/train.scp | splice-feats scp:- ark:- |" \
ark:- $dir/lda.acc 2>$dir/lda_acc.log
est-lda --write-full-matrix=$dir/0.fullmat $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
# Convert alignments generated from the previous model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra ark:$dir/graphs.fsts \
2>$dir/compile_graphs.log || exit 1
cur_mat_iter=0
for x in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl ark:$dir/graphs.fsts "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
if echo $hlda_iters | grep -w $x >/dev/null; then # Do HLDA update.
ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.01 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-acc-hlda --speedup=$speedup --binary=false $dir/$x.mdl $dir/$cur_mat_iter.mat "$rawfeats" ark:- $dir/$x.hacc 2> $dir/hacc.$x.log || exit 1;
gmm-est-hlda $dir/$x.mdl $dir/$cur_mat_iter.fullmat $dir/$[$x+1].mdl $dir/$x.fullmat $dir/$x.mat $dir/$x.hacc 2> $dir/hupdate.$x.log || exit 1;
cur_mat_iter=$x
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/$cur_mat_iter.mat ark:- ark:-|"
else # do GMM update.
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
fi
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
if [[ $x -lt 21 ]]; then
numgauss=$[$numgauss+$incgauss];
fi
done

209
egs/rm/s1/steps/train_tri2k.sh Executable file
View file

@ -0,0 +1,209 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2k.sh) is training the exponential transform
# after LDA (so the same as LDA+MLLT+ET, since ET includes
# MLLT).
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2k
srcdir=exp/tri1
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
srcmodel=$srcdir/final.mdl
dim=40
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
# The spk2utt_opt uses a subset of utterances that we create; this is only
# needed by programs that use the subset.
spk2utt_opt=--spk2utt=ark:$dir/spk2utt
# the utt2spk opt is used by programs that use all the data so give
# it the original utt2spk file.
utt2spk_opt=--utt2spk=ark:data/train.utt2spk
normtype=mean # et option; could be mean, or none
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numiters_et=15 # Update the ET only on iterations before this one.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
realign_iters="10 15 20 25";
silphonelist=`cat data/silphones.csl`
nutt=15 # Use at most 15 utterances from each speaker for
# estimating transforms, and A and B (will use all the data
# for estimating the model though, so be careful: we're
# not always using the lists in $dir).
mkdir -p $dir
cp $srcdir/topo $dir
awk '{ printf("%s ",$1); for(x=2; x<=NF&&x<='$nutt'+1;x++)
{ printf("%s ", $x); } printf("\n"); }' <data/train.spk2utt >$dir/spk2utt
scripts/spk2utt_to_utt2spk.pl < $dir/spk2utt > $dir/utt2spk
cat $dir/utt2spk | awk '{print $1}' > $dir/uttlist
scripts/filter_scp.pl $dir/uttlist <data/train.scp >$dir/train.scp
srcfeats="ark,s,cs:add-deltas scp:data/train.scp ark:- |"
# For now, there is no subsetting.
basefeats="ark,s,cs:splice-feats scp:data/train.scp ark:- | transform-feats $dir/lda.mat ark:- ark:- |"
## The following variable will get changed in the script.
feats="$basefeats"
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
"$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
echo "computing LDA transform"
( ali-to-post ark:$dir/0.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
est-lda --dim=$dim $dir/lda.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali \
$dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
# Convert alignments generated from previous model, to use as initial alignments.
rm $dir/treeacc
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali \
2>$dir/convert.log || exit 1
rm $dir/0.ali
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
gmm-init-et --normalize-type=$normtype --binary=false --dim=$dim $dir/1.et 2>$dir/init_et.log || exit 1
x=1
while [ $x -lt $numiters ]; do
x1=$[$x+1];
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
if [ $x -lt $numiters_et ]; then
# Work out current transforms:
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
gmm-est-et $spk2utt_opt --verbose=1 $dir/$x.mdl $dir/$x.et "$basefeats" \
ark,s,cs:- ark:$dir/$x.trans ark,t:$dir/$x.warp ) 2> $dir/trans.$x.log || exit 1;
# Remove previous transforms, if present.
if [ $x -gt 1 ]; then rm $dir/$[$x-1].trans; fi
# Now change $feats to correspond to the transformed features. We compose the
# transforms themselves (it's more efficient than transforming the features
# twice).
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/lda.mat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/$x.trans ark:- ark:- |"
fi
# Accumulate stats to update model:
gmm-acc-stats-ali $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2>$dir/gmm_acc.$x.log || exit 1;
# Update model.
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$x1.mdl \
2>$dir/gmm_est.$x.log || exit 1;
rm $dir/$x.acc $dir/$x.mdl
if [ $x -lt $numiters_et ]; then
# Alternately estimate either A or B.
if [ $[$x%2] == 0 ]; then # Estimate A:
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
gmm-et-acc-a $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$basefeats" ark,s,cs:- $dir/$x.et_acc_a ) 2> $dir/acc_a.$x.log || exit 1;
gmm-et-est-a --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.et_acc_a 2> $dir/update_a.$x.log || exit 1;
rm $dir/$x.et_acc_a
else
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
gmm-et-acc-b $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$basefeats" ark,s,cs:- ark:$dir/$x.trans ark:$dir/$x.warp $dir/$x.et_acc_b ) 2> $dir/acc_b.$x.log || exit 1;
gmm-et-est-b --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.mat $dir/$x.et_acc_b 2> $dir/update_b.$x.log || exit 1;
rm $dir/$x.et_acc_b
# Careful!: gmm-transform-means here changes $x1.mdl in-place.
gmm-transform-means $dir/$x.mat $dir/$x1.mdl $dir/$x1.mdl 2> $dir/transform_means.$x.log
fi
fi
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1];
done
gmm-et-get-b $dir/$numiters_et.et $dir/B.mat 2>$dir/get_b.log || exit 1
compose-transforms $dir/B.mat $dir/lda.mat $dir/default.mat 2>>$dir/get_b.log || exit 1
defaultfeats="ark,s,cs:splice-feats scp:data/train.scp ark:- | transform-feats $dir/default.mat ark:- ark:- |"
# Accumulate stats for the "alignment model", which is the same as the model but
# estimated on the default features (it shares the Gaussian-level alignments).
( ali-to-post ark:$dir/cur.ali ark:- | \
gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$defaultfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
# Update model.
gmm-est --write-occs=$dir/final.occs --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
2>$dir/est_alimdl.log || exit 1;
rm $dir/$x.acc2
# The following files may be useful for display purposes.
n=1
while [ $n -lt $numiters_et ]; do
cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
n=$[$n+1]
done
( cd $dir; rm final.mdl 2>/dev/null;
ln -s $x.mdl final.mdl; ln -s $x.alimdl final.alimdl;
ln -s $numiters_et.et final.et
ln -s $[$numiters_et-1].trans final.trans )
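Near the top of this script, an `awk` one-liner trims `data/train.spk2utt` so that each speaker keeps at most `$nutt` utterances for transform estimation. The following is a minimal sketch of that truncation on made-up data. `limit_utts` and the speaker and utterance IDs are hypothetical; the sketch passes the limit with `-v` instead of splicing the shell variable into the program text, and it does not emit the trailing space that the original prints.

```shell
# Illustration only: keep the speaker ID plus at most <max> utterance IDs per line.
# limit_utts is a hypothetical helper, not part of the recipe.
limit_utts() {   # usage: limit_utts <max-utts-per-speaker> < spk2utt
  awk -v nutt="$1" '{
    printf("%s", $1);                                   # field 1 is the speaker ID
    for (x = 2; x <= NF && x <= nutt + 1; x++) printf(" %s", $x);
    printf("\n");
  }'
}

# Toy input: spkA has four utterances, spkB has one.
printf 'spkA utt1 utt2 utt3 utt4\nspkB utt5\n' | limit_utts 2
```

Speakers with fewer utterances than the limit pass through unchanged, which is why the loop condition also checks `x <= NF`.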

156
egs/rm/s1/steps/train_tri2l.sh Executable file
View file

@ -0,0 +1,156 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# To be run from ..
# This (train_tri2l) is training with splice-9-frames+LDA features,
# plus MLLT plus CMLLR/fMLLR (i.e. speaker adapted training).
if [ -f path.sh ]; then . path.sh; fi
dir=exp/tri2l
srcdir=exp/tri1
srcmodel=$srcdir/final.mdl
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
numiters=30 # Number of iterations of training
maxiterinc=20 # Last iter to increase #Gauss on.
numleaves=1500
numgauss=$numleaves
totgauss=7000 # Target #Gaussians
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
silphonelist=`cat data/silphones.csl`
realign_iters="10 15 20 25";
mllt_iters="2 4 6 8";
fmllr_iters="9 14 19"
spk2utt_opt="--spk2utt=ark:data/train.spk2utt"
utt2spk_opt="--utt2spk=ark:data/train.utt2spk"
# feats corresponding to original model
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
# Subset of features used to train LDA and MLLT transforms.
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $dir/0.mat ark:- ark:-|"
mkdir -p $dir
cp $srcdir/topo $dir
echo "aligning all training data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
( ali-to-post ark:$dir/0.ali ark:- | \
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log
est-lda $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
build-tree --verbose=1 --max-leaves=$numleaves \
$dir/treeacc $dir/roots.txt \
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
gmm-init-model --write-occs=$dir/1.occs \
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
rm $dir/treeacc
# Convert alignments generated from the previous (tri1) model, to use as initial alignments.
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
# Debug step only: convert back and check they're the same.
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
2>/dev/null | cmp - $dir/0.ali || exit 1;
rm $dir/0.ali
# Make training graphs
echo "Compiling training graphs"
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
cur_lda=$dir/0.mat
x=1
while [ $x -lt $numiters ]; do
echo pass $x
if echo $realign_iters | grep -w $x >/dev/null; then
echo "Aligning data"
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
fi
if echo $fmllr_iters | grep -w $x >/dev/null; then # Compute CMLLR transforms.
sifeats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
gmm-est-fmllr-gpost $spk2utt_opt $dir/$x.mdl "$sifeats" ark,s,cs:- ark:$dir/tmp.trans ) \
2> $dir/trans.$x.log || exit 1;
mv $dir/tmp.trans $dir/cur.trans
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/cur.trans ark:- ark:- |"
fi
if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
( ali-to-post ark:$dir/cur.ali ark:- | \
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log || exit 1;
est-mllt $dir/$x.mat.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
gmm-transform-means --binary=false $dir/$x.mat.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
compose-transforms --print-args=false $dir/$x.mat.new $cur_lda $dir/$x.mat || exit 1;
cur_lda=$dir/$x.mat
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
# Subset of features used to train MLLT transforms.
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $cur_lda ark:- ark:-|"
else # do GMM update.
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
fi
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
if [ $x -le $maxiterinc ]; then
numgauss=$[$numgauss+$incgauss];
fi
x=$[$x+1]
done
defaultfeats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
# Accumulate stats for the "alignment model", which is the same as the model but
# estimated on the unadapted, default features (it shares the Gaussian-level alignments).
( ali-to-post ark:$dir/cur.ali ark:- | \
gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$defaultfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
# Update model.
gmm-est --write-occs=$dir/final.occs --remove-low-count-gaussians=false \
$dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
2>$dir/est_alimdl.log || exit 1;
rm $dir/$x.acc2
( cd $dir; rm final.mdl final.alimdl 2>/dev/null;
ln -s $x.mdl final.mdl; ln -s $x.alimdl final.alimdl
ln -s `basename $cur_lda` final.mat )
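
The training loop above grows the Gaussian target `numgauss` by `incgauss` after every iteration up to and including `maxiterinc`, then holds it fixed. A minimal sketch of that schedule follows. The starting values and the loop bound (called `numiters` here) are assumptions, since the part of the script that sets them is not shown; the sketch also uses POSIX `$((...))` arithmetic rather than the older `$[...]` form used in the script.

```shell
# Print "iteration numgauss" pairs for a hypothetical schedule.
# Arguments: initial numgauss, incgauss, maxiterinc, numiters (all illustrative).
gauss_schedule() {
  numgauss=$1; incgauss=$2; maxiterinc=$3; numiters=$4
  x=0
  while [ "$x" -lt "$numiters" ]; do
    echo "$x $numgauss"
    # Same test as in the script: keep increasing while x <= maxiterinc.
    if [ "$x" -le "$maxiterinc" ]; then
      numgauss=$((numgauss + incgauss))
    fi
    x=$((x + 1))
  done
}
```

For example, `gauss_schedule 100 50 2 5` prints five lines, and the target stops growing after the iteration numbered 2.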

48
egs/rm/s1/steps/train_ubma.sh Executable file

@ -0,0 +1,48 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Train UBM from a trained HMM/GMM system.
if [ -f path.sh ]; then . path.sh; fi
dir=exp/ubma
mkdir -p $dir
srcdir=exp/tri1
init-ubm --intermediate-numcomps=2000 --ubm-numcomps=400 --verbose=2 \
--fullcov-ubm=true $srcdir/final.mdl $srcdir/final.occs \
$dir/0.ubm 2> $dir/cluster.log
subset[0]=1000
subset[1]=1500
subset[2]=2000
subset[3]=2500
for x in 0 1 2 3; do
echo "Pass $x"
feats="ark:scripts/subset_scp.pl ${subset[$x]} data/train.scp | add-deltas --print-args=false scp:- ark:- |"
fgmm-acc-stats --diag-gmm-nbest=15 --binary=false --verbose=2 $dir/$x.ubm "$feats" $dir/$x.acc \
2> $dir/acc.$x.log || exit 1;
fgmm-est --verbose=2 $dir/$x.ubm $dir/$x.acc \
$dir/$[$x+1].ubm 2> $dir/update.$x.log || exit 1;
rm $dir/$x.acc $dir/$x.ubm
done

8
egs/wsj/README.txt Normal file

@ -0,0 +1,8 @@
Each subdirectory of this directory contains the
scripts for a sequence of experiments.
s1: This setup contains experiments with GMM-based systems using various
Maximum Likelihood techniques, including global and speaker-specific transforms.
See a parallel setup in ../rm/s1


@ -0,0 +1 @@
--use-energy=false # only non-default option.


@ -0,0 +1,22 @@
<Topology>
<TopologyEntry>
<ForPhones>
NONSILENCEPHONES
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
</TopologyEntry>
<TopologyEntry>
<ForPhones>
SILENCEPHONES
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 4 <PdfClass> 4 <Transition> 4 0.25 <Transition> 5 0.75 </State>
<State> 5 </State>
</TopologyEntry>
</Topology>
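
Each emitting state in the topology above lists `<Transition>` pairs of destination state and probability, and in every state those probabilities add up to 1. The helper below is a rough sanity check for that property, assuming one `<State>` entry per line as in this file; the function name `check_topo_sums` is made up for illustration and is not part of Kaldi.

```shell
# Read a topology on stdin and report any state whose outgoing transition
# probabilities do not sum to 1. Exits non-zero if a bad state is found.
check_topo_sums() {
  awk '
    $1 == "<State>" {
      sum = 0; n = 0
      for (i = 1; i <= NF; i++) {
        # The probability is two fields after the <Transition> token.
        if ($i == "<Transition>") { sum += $(i + 2); n++ }
      }
      # Final states carry no transitions and are skipped.
      if (n > 0 && (sum < 0.9999 || sum > 1.0001)) {
        printf("state %s: probabilities sum to %g\n", $2, sum)
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Usage would be along the lines of `check_topo_sums < topo.proto`, where the file name is whatever the topology is saved as.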


@ -0,0 +1,64 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This program takes on its standard input a list of utterance
# ids, one per line (e.g. 4k0c030a is an utterance id).
# It takes as its command-line argument a file list of dot files, and
# extracts from those dot files the transcripts for a given
# dataset (represented by the utterance ids on the standard input).
#
@ARGV == 1 || die "find_transcripts.pl dot_files_flist < utterance_ids > transcripts";
$dot_flist = shift @ARGV;
open(L, "<$dot_flist") || die "Opening file list of dot files: $dot_flist\n";
while(<L>){
chop;
m:\S+/(\w{6})00.dot: || die "Bad line in dot file list: $_";
$spk = $1;
$spk2dot{$spk} = $_;
}
while(<STDIN>){
chop;
$uttid = $_;
$uttid =~ m:(\w{6})\w\w: || die "Bad utterance id $_";
$spk = $1;
if($spk ne $curspk) {
%utt2trans = ( ); # Reset the hash so we don't keep all the transcripts in memory.
$curspk = $spk;
$dotfile = $spk2dot{$spk};
defined $dotfile || die "No dot file for speaker $spk\n";
open(F, "<$dotfile") || die "Error opening dot file $dotfile\n";
while(<F>) {
$_ =~ m:(.+)\((\w{8})\)\s*$: || die "Bad line $_ in dot file $dotfile (line $.)\n";
$trans = $1;
$utt = $2;
$utt2trans{$utt} = $trans;
}
}
if(!defined $utt2trans{$uttid}) {
print STDERR "No transcript for utterance $uttid (current dot file is $dotfile)\n";
} else {
print "$uttid $utt2trans{$uttid}\n";
}
}
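
The script above finds the dot file for an utterance by taking the first six characters of the eight-character utterance id and matching them against dot files named with that prefix followed by `00.dot`. A small sketch of that mapping, with a hypothetical function name, might look like this:

```shell
# Map an 8-character utterance id to the basename of the dot file that
# should hold its transcript (prefix of 6 characters plus "00.dot").
dot_basename_for_utt() {
  prefix=$(printf '%s' "$1" | cut -c1-6)
  printf '%s00.dot\n' "$prefix"
}
```

With the utterance id used in the script's own comment, `dot_basename_for_utt 4k0c030a` gives `4k0c0300.dot`.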


@ -0,0 +1,31 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# takes in a file list with lines like
# /mnt/matylda2/data/WSJ1/13-16.1/wsj1/si_dt_20/4k0/4k0c030a.wv1
# and outputs an scp in kaldi format with lines like
# 4k0c030a /mnt/matylda2/data/WSJ1/13-16.1/wsj1/si_dt_20/4k0/4k0c030a.wv1
# (the first thing is the utterance-id, which is the same as the basename of the file).
while(<>){
m:^\S+/(\w+)\.[wW][vV]1$: || die "Bad line $_";
$id = $1;
$id =~ tr/A-Z/a-z/; # Necessary because of weirdness on disk 13-16.1 (uppercase filenames)
print "$id $_";
}
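
For readers who prefer not to read the Perl, the same file-list-to-scp conversion can be approximated with awk. This is only a sketch: the function name `flist_to_scp` is invented here, and the check on the path is looser than the regular expression in the script above.

```shell
# Turn a list of .wv1 paths (stdin) into "utterance-id path" lines.
# The id is the lowercased basename without the .wv1 extension.
flist_to_scp() {
  awk -F/ '{
    base = $NF
    if (base !~ /\.[wW][vV]1$/) {
      print "Bad line " $0 > "/dev/stderr"
      exit 1
    }
    # Strip the 4-character extension, then lowercase (disk 13-16.1 is uppercase).
    id = tolower(substr(base, 1, length(base) - 4))
    print id, $0
  }'
}
```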


@ -0,0 +1,62 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This program takes as its standard input an .ndx file from the WSJ corpus that looks
# like this:
#;; File: tr_s_wv1.ndx, updated 04/26/94
#;;
#;; Index for WSJ0 SI-short Sennheiser training data
#;; Data is read WSJ sentences, Sennheiser mic.
#;; Contains 84 speakers X (~100 utts per speaker MIT/SRI and ~50 utts
#;; per speaker TI) = 7236 utts
#;;
#11_1_1:wsj0/si_tr_s/01i/01ic0201.wv1
#11_1_1:wsj0/si_tr_s/01i/01ic0202.wv1
#11_1_1:wsj0/si_tr_s/01i/01ic0203.wv1
#and as command-line arguments it takes the names of the WSJ disk locations, e.g.:
#/mnt/matylda2/data/WSJ0/11-1.1 /mnt/matylda2/data/WSJ0/11-10.1 ... etc.
# It outputs a list of absolute pathnames (it does this by replacing e.g. 11_1_1 with
# /mnt/matylda2/data/WSJ0/11-1.1).
# It also does a slight fix because one of the WSJ disks (WSJ1/13-16.1) was distributed with
# uppercase rather than lower case filenames.
foreach $fn (@ARGV) {
$fn =~ m:.+/([0-9\.\-]+)/?$: || die "Bad command-line argument $fn\n";
$disk_id=$1;
$disk_id =~ tr/-\./__/; # replace - and . with _ so 11-10.1 becomes 11_10_1
$fn =~ s:/$::; # Remove final slash, just in case it is present.
$disk2fn{$disk_id} = $fn;
}
while(<STDIN>){
if(m/^;/){ next; } # Comment. Ignore it.
else {
m/^([0-9_]+):\s*(\S+)$/ || die "Could not parse line $_";
$disk=$1;
if(!defined $disk2fn{$disk}) {
die "Disk id $disk not found";
}
$filename = $2; # as a subdirectory of the distributed disk.
if($disk eq "13_16_1" && `hostname` =~ m/fit.vutbr.cz/) {
# The disk 13-16.1 has been uppercased for some reason, on the
# BUT system. This is a fix specifically for that case.
$filename =~ tr/a-z/A-Z/; # This disk contains all uppercase filenames. Why?
}
print "$disk2fn{$disk}/$filename\n";
}
}
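
The two steps the script performs, deriving a disk id from a directory name and then expanding an index entry against that directory, can be sketched in shell as follows. The function names and the `/data/...` paths are illustrative assumptions; the sketch also skips the whitespace handling and the uppercase fix for disk 13-16.1.

```shell
# Derive the disk id used in .ndx files from a disk directory,
# e.g. .../11-10.1 becomes 11_10_1.
disk_id() {
  basename "$1" | tr '.-' '__'
}

# Expand one .ndx entry such as "11_1_1:wsj0/..." into an absolute path,
# given the directory of the matching disk.
expand_ndx_entry() {
  entry=$1
  disk_dir=${2%/}          # drop a trailing slash, if present
  printf '%s/%s\n' "$disk_dir" "${entry#*:}"
}
```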


@ -0,0 +1,57 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# This takes unnormalized transcripts from the standard input, in the format
# 4k2c0308 Of course there isn\'t any guarantee the company will keep its hot hand [misc_noise]
# 4k2c030a [loud_breath] And new hardware such as the set of personal computers I\. B\. M\. introduced last week can lead to unexpected changes in the software business [door_slam]
# and outputs normalized transcripts.
# c.f. /mnt/matylda2/data/WSJ0/11-10.1/wsj0/transcrp/doc/dot_spec.doc
@ARGV == 1 || die "usage: normalize_transcript.pl noise_word < transcript > transcript2";
$noise_word = shift @ARGV;
while(<STDIN>) {
$_ =~ m:^(\S+) (.+): || die "bad line $_";
$utt = $1;
$trans = $2;
print "$utt";
foreach $w (split (" ",$trans)) {
$w =~ tr:a-z:A-Z:; # Upcase everything to match the CMU dictionary.
$w =~ s:\\::g; # Remove backslashes. We don't need the quoting.
if($w =~ m:^\[\<\w+\]$: || # E.g. [<door_slam], this means a door slammed in the preceding word. Delete.
$w =~ m:^\[\w+\>\]$: || # E.g. [door_slam>], this means a door slammed in the next word. Delete.
$w =~ m:\[\w+/\]$: || # E.g. [phone_ring/], which indicates the start of this phenomenon.
$w =~ m:\[\/\w+]$: || # E.g. [/phone_ring], which indicates the end of this phenomenon.
$w eq "~" || # This is used to indicate truncation of an utterance. Not a word.
$w eq ".") { # "." is used to indicate a pause. Silence is optional anyway so not much
# point including this in the transcript.
next; # we won't print this word.
} elsif($w =~ m:\[\w+\]:) { # Other noises, e.g. [loud_breath].
print " $noise_word";
} elsif($w =~ m:^\<([\w\']+)\>$:) {
# E.g. replace <and> with and (the <> marks a verbal deletion of a word, but it is still pronounced).
print " $1";
} elsif($w eq "--DASH") {
print " -DASH"; # This is a common issue; the CMU dictionary has it as -DASH.
# } elsif($w =~ m:(.+)\-DASH$:) { # E.g. INCORPORATED-DASH... seems the DASH gets combined with previous word
# print " $1 -DASH";
} else {
print " $w";
}
}
print "\n";
}
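
The per-word rules in the script above can be summarised as a single `case` statement. The sketch below is an approximation in POSIX shell, not a replacement for the Perl: the function name `normalize_word` is hypothetical, and the glob patterns are looser than the original regular expressions (they do not require word characters inside the brackets).

```shell
# Normalize one transcript token. $1 is the token, $2 the noise word.
# Prints the normalized token, or nothing if the token is to be dropped.
normalize_word() {
  # Upcase and drop backslashes, as the Perl script does.
  w=$(printf '%s' "$1" | tr 'a-z' 'A-Z' | tr -d '\\')
  case "$w" in
    '[<'*']' | '['*'>]' | '['*'/]' | '[/'*']' | '~' | '.')
      ;;                          # noise markers, truncation, pause: dropped
    '['*']')
      printf '%s\n' "$2" ;;       # other noises become the noise word
    '<'*'>')
      w=${w#'<'}; w=${w%'>'}
      printf '%s\n' "$w" ;;       # verbal deletion: keep the word itself
    '--DASH')
      printf '%s\n' '-DASH' ;;    # match the CMU dictionary spelling
    *)
      printf '%s\n' "$w" ;;
  esac
}
```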

54
egs/wsj/s1/data_prep/oov2unk.pl Executable file

@ -0,0 +1,54 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# takes a transcript file with lines like
# 40po031e THE RATE FELL TO SIX %PERCENT IN NOVEMBER NINETEEN EIGHTY SIX .PERIOD
# on the standard input.
# The first command-line argument is the filename of a dictionary file with lines like
# ZYUGANOV Z Y UW1 G AA0 N AA0 V
# and the second is the spoken-noise word.
# This script replaces all OOVs with the spoken-noise word and prints counts for each OOV on the standard error.
@ARGV == 2 || die "Usage: oov2unk.pl dict spoken-noise-word < transcript > transcript2";
$dict = shift @ARGV;
open(F, "<$dict") || die "Died opening dictionary file $dict\n";
while(<F>){
@A = split(" ", $_);
$word = shift @A;
$seen{$word} = 1;
}
$spoken_noise_word = shift @ARGV;
while(<STDIN>) {
@A = split(" ", $_);
$utt = shift @A;
print $utt;
foreach $a (@A) {
if(defined $seen{$a}) {
print " $a";
} else {
$oov{$a}++;
print " $spoken_noise_word";
}
}
print "\n";
}
foreach $w (sort { $oov{$a} <=> $oov{$b} } keys %oov) {
print STDERR "$w $oov{$w}\n";
}
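
The core of the script above, replacing every word that is missing from the dictionary with the spoken-noise word, fits in a short awk program. The sketch below assumes a non-empty dictionary file and leaves out the OOV counts that the Perl version writes to standard error; `oov_to_unk` is a name chosen for this example.

```shell
# $1 is the dictionary file, $2 the spoken-noise word; the transcript is
# read from stdin. The first field of each transcript line (the utterance
# id) is passed through unchanged.
oov_to_unk() {
  awk -v unk="$2" '
    NR == FNR { seen[$1] = 1; next }     # first file: collect dictionary words
    {
      out = $1
      for (i = 2; i <= NF; i++) out = out " " (($i in seen) ? $i : unk)
      print out
    }
  ' "$1" -
}
```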

157
egs/wsj/s1/data_prep/run.sh Executable file

@ -0,0 +1,157 @@
# This script should be run from its own directory (.)
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# It takes as arguments a list of directories that should end
# with numbers like 13-4.1. These are the subdirectories in the WSJ disks.
# on the BUT system we can get these by doing:
# ./run.sh /mnt/matylda2/data/WSJ?/??-{?,??}.?
# Another example is:
# ./run.sh /ais/gobi2/speech/WSJ/*/??-{?,??}.?
if [ $# -lt 4 ]; then
echo "Too few arguments to run.sh: need a list of WSJ directories ending e.g. 11-13.1"
exit 1;
fi
rm -r links/ 2>/dev/null
mkdir links/
ln -s $* links
# This version for SI-84
cat links/11-13.1/wsj0/doc/indices/train/tr_s_wv1.ndx | \
./ndx2flist.pl $* | sort | \
grep -v 11-2.1/wsj0/si_tr_s/401 > train_si84.flist
# This version for SI-284
cat links/13-34.1/wsj1/doc/indices/si_tr_s.ndx \
links/11-13.1/wsj0/doc/indices/train/tr_s_wv1.ndx | \
./ndx2flist.pl $* | sort | \
grep -v 11-2.1/wsj0/si_tr_s/401 > train_si284.flist
# Now for the test sets.
# links/13-34.1/wsj1/doc/indices/readme.doc
# describes all the different test sets.
# Note: each test-set seems to come in multiple versions depending
# on different vocabulary sizes, verbalized vs. non-verbalized
# pronunciations, etc. We use the largest vocab and non-verbalized
# pronunciations.
# The most normal one seems to be the "baseline 60k test set", which
# is h1_p0.
# Nov'92 (333 utts)
# These index files have a slightly different format;
# have to add .wv1
cat links/11-13.1/wsj0/doc/indices/test/nvp/si_et_20.ndx | \
./ndx2flist.pl $* | awk '{printf("%s.wv1\n", $1)}' | \
sort > eval_nov92.flist
# Nov'93: (213 utts)
# Have to replace a wrong disk-id.
cat links/13-32.1/wsj1/doc/indices/wsj1/eval/h1_p0.ndx | \
sed s/13_32_1/13_33_1/ | \
./ndx2flist.pl $* | sort > eval_nov93.flist
# Dev-set for Nov'93 (503 utts)
cat links/13-34.1/wsj1/doc/indices/h1_p0.ndx | \
./ndx2flist.pl $* | sort > dev_nov93.flist
# Dev-set for Nov'93 (503 utts)
# links/13-34.1/wsj1/doc/indices/h1_p0.ndx
# Finding the transcript files:
for x in $*; do find $x -iname '*.dot'; done > dot_files.flist
# Convert the transcripts into our format (no normalization yet)
for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
./flist2scp.pl $x.flist | sort > ${x}_sph.scp
cat ${x}_sph.scp | awk '{print $1}' | ./find_transcripts.pl dot_files.flist > $x.trans1
done
# Do some initial normalization steps.
noiseword="<NOISE>";
for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
cat $x.trans1 | ./normalize_transcript.pl $noiseword > $x.trans2 || exit 1
done
if [ ! -f ../data/lexicon.txt ]; then
echo "You need to get ../data/lexicon.txt first (see ../run.sh)"
exit 1
fi
# Convert OOVs to <SPOKEN_NOISE>
spoken_noise_word="<SPOKEN_NOISE>";
for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
cat $x.trans2 | ./oov2unk.pl ../data/lexicon.txt $spoken_noise_word | sort > $x.txt || exit 1 # the .txt is the final transcript.
done
# Create scp's with wav's. (the wv1 in the distribution is not really wav, it is sph.)
sph2pipe=`cd ../../../..; echo $PWD/tools/sph2pipe_v2.5/sph2pipe`
if [ ! -f $sph2pipe ]; then
echo "Could not find the sph2pipe program at $sph2pipe";
exit 1;
fi
for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
awk '{printf("%s '$sph2pipe' -f wav %s |\n", $1, $2);}' < ${x}_sph.scp > ${x}_wav.scp
done
# The 20K vocab, open-vocabulary language model (i.e. the one with UNK), without
# verbalized pronunciations. This is the most common test setup, I understand.
cp links/13-32.1/wsj1/doc/lng_modl/base_lm/bcb20onp.z lm_bg.arpa.gz
chmod u+w lm_bg.arpa.gz
# trigram would be:
cat links/13-32.1/wsj1/doc/lng_modl/base_lm/tcb20onp.z | \
perl -e 'while(<>){ if(m/^\\data\\/){ print; last; } } while(<>){ print; }' | \
gzip -c -f > lm_tg.arpa.gz
export PATH=$PATH:../../../../tools/irstlm/bin
prune-lm --threshold=1e-7 lm_tg.arpa.gz lm_tg_pruned.arpa
gzip -f lm_tg_pruned.arpa
# Make the utt2spk and spk2utt files.
for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
cat ${x}_sph.scp | awk '{print $1}' | perl -ane 'chop; m:^...:; print "$_ $&\n";' > $x.utt2spk
cat $x.utt2spk | ../scripts/utt2spk_to_spk2utt.pl > $x.spk2utt
done
if [ ! -f wsj0-train-spkrinfo.txt ]; then
wget http://www.ldc.upenn.edu/Catalog/docs/LDC93S6A/wsj0-train-spkrinfo.txt
fi
if [ ! -f wsj0-train-spkrinfo.txt ]; then
echo "Could not get the spkrinfo.txt file from LDC website (moved)?"
echo "This is possibly omitted from the training disks; couldn't find it."
echo "Everything else may have worked; we just may be missing gender info"
echo "which is only needed for VTLN-related diagnostics anyway."
exit 1
fi
# Note: wsj0-train-spkrinfo.txt doesn't seem to be on the disks but the
# LDC put it on the web. Perhaps it was accidentally omitted from the
# disks. I put it in the repository.
cat links/11-13.1/wsj0/doc/spkrinfo.txt \
links/13-34.1/wsj1/doc/train/spkrinfo.txt \
./wsj0-train-spkrinfo.txt | \
perl -ane 'tr/A-Z/a-z/;print;' | grep -v ';' | \
awk '{print $1, $2}' > spk2gender.map
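
The utt2spk step in the script above treats the first three characters of each utterance id as the speaker id. A sketch of that mapping, and of its inversion into a speaker-to-utterances table, is given below. The second function assumes the usual "speaker utt1 utt2 ..." layout, since `utt2spk_to_spk2utt.pl` itself is not shown here; both function names are made up for this example.

```shell
# Read scp lines (utterance id in the first field) and print "utt spk",
# where spk is the first three characters of the utterance id.
make_utt2spk() {
  awk '{ print $1, substr($1, 1, 3) }'
}

# Invert "utt spk" lines into "spk utt1 utt2 ..." lines, keeping speakers
# in order of first appearance.
make_spk2utt() {
  awk '{
    if ($2 in utts) { utts[$2] = utts[$2] " " $1 }
    else            { utts[$2] = $1; spk[++n] = $2 }
  }
  END { for (i = 1; i <= n; i++) print spk[i], utts[spk[i]] }'
}
```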

2
egs/wsj/s1/path.sh Executable file

@ -0,0 +1,2 @@
export PATH=$PATH:../../../src/bin:../../../tools/openfst/bin:../../../src/fstbin/:../../../src/gmmbin/:../../../src/featbin/:../../../src/lm/
export LC_ALL=C

304
egs/wsj/s1/run.sh Normal file

@ -0,0 +1,304 @@
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
exit 1;
# This is a shell script, but it's recommended that you run the commands one by
# one by copying and pasting into the shell.
# Caution: some of the graph creation steps use quite a bit of memory, so you
# might want to run this script on a machine that has plenty of memory.
# (1) To get the CMU dictionary, do:
svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/
# Got this at revision 10742 in my current test; can add -r 10742 for strict
# compatibility.
#(2) Dictionary preparation:
mkdir -p data
# Make phones symbol-table (adding in silence and verbal and non-verbal noises at this point).
# We are adding suffixes _B, _E, _S for beginning, ending, and singleton phones.
cat cmudict/cmudict.0.7a.symbols | perl -ane 's:\r::; print;' | \
awk 'BEGIN{print "<eps> 0"; print "SIL 1"; print "SPN 2"; print "NSN 3"; N=4; }
{printf("%s %d\n", $1, N++); }
{printf("%s_B %d\n", $1, N++); }
{printf("%s_E %d\n", $1, N++); }
{printf("%s_S %d\n", $1, N++); } ' >data/phones.txt
# First make a version of the lexicon without the silences etc, but with the position-markers.
# Remove the comments from the cmu lexicon and remove the (1), (2) from words with multiple
# pronunciations.
grep -v ';;;' cmudict/cmudict.0.7a | perl -ane 'if(!m:^;;;:){ s:(\S+)\(\d+\) :$1 :; print; }' \
| perl -ane '@A=split(" ",$_); $w = shift @A; @A>0||die;
if(@A==1) { print "$w $A[0]_S\n"; } else { print "$w $A[0]_B ";
for($n=1;$n<@A-1;$n++) { print "$A[$n] "; } print "$A[$n]_E\n"; } ' \
> data/lexicon_nosil.txt
# Add to cmudict the silences, noises etc.
(echo '!SIL SIL'; echo '<s> '; echo '</s> '; echo '<SPOKEN_NOISE> SPN'; echo '<UNK> SPN'; echo '<NOISE> NSN'; ) | \
cat - data/lexicon_nosil.txt > data/lexicon.txt
silphones="SIL SPN NSN";
# Generate colon-separated lists of silence and non-silence phones.
scripts/silphones.pl data/phones.txt "$silphones" data/silphones.csl data/nonsilphones.csl
# This adds disambig symbols to the lexicon and produces data/lexicon_disambig.txt
ndisambig=`scripts/add_lex_disambig.pl data/lexicon.txt data/lexicon_disambig.txt`
echo $ndisambig > data/lex_ndisambig
# Next, create a phones.txt file that includes the disambig symbols.
# The --include-zero option includes the #0 symbol we pass through from the grammar.
scripts/add_disambig.pl --include-zero data/phones.txt $ndisambig > data/phones_disambig.txt
# Make the words symbol-table; add the disambiguation symbol #0 (we use this in place of epsilon
# in the grammar FST).
cat data/lexicon.txt | awk '{print $1}' | sort | uniq | \
awk 'BEGIN{print "<eps> 0";} {printf("%s %d\n", $1, NR);} END{printf("#0 %d\n", NR+1);} ' \
> data/words.txt
#(3)
# data preparation (this step requires the WSJ disks, from LDC).
# It takes as arguments a list of the directories ending in
# e.g. 11-13.1 (we don't assume a single root dir because
# there are different ways of unpacking them).
cd data_prep
#TODO: remove following system-specific comments.
#On BUT system, do:
./run.sh /mnt/matylda2/data/WSJ?/??-{?,??}.?
# On Geoff Hinton's system we can do:
# ./run.sh /ais/gobi2/speech/WSJ/*/??-{?,??}.?
cd ..
# Here is where we select what data to train on.
# use all the si284 data.
cp data_prep/train_si284_wav.scp data/train_wav.scp
cp data_prep/train_si284.txt data/train.txt
cp data_prep/train_si284.spk2utt data/train.spk2utt
cp data_prep/train_si284.utt2spk data/train.utt2spk
cp data_prep/spk2gender.map data/
for x in eval_nov92 dev_nov93 eval_nov93; do
cp data_prep/$x.spk2utt data/$x.spk2utt
cp data_prep/$x.utt2spk data/$x.utt2spk
cp data_prep/$x.txt data/$x.txt
done
for x in train eval_nov92 dev_nov93 eval_nov93; do
cat data/$x.txt | scripts/sym2int.pl --ignore-first-field data/words.txt > data/$x.tra
done
# Get the right paths on our system by sourcing the following shell file
# (edit it if it's not right for your setup).
. path.sh
# Create the basic L.fst without disambiguation symbols, for use
# in training.
scripts/make_lexicon_fst.pl data/lexicon.txt 0.5 SIL | \
fstcompile --isymbols=data/phones.txt --osymbols=data/words.txt \
--keep_isymbols=false --keep_osymbols=false | \
fstarcsort --sort_type=olabel > data/L.fst
# Create the lexicon FST with disambiguation symbols. There is an extra
# step where we create a loop to "pass through" the disambiguation symbols
# from G.fst.
phone_disambig_symbol=`grep \#0 data/phones_disambig.txt | awk '{print $2}'`
word_disambig_symbol=`grep \#0 data/words.txt | awk '{print $2}'`
scripts/make_lexicon_fst.pl data/lexicon_disambig.txt 0.5 SIL | \
fstcompile --isymbols=data/phones_disambig.txt --osymbols=data/words.txt \
--keep_isymbols=false --keep_osymbols=false | \
fstaddselfloops "echo $phone_disambig_symbol |" "echo $word_disambig_symbol |" | \
fstarcsort --sort_type=olabel > data/L_disambig.fst
# Making the grammar FSTs
# This step is quite specific to this WSJ setup.
# see data_prep/run.sh for more about where these LMs came from.
steps/make_lm_fsts.sh
## Sanity check; just making sure the next command does not crash.
fstdeterminizestar data/G_bg.fst >/dev/null
## Sanity check; just making sure the next command does not crash.
fsttablecompose data/L_disambig.fst data/G_bg.fst | fstdeterminizestar >/dev/null
# At this point, make sure that "./exp/" is somewhere you can write
# a reasonably large amount of data (i.e. on a fast and large
# disk somewhere). It can be a soft link if necessary.
# (4) feature generation
# Make the training features.
# note that this runs 3-4 times faster if you compile with DEBUGLEVEL=0
# (this turns on optimization).
# Set "dir" to someplace you can write to.
dir=/mnt/matylda6/jhu09/qpovey/kaldi_wsj2_mfcc_e
steps/make_mfcc_train.sh $dir
steps/make_mfcc_test.sh $dir
# (5) running the training and testing steps..
steps/train_mono.sh || exit 1;
(scripts/mkgraph.sh --mono data/G_tg_pruned.fst exp/mono/tree exp/mono/final.mdl exp/graph_mono_tg_pruned || exit 1;
scripts/decode.sh exp/decode_mono_tgpr_eval92 exp/graph_mono_tg_pruned/HCLG.fst steps/decode_mono.sh data/eval_nov92.scp ) &
steps/train_tri1.sh || exit 1;
# add --no-queue --num-jobs 4 after "scripts/decode.sh" below, if you don't have
# qsub on your system. The number of jobs to use depends on how many CPUs and
# how much memory you have, on the local machine. If you do have qsub on your
# system, you will probably have to edit scripts/decode.sh anyway to change the
# queue options... or if you have a different queueing system, you'd have to
# modify the script to use that.
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri1/tree exp/tri1/final.mdl exp/graph_tri1_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri1_tgpr_eval92 exp/graph_tri1_tg_pruned/HCLG.fst steps/decode_tri1.sh data/eval_nov92.scp ) &
steps/train_tri2a.sh || exit 1;
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2a/tree exp/tri2a/final.mdl exp/graph_tri2a_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2a_tgpr_eval92 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a.sh data/eval_nov92.scp
scripts/decode.sh exp/decode_tri2a_tgpr_eval93 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a.sh data/eval_nov93.scp
scripts/decode.sh exp/decode_tri2a_tgpr_fmllr_utt_eval92 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a_fmllr.sh data/eval_nov92.scp
scripts/decode.sh --per-spk exp/decode_tri2a_tgpr_fmllr_eval92 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a_fmllr.sh data/eval_nov92.scp
) &
steps/train_tri3a.sh || exit 1;
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri3a/tree exp/tri3a/final.mdl exp/graph_tri3a_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri3a_tgpr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a.sh data/eval_nov92.scp
# per-speaker fMLLR
scripts/decode.sh --per-spk exp/decode_tri3a_tgpr_fmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_fmllr.sh data/eval_nov92.scp
# per-utterance fMLLR
scripts/decode.sh exp/decode_tri3a_tgpr_uttfmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_fmllr.sh data/eval_nov92.scp
# per-speaker diagonal fMLLR
scripts/decode.sh --per-spk exp/decode_tri3a_tgpr_dfmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_diag_fmllr.sh data/eval_nov92.scp
# per-utterance diagonal fMLLR
scripts/decode.sh exp/decode_tri3a_tgpr_uttdfmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_diag_fmllr.sh data/eval_nov92.scp
)&
# will delete:
## scripts/decode_queue_fmllr.sh exp/graph_tri3a_tg_pruned exp/tri3a/final.mdl exp/decode_tri3a_tg_pruned_fmllr &
#### Now alternative experiments... ###
# ET
steps/train_tri2b.sh
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2b/tree exp/tri2b/final.mdl exp/graph_tri2b_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2b_tgpr_utt_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b.sh data/eval_nov92.scp
scripts/decode.sh --per-spk exp/decode_tri2b_tgpr_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b.sh data/eval_nov92.scp
scripts/decode.sh exp/decode_tri2b_tgpr_utt_fmllr_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b_fmllr.sh data/eval_nov92.scp
scripts/decode.sh --per-spk exp/decode_tri2b_tgpr_fmllr_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b_fmllr.sh data/eval_nov92.scp
) &
# MLLT/STC
steps/train_tri2d.sh
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2d/tree exp/tri2d/final.mdl exp/graph_tri2d_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2d_tgpr_eval92 exp/graph_tri2d_tg_pruned/HCLG.fst steps/decode_tri2d.sh data/eval_nov92.scp )&
# Splice+LDA
steps/train_tri2e.sh
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2e/tree exp/tri2e/final.mdl exp/graph_tri2e_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2e_tgpr_eval92 exp/graph_tri2e_tg_pruned/HCLG.fst steps/decode_tri2e.sh data/eval_nov92.scp )&
# Splice+LDA+MLLT
steps/train_tri2f.sh
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2f/tree exp/tri2f/final.mdl exp/graph_tri2f_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2f_tgpr_eval92 exp/graph_tri2f_tg_pruned/HCLG.fst steps/decode_tri2f.sh data/eval_nov92.scp )&
# Linear VTLN (+ regular VTLN)
steps/train_tri2g.sh
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2g/tree exp/tri2g/final.mdl exp/graph_tri2g_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2g_tgpr_utt_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g.sh data/eval_nov92.scp
scripts/decode.sh exp/decode_tri2g_tgpr_utt_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_diag.sh data/eval_nov92.scp
scripts/decode.sh --wav exp/decode_tri2g_tgpr_utt_vtln_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_vtln_diag.sh data/eval_nov92.scp
scripts/decode.sh --per-spk exp/decode_tri2g_tgpr_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g.sh data/eval_nov92.scp
scripts/decode.sh --per-spk exp/decode_tri2g_tgpr_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_diag.sh data/eval_nov92.scp
scripts/decode.sh --wav --per-spk exp/decode_tri2g_tgpr_vtln_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_vtln_diag.sh data/eval_nov92.scp
)&
# Splice+HLDA
steps/train_tri2h.sh
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2h/tree exp/tri2h/final.mdl exp/graph_tri2h_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2h_tgpr_eval92 exp/graph_tri2h_tg_pruned/HCLG.fst steps/decode_tri2h.sh data/eval_nov92.scp )&
# Triple-deltas + HLDA
steps/train_tri2i.sh
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2i/tree exp/tri2i/final.mdl exp/graph_tri2i_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2i_tgpr_eval92 exp/graph_tri2i_tg_pruned/HCLG.fst steps/decode_tri2i.sh data/eval_nov92.scp )&
# Splice + HLDA
steps/train_tri2j.sh
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2j/tree exp/tri2j/final.mdl exp/graph_tri2j_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2j_tgpr_eval92 exp/graph_tri2j_tg_pruned/HCLG.fst steps/decode_tri2j.sh data/eval_nov92.scp )&
# LDA+ET
steps/train_tri2k.sh
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2k/tree exp/tri2k/final.mdl exp/graph_tri2k_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2k_tgpr_utt_eval92 exp/graph_tri2k_tg_pruned/HCLG.fst steps/decode_tri2k.sh data/eval_nov92.scp
scripts/decode.sh --per-spk exp/decode_tri2k_tgpr_eval92 exp/graph_tri2k_tg_pruned/HCLG.fst steps/decode_tri2k.sh data/eval_nov92.scp
)&
# LDA+MLLT+SAT
steps/train_tri2l.sh
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2l/tree exp/tri2l/final.mdl exp/graph_tri2l_tg_pruned || exit 1;
scripts/decode.sh exp/decode_tri2l_tgpr_utt_eval92 exp/graph_tri2l_tg_pruned/HCLG.fst steps/decode_tri2l.sh data/eval_nov92.scp
scripts/decode.sh --per-spk exp/decode_tri2l_tgpr_eval92 exp/graph_tri2l_tg_pruned/HCLG.fst steps/decode_tri2l.sh data/eval_nov92.scp
)&
# Note on WERs at different stages of decoding:
#exp/decode_mono_tg_pruned/wer:%WER 31.82 [ 1795 / 5641, 109 ins, 412 del, 1274 sub ]
#exp/decode_tri1_tg_pruned/wer:%WER 13.61 [ 768 / 5641, 134 ins, 76 del, 558 sub ]
#exp/decode_tri2a_tg_pruned/wer:%WER 12.94 [ 730 / 5641, 131 ins, 62 del, 537 sub ]
#exp/decode_tri3a_tg_pruned/wer:%WER 10.88 [ 614 / 5641, 126 ins, 47 del, 441 sub ]
# For an example of scoring with sclite, run e.g.:
# scripts/score_sclite.sh exp/decode_tri2a_tg_pruned
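# A minimal sketch, not part of the original recipe: each %WER figure in the
# note above is 100 * (ins + del + sub) / (number of reference words).
# The helper name wer_percent is hypothetical; it only redoes the arithmetic
# and does not read the exp/*/wer files.

```shell
wer_percent() {
  # usage: wer_percent INS DEL SUB NUM_REF_WORDS
  awk -v i="$1" -v d="$2" -v s="$3" -v n="$4" \
    'BEGIN { printf "%.2f\n", 100 * (i + d + s) / n }'
}

wer_percent 109 412 1274 5641   # mono:  prints 31.82
wer_percent 126 47 441 5641     # tri3a: prints 10.88
```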


@@ -0,0 +1,58 @@
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Adds a specified number of disambiguation symbols to a symbol table,
# named #1, #2, etc., with ids following the last id in the input.
# If the --include-zero option is given, also adds #0 before them.
$include_zero = 0;
if (@ARGV > 0 && $ARGV[0] eq "--include-zero") {
  $include_zero = 1;
  shift @ARGV;
}

if (@ARGV != 2) {
  die "Usage: add_disambig.pl [--include-zero] symtab.txt num_extra > symtab_out.txt\n";
}

$input = $ARGV[0];
$nsyms = $ARGV[1];

# Copy the input table through unchanged, remembering the id on the last
# line (assumed to be the highest id in the table).
open(F, "<$input") || die "Opening file $input";
while (<F>) {
  @A = split(" ", $_);
  @A == 2 || die "Bad line $_";
  $lastsym = $A[1];
  print;
}
close(F);

if (!defined($lastsym)) {
  die "Empty symbol file?";
}

# With --include-zero, #0 takes the next free id.
if ($include_zero) {
  $lastsym++;
  print "#0 $lastsym\n";
}

# Append #1 .. #$nsyms with consecutive ids after the last one used.
for ($n = 1; $n <= $nsyms; $n++) {
  $y = $n + $lastsym;
  print "#$n $y\n";
}
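
# A minimal shell sketch of the output this script is expected to produce,
# restating the same logic in awk so it can be tried without the Perl file.
# The function name add_disambig_sketch and the three-entry symbol table are
# hypothetical; the sketch omits the "Bad line" and empty-file checks.

```shell
add_disambig_sketch() {
  # usage: add_disambig_sketch INCLUDE_ZERO(0|1) NUM_EXTRA < symtab.txt
  awk -v include_zero="$1" -v nsyms="$2" '
    { last = $2; print }   # pass the table through, track the last id
    END {
      if (include_zero) { last++; print "#0 " last }
      for (n = 1; n <= nsyms; n++) print "#" n " " (n + last)
    }'
}

# Equivalent of: add_disambig.pl --include-zero symtab.txt 2
printf '<eps> 0\na 1\nb 2\n' | add_disambig_sketch 1 2
# prints the three input lines, then:
#   #0 3
#   #1 4
#   #2 5
```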

Some files are not shown because too many files were changed in this commit.