Mirror of https://github.com/mozilla/kaldi.git
Committing initial version of Kaldi
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@2 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
Commit 10e9002c88
@ -0,0 +1,270 @
Legal Notices

Each of the files comprising Kaldi v1.0 has been separately licensed by
its respective author(s) under the terms of the Apache License v 2.0 (set
forth below). The source code headers for each file specify the individual
authors and source material for that file as well as the corresponding copyright
notice. For reference purposes only: A cumulative list of all individual
contributors and original source material as well as the full text of the Apache
License v 2.0 are set forth below.

Individual Contributors (in alphabetical order)

Mohit Agarwal
Gilles Boulianne
Lukas Burget
Ondrej Glembek
Arnab Ghoshal
Go Vivace Inc.
Mirko Hannemann
Microsoft Corporation
Petr Motlicek
Ariya Rastrow
Petr Schwarz
Georg Stemmer
Jan Silovsky
Phonexia s.r.o.
Yanmin Qian
Karel Vesely
Haihua Xu

Other Source Material

This project includes a port and modification of materials from JAMA: A Java
Matrix Package under the following notice: "This software is a cooperative
product of The MathWorks and the National Institute of Standards and Technology
(NIST) which has been released to the public domain." This notice and the
original code are available at http://math.nist.gov/javanumerics/jama/

This project includes a modified version of code published in Malvar, H.,
"Signal processing with lapped transforms," Artech House, Inc., 1992. The
current copyright holder, Henrique S. Malvar, has given his permission for the
release of this modified version under the Apache License 2.0.

This file includes material from the OpenFST Library v1.2.7 available at
http://www.openfst.org/twiki/bin/view/FST/WebHome and released under the
Apache License v. 2.0.

[OpenFst COPYING file begins here]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use these files except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Copyright 2005-2010 Google, Inc.

[OpenFst COPYING file ends here]

-------------------------------------------------------------------------

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
@ -0,0 +1,9 @
[for native Windows install, see windows/INSTALL]

(1)
go to tools/ and follow INSTALL instructions there.

(2)
go to src/ and follow INSTALL instructions there.

@ -0,0 +1,31 @
This README has been created for those with whom we share the
"pre-release" version of Kaldi. Although the toolkit has not
been "officially" released, I have been given the OK to share
it privately for "non-commercial purposes" (whatever that means).
The official release is scheduled for mid-March.

The current version is not as polished as we would like, and contains
some files that should eventually be deleted.

See http://merlin.fit.vutbr.cz/kaldi/ for documentation
(may not always be fully up to date). This documentation
is generated by running "doxygen" from the src/ directory,
and appears in src/html/

I assume that the reader would like to (1) build the toolkit
and (2) run the example system builds.

To build the toolkit: see ./INSTALL. These instructions are valid for UNIX
systems including various flavors of Linux; Darwin; and Cygwin (has not been
tested on more "exotic" varieties of UNIX). For Windows installation
instructions (excluding Cygwin), see windows/INSTALL.

To run the example system builds, see egs/README.txt

If you encounter problems (and you probably will), your first point of contact
should be Dan Povey (dpovey@microsoft.com). In addition to specific questions,
please let me know if there are specific aspects of the project that you feel
could be improved, that you find confusing, etc., and which missing features you
most wish it had.

@ -0,0 +1,21 @
This directory contains example scripts that demonstrate how to
use Kaldi. Each subdirectory corresponds to a corpus that we have
example scripts for. Currently these are both corpora available from
the Linguistic Data Consortium (LDC).

Explanations of the corpora are below:

wsj: The Wall Street Journal corpus. This is a corpus of read
  sentences from the Wall Street Journal, recorded under clean conditions.
  The vocabulary is quite large.
  Available from the LDC as either: [ catalog numbers LDC93S6A (WSJ0) and LDC94S13A (WSJ1) ]
  or: [ catalog numbers LDC93S6B (WSJ0) and LDC94S13B (WSJ1) ]
  The latter option is cheaper and includes only the Sennheiser
  microphone data (which is all we use in the example scripts).

rm: Resource Management. Clean speech in a medium-vocabulary task consisting
  of commands to a (presumably imaginary) computer system.
  Available from the LDC as catalog number LDC93S3A (it may be possible to
  get the same data using combinations of other catalog numbers, but this
  is the one we used).
@ -0,0 +1,8 @
Each subdirectory of this directory contains the
scripts for a sequence of experiments.

s1: This setup contains experiments with GMM-based systems using various
  Maximum Likelihood
  techniques, including global and speaker-specific transforms.
  See a parallel setup in ../wsj/s1
@ -0,0 +1,11 @
Note RE decoding beams:

WER
Beam         20      25      30
monophone    18.28   28.24
triphone     6.767   6.724   6.724   [tri1]
Time [on svatava, xRT]
triphone     0.13    0.27    0.43    [tri1]


@ -0,0 +1 @
--use-energy=false # only non-default option.
@ -0,0 +1,22 @
<Topology>
<TopologyEntry>
<ForPhones>
NONSILENCEPHONES
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
</TopologyEntry>
<TopologyEntry>
<ForPhones>
SILENCEPHONES
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 4 <PdfClass> 4 <Transition> 4 0.25 <Transition> 5 0.75 </State>
<State> 5 </State>
</TopologyEntry>
</Topology>
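A quick sanity check on a topology like the one above is that the outgoing transition probabilities of every emitting state add up to 1. The following is only a sketch: `topo_transition_sums` is a hypothetical helper, not part of Kaldi, and it assumes each `<State>` entry sits on a single line, as it does in this file.

```shell
# Hypothetical helper, not part of Kaldi: reads a topology file on stdin and
# prints "<state-id> <sum>" for every state that has outgoing transitions.
# Assumes one <State> ... </State> entry per line.
topo_transition_sums() {
  awk '$1 == "<State>" {
    sum = 0; n = 0
    # Each transition is the token "<Transition>", a target state, a probability.
    for (i = 2; i < NF; i++) {
      if ($i == "<Transition>") { sum += $(i + 2); n++ }
    }
    # Final states such as "<State> 3 </State>" have no transitions; skip them.
    if (n > 0) printf("%s %.2f\n", $2, sum)
  }'
}

# Example: state 0 of the non-silence entry.
echo '<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>' |
  topo_transition_sums
```

Piping the whole topology file through the function should list every emitting state with a sum of 1.00; any other value points at a typo in the probabilities.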
@ -0,0 +1,69 @
#!/usr/bin/perl
# Copyright 2010-2011 Microsoft Corporation

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.


# usage: make_trans.pl prefix in.flist input.snr out.txt out.scp

# prefix is first letters of the database "key" (rest are numeric)

# in.flist is just a list of filenames, probably of .sph files.
# input.snr is an snr format file from the RM dataset.
# out.txt is the output transcriptions in format "key word1 word2 ...\n"
# out.scp is the output scp file, which is as in.flist but has the
# database-key first on each line.

# Reads transcriptions from input.snr, e.g. $rootdir/rm1_audio1/rm1/doc/al_sents.snr
# and file names from in.flist, e.g. train_sph.flist
# Writes the transcriptions to out.txt and the keyed file list to out.scp

if(@ARGV != 5) {
  die "usage: make_trans.pl prefix in.flist input.snr out.txt out.scp\n";
}
($prefix, $in_flist, $input_snr, $out_txt, $out_scp) = @ARGV;

open(F, "<$input_snr") || die "Opening SNOR file $input_snr";

while(<F>) {
  if(m/^;/) { next; }
  m/(.+) \((.+)\)/ || die "bad line $_";
  $T{$2} = $1;
}

close(F);
open(G, "<$in_flist") || die "Opening file list $in_flist";

open(O, ">$out_txt") || die "Open output transcription file $out_txt";

open(P, ">$out_scp") || die "Open output scp file $out_scp";

while(<G>) {
  $_ =~ m:/(\w+)/(\w+)\.sph\s+$:i || die "bad scp line $_";
  $spkname = $1;
  $uttname = $2;
  $uttname =~ tr/a-z/A-Z/;
  defined $T{$uttname} || die "no trans for sent $uttname";
  $spkname =~ s/_//g; # remove underscore from spk name to make key nicer.
  $key = $prefix . "_" . $spkname . "_" . $uttname;
  $key =~ tr/A-Z/a-z/; # Make it all lower case.
  # to make the numerical and string-sorted orders the same.
  print O "$key $T{$uttname}\n";
  print P "$key $_";
  $n++;
}
close(O) || die "Closing output.";
close(P) || die "Closing output.";

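The key construction in make_trans.pl (prefix, speaker directory with underscores removed, utterance name, all lower-cased) can be illustrated with a small shell sketch. `make_key` is a hypothetical helper written for this illustration and the example path is invented; unlike the Perl script, it does not validate the path or look up a transcription.

```shell
# Hypothetical helper mirroring the key construction in make_trans.pl.
# Usage: make_key <prefix> <path/to/speaker_dir/utterance.sph>
make_key() (
  prefix=$1
  path=$2
  # Speaker name is the directory holding the file, with underscores removed.
  spk=$(basename "$(dirname "$path")" | tr -d '_')
  # Utterance name is the file name without its extension.
  utt=$(basename "$path")
  utt=${utt%.*}
  # Join the three parts and lower-case the whole key.
  printf '%s_%s_%s\n' "$prefix" "$spk" "$utt" | tr 'A-Z' 'a-z'
)

# Example with an invented path; prints trn_spk01_st0001
make_key trn /data/RM/spk_01/ST0001.sph
```

Lower-casing the whole key keeps a plain `sort` of the key column consistent across the transcription and scp files, which the data preparation script relies on when it sorts both by key.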
@ -0,0 +1,92 @
# This script should be run from the directory where it is located (i.e. data_prep)
# Copyright 2010-2011 Microsoft Corporation

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.


# The input is the 3 CDs from the LDC distribution of Resource Management.
# The script's argument is a directory which has three subdirectories:
# rm1_audio1 rm1_audio2 rm2_audio

if [ $# != 1 ]; then
  echo "Usage: ./run.sh /path/to/RM"
  exit 1;
fi

RMROOT=$1
if [ ! -d $RMROOT/rm1_audio1 -o ! -d $RMROOT/rm1_audio2 ]; then
  echo "Error: run.sh requires a directory argument that contains rm1_audio1 and rm1_audio2"
  exit 1;
fi

if [ ! -d $RMROOT/rm2_audio ]; then
  echo "**Warning: $RMROOT/rm2_audio does not exist; won't create spk2gender.map file correctly***"
  sleep 1
fi

(
  find $RMROOT/rm1_audio1/rm1/ind_trn -iname '*.sph';
  find $RMROOT/rm1_audio2/2_4_2/rm1/ind/dev_aug -iname '*.sph';
) | perl -ane ' m:/sa\d.sph:i || m:/sb\d\d.sph:i || print; ' > train_sph.flist



# make_trans.pl also creates the utterance id's and the kaldi-format scp file.
./make_trans.pl trn train_sph.flist $RMROOT/rm1_audio1/rm1/doc/al_sents.snr train_trans.txt train_sph.scp
mv train_trans.txt tmp; sort -k 1 tmp > train_trans.txt
mv train_sph.scp tmp; sort -k 1 tmp > train_sph.scp

sph2pipe=`cd ../../../..; echo $PWD/tools/sph2pipe_v2.5/sph2pipe`
if [ ! -f $sph2pipe ]; then
  echo "Could not find the sph2pipe program at $sph2pipe";
  exit 1;
fi
awk '{printf("%s '$sph2pipe' -f wav %s |\n", $1, $2);}' < train_sph.scp > train_wav.scp

cat train_wav.scp | perl -ane 'm/^(\w+_(\w+)\w_\w+) / || die; print "$1 $2\n"' > train.utt2spk
cat train.utt2spk | sort -k 2 | ../scripts/utt2spk_to_spk2utt.pl > train.spk2utt


for ntest in 1_mar87 2_oct87 4_feb89 5_oct89 6_feb91 7_sep92; do
  n=`echo $ntest | cut -d_ -f 1`
  test=`echo $ntest | cut -d_ -f 2`
  root=$RMROOT/rm1_audio2/2_4_2
  for x in `grep -v ';' $root/rm1/doc/tests/$ntest/${n}_indtst.ndx`; do
    echo "$root/$x ";
  done > test_${test}_sph.flist
done

# make_trans.pl also creates the utterance id's and the kaldi-format scp file.
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
  ./make_trans.pl ${test} test_${test}_sph.flist $RMROOT/rm1_audio1/rm1/doc/al_sents.snr test_${test}_trans.txt test_${test}_sph.scp
  mv test_${test}_trans.txt tmp; sort -k 1 tmp > test_${test}_trans.txt
  mv test_${test}_sph.scp tmp; sort -k 1 tmp > test_${test}_sph.scp

  awk '{printf("%s '$sph2pipe' -f wav %s |\n", $1, $2);}' < test_${test}_sph.scp > test_${test}_wav.scp

  cat test_${test}_wav.scp | perl -ane 'm/^(\w+_(\w+)\w_\w+) / || die; print "$1 $2\n"' > test_${test}.utt2spk
  cat test_${test}.utt2spk | sort -k 2 | ../scripts/utt2spk_to_spk2utt.pl > test_${test}.spk2utt
done

cat $RMROOT/rm1_audio2/2_5_1/rm1/doc/al_spkrs.txt \
    $RMROOT/rm2_audio/3-1.2/rm2/doc/al_spkrs.txt | \
  perl -ane 'tr/A-Z/a-z/;print;' | grep -v ';' | \
  awk '{print $1, $2}' > spk2gender.map

../scripts/make_rm_lm.pl $RMROOT/rm1_audio1/rm1/doc/wp_gram.txt > G.txt

# Getting lexicon
../scripts/make_rm_dict.pl $RMROOT/rm1_audio2/2_4_2/score/src/rdev/pcdsril.txt > lexicon.txt

echo Succeeded.
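The script above pipes each utt2spk file ("utterance speaker" per line) through ../scripts/utt2spk_to_spk2utt.pl, which is not shown here. As a rough illustration of that conversion, the sketch below groups utterances by speaker, assuming the output format is "speaker utt1 utt2 ..." with one line per speaker. The function name and sample keys are invented, and the real Perl script may differ in details such as ordering.

```shell
# Hypothetical stand-in for the utt2spk -> spk2utt conversion.
# Input (stdin):  lines of "<utterance-id> <speaker-id>"
# Output:         lines of "<speaker-id> <utt1> <utt2> ...", speakers in
#                 the order in which they first appear.
utt2spk_to_spk2utt() {
  awk '{
    if (!($2 in utts)) order[++n] = $2   # remember first-seen speaker order
    utts[$2] = utts[$2] " " $1           # append this utterance to its speaker
  }
  END { for (i = 1; i <= n; i++) print order[i] utts[order[i]] }'
}

# Example with invented keys.
printf 'trn_spka_u1 spka\ntrn_spka_u2 spka\ntrn_spkb_u3 spkb\n' | utt2spk_to_spk2utt
```

Because the sketch keeps an array per speaker, the input does not need to be sorted by speaker first; the `sort -k 2` in the script above would still make the speaker order deterministic.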
@ -0,0 +1,39 @
#!/bin/bash
# Copyright 2010-2011 Microsoft Corporation

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.


fake=false
if [ "$1" == "--fake" ]; then
  fake=true
  shift
fi

sphdir=$1 # e.g. /mnt/matylda2/data/RM
wavdir=$2 # e.g. /mnt/matylda6/jhu09/qpovey/kaldi_rm_wav
flistin=$3 # e.g. train_sph.flist, contains sph files in sphdir
flistout=$4 # e.g. train_wav.flist, contains wav files in wavdir


if [ $fake == false ]; then
  for x in `cat $flistin`; do
    y=`echo $x | sed s:$sphdir:$wavdir: | sed 's:\.sph:.wav:'`;
    mkdir -p `dirname $y`
    ../../tools/sph2pipe_v2.5/sph2pipe -f wav $x $y || exit 1;
  done
fi

cat $flistin | sed s:$sphdir:$wavdir: | sed 's:\.sph:.wav:' > $flistout || exit 1;

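The path rewriting in the script above (swap the sphdir prefix for wavdir and the .sph extension for .wav) can also be done with shell parameter expansion, which avoids regular expressions altogether; in a sed pattern an unescaped `.` matches any character, so a directory name containing "sph" could be rewritten by accident. This is only a sketch: `sph_to_wav_path` is a hypothetical helper, the paths are invented, and it assumes a lower-case `.sph` suffix with no trailing whitespace.

```shell
# Hypothetical helper: map one .sph path under sphdir to a .wav path under wavdir.
# Usage: sph_to_wav_path <sphdir> <wavdir> <path/to/file.sph>
sph_to_wav_path() (
  sphdir=$1
  wavdir=$2
  x=$3
  rel=${x#"$sphdir"}                            # strip the literal sphdir prefix
  printf '%s%s.wav\n' "$wavdir" "${rel%.sph}"   # strip the literal .sph suffix
)

# Example with invented paths; prints /tmp/wav/asphalt/x.wav
sph_to_wav_path /data/RM /tmp/wav /data/RM/asphalt/x.sph
```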
@ -0,0 +1 @
export PATH=$PATH:../../../src/bin:../../../tools/openfst/bin:../../../src/fstbin/:../../../src/gmmbin/:../../../src/featbin/:../../../src/fgmmbin:../../../src/sgmmbin
@ -0,0 +1,96 @
#!/bin/bash

# Copyright 2010-2011 Microsoft Corporation

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.

exit 1 # Don't run this... it's to be run line by line from the shell.

# This script file cannot be run as-is; some paths in it need to be changed
# before you can run it.
# Search for /path/to.
# It is recommended that you do not invoke this file from the shell, but
# run the commands one by one, by hand.

# the step in data_prep/ will need to be modified for your system.

# First step is to do data preparation:
# This just creates some text files, it is fast.
# If not on the BUT system, you would have to change run.sh to reflect
# your own paths.
#

#Example arguments to run.sh: /mnt/matylda2/data/RM, /ais/gobi2/speech/RM, /cygdrive/e/data/RM
# RM is a directory with subdirectories rm1_audio1, rm1_audio2, rm2_audio
cd data_prep
#*** You have to change the pathname below.***
./run.sh /path/to/RM
cd ..


mkdir -p data
( cd data; cp ../data_prep/{train,test*}.{spk2utt,utt2spk} . ; cp ../data_prep/spk2gender.map . )

# This next step converts the lexicon, grammar, etc., into FST format.
steps/prepare_graphs.sh


# Next, make sure that "exp/" is someplace you can write a significant amount of
# data to (e.g. make it a link to a file on some reasonably large file system).
# If it doesn't exist, the scripts below will make the directory "exp".

# mfccdir should be set to some place to put training mfcc's
# where you have space.
#e.g.: mfccdir=/mnt/matylda6/jhu09/qpovey/kaldi_rm_mfccb
mfccdir=/path/to/mfccdir
steps/make_mfcc_train.sh $mfccdir
steps/make_mfcc_test.sh $mfccdir

steps/train_mono.sh
steps/decode_mono.sh &
steps/train_tri1.sh
steps/decode_tri1.sh &

steps/train_tri2a.sh
steps/decode_tri2a.sh &

# Then do the same for 2b, 2c, and so on
# 2a = basic triphone (all features double-deltas unless stated).
# 2b = exponential transform
# 2c = mean normalization (cmn)
# 2d = MLLT
# 2e = splice-9-frames + LDA
# 2f = splice-9-frames + LDA + MLLT
# 2g = linear VTLN (+ regular VTLN); various decode scripts available.
# 2h = splice-9-frames + HLDA
# 2i = triple-deltas + HLDA
# 2j = triple-deltas + LDA + MLLT
# 2k = LDA + ET (equiv to LDA+MLLT+ET)


# To train and test SGMM systems:

steps/train_ubma.sh

# train and test unadapted system
steps/train_sgmma.sh
steps/decode_sgmma.sh

# train and test system with speaker vectors.
steps/train_sgmmb.sh
steps/decode_sgmmb.sh



@ -0,0 +1,58 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# Adds some specified number of disambig symbols to a symbol table.
|
||||
# Adds these as #1, #2, etc.
|
||||
# If the --include-zero option is specified, includes an extra one
|
||||
# #0.
|
||||
if(!(@ARGV == 2 || (@ARGV ==3 && $ARGV[0] eq "--include-zero"))) {
|
||||
die "Usage: add_disambig.pl [--include-zero] symtab.txt num_extra > symtab_out.txt ";
|
||||
}
|
||||
|
||||
if(@ARGV == 3) {
|
||||
$include_zero = 1;
|
||||
$ARGV[0] eq "--include-zero" || die "Bad option/first argument $ARGV[0]";
|
||||
shift @ARGV;
|
||||
} else {
|
||||
$include_zero = 0;
|
||||
}
|
||||
|
||||
$input = $ARGV[0];
|
||||
$nsyms = $ARGV[1];
|
||||
|
||||
open(F, "<$input") || die "Opening file $input";
|
||||
|
||||
while(<F>) {
|
||||
@A = split(" ", $_);
|
||||
@A == 2 || die "Bad line $_";
|
||||
$lastsym = $A[1];
|
||||
print;
|
||||
}
|
||||
|
||||
if(!defined($lastsym)){
|
||||
die "Empty symbol file?";
|
||||
}
|
||||
|
||||
if($include_zero) {
|
||||
$lastsym++;
|
||||
print "#0 $lastsym\n";
|
||||
}
|
||||
|
||||
for($n = 1; $n <= $nsyms; $n++) {
|
||||
$y = $n + $lastsym;
|
||||
print "#$n $y\n";
|
||||
}
|
|
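To illustrate what the script above produces, here is a minimal awk sketch of the same idea: copy the symbol table, then append #1..#N (and optionally #0) with ids continuing after the last one seen. The function name add_disambig_sketch and its argument order are assumptions made for this example; it is not part of the Kaldi scripts.

```shell
# Hypothetical awk equivalent of add_disambig.pl (illustration only).
# Usage: add_disambig_sketch <num_extra> [include_zero: 0|1] < symtab.txt
add_disambig_sketch() {
  awk -v n="$1" -v zero="${2:-0}" '
    NF != 2 { print "Bad line " $0 > "/dev/stderr"; bad = 1; exit 1 }
    { last = $2; print }                 # copy the table, remember the last id
    END {
      if (bad) exit 1
      if (NR == 0) { print "Empty symbol file?" > "/dev/stderr"; exit 1 }
      if (zero + 0) { last++; print "#0 " last }
      for (i = 1; i <= n; i++) print "#" i " " (last + i)
    }'
}

# Example: a three-symbol table extended with two disambiguation symbols.
printf '<eps> 0\na 1\nb 2\n' | add_disambig_sketch 2
```

With the sample table the last id is 2, so the appended lines are "#1 3" and "#2 4".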
@ -0,0 +1,101 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# Adds disambiguation symbols to a lexicon.
|
||||
# Outputs still in the normal lexicon format.
|
||||
# Disambig syms are numbered #1, #2, #3, etc. (#0
|
||||
# reserved for symbol in grammar).
|
||||
# Outputs the number of disambig syms to the standard output.
|
||||
|
||||
if(@ARGV != 2) {
|
||||
die "Usage: add_lex_disambig.pl lexicon.txt lexicon_disambig.txt "
|
||||
}
|
||||
|
||||
|
||||
$lexfn = shift @ARGV;
|
||||
$lexoutfn = shift @ARGV;
|
||||
|
||||
open(L, "<$lexfn") || die "Error opening lexicon $lexfn";
|
||||
|
||||
# (1) Read in the lexicon.
|
||||
@L = ( );
|
||||
while(<L>) {
|
||||
@A = split(" ", $_);
|
||||
push @L, join(" ", @A);
|
||||
}
|
||||
|
||||
# (2) Work out the count of each phone-sequence in the
|
||||
# lexicon.
|
||||
|
||||
foreach $l (@L) {
|
||||
@A = split(" ", $l);
|
||||
shift @A; # Remove word.
|
||||
$count{join(" ",@A)}++;
|
||||
}
|
||||
|
||||
# (3) For each left sub-sequence of each phone-sequence, note down
|
||||
# that it exists (for identifying prefixes of longer strings).
|
||||
|
||||
foreach $l (@L) {
|
||||
@A = split(" ", $l);
|
||||
shift @A; # Remove word.
|
||||
while(@A > 0) {
|
||||
pop @A; # Remove last phone
|
||||
$issubseq{join(" ",@A)} = 1;
|
||||
}
|
||||
}
|
||||
|
||||
# (4) For each entry in the lexicon:
|
||||
# if the phone sequence is unique and is not a
|
||||
# prefix of another word, no disambig symbol.
|
||||
# Else output #1, or #2, #3, ... if the same phone-seq
|
||||
# has already been assigned a disambig symbol.
|
||||
|
||||
|
||||
open(O, ">$lexoutfn") || die "Opening lexicon file $lexoutfn for writing.\n";
|
||||
|
||||
$max_disambig = 0;
|
||||
foreach $l (@L) {
|
||||
@A = split(" ", $l);
|
||||
$word = shift @A;
|
||||
$phnseq = join(" ",@A);
|
||||
if(!defined $issubseq{$phnseq}
|
||||
&& $count{$phnseq}==1) {
|
||||
; # Do nothing.
|
||||
} else {
|
||||
if($phnseq eq "") { # need disambig symbols for the empty string
|
||||
# that are not used anywhere else.
|
||||
$max_disambig++;
|
||||
$reserved{$max_disambig} = 1;
|
||||
$phnseq = "#$max_disambig";
|
||||
} else {
|
||||
$curnumber = $disambig_of{$phnseq};
|
||||
if(!defined $curnumber) { $curnumber = 0; }
|
||||
$curnumber++; # now 1 or 2, ...
|
||||
while(defined $reserved{$curnumber} ) { $curnumber++; } # skip over reserved symbols
|
||||
if($curnumber > $max_disambig) {
|
||||
$max_disambig = $curnumber;
|
||||
}
|
||||
$disambig_of{$phnseq} = $curnumber;
|
||||
$phnseq = $phnseq . " #" . $curnumber;
|
||||
}
|
||||
}
|
||||
print O "$word\t$phnseq\n";
|
||||
}
|
||||
|
||||
print $max_disambig . "\n";
|
||||
|
|
@ -0,0 +1,40 @@
|
|||
#!/usr/bin/perl -w
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# This script takes a list of utterance-ids and filters an scp
|
||||
# file (or any file whose first field is an utterance id), printing
|
||||
# out only those lines whose first field is in id_list.
|
||||
|
||||
if(@ARGV < 1 || @ARGV > 2) {
|
||||
die "Usage: filter_scp.pl id_list [in.scp] > out.scp ";
|
||||
}
|
||||
|
||||
$idlist = shift @ARGV;
|
||||
open(F, "<$idlist") || die "Could not open id-list file $idlist";
|
||||
while(<F>) {
|
||||
@A = split;
|
||||
@A>=1 || die "Invalid id-list file line $_";
|
||||
$seen{$A[0]} = 1;
|
||||
}
|
||||
|
||||
while(<>) {
|
||||
@A = split;
|
||||
@A > 0 || die "Invalid scp file line $_";
|
||||
if($seen{$A[0]}) {
|
||||
print $_;
|
||||
}
|
||||
}
|
|
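The filtering done by the script above can also be sketched with awk's two-file idiom. The function name filter_scp_sketch is an assumption for this example, and the sketch assumes a non-empty id list (with an empty list the NR == FNR test would also match the second input, so nothing is printed, which happens to be the right result).

```shell
# Hypothetical awk equivalent of filter_scp.pl (illustration only).
# Usage: filter_scp_sketch <id_list> [in.scp]   (reads stdin if in.scp is omitted)
filter_scp_sketch() {
  awk 'NR == FNR { seen[$1] = 1; next }   # first file: collect utterance ids
       ($1 in seen)                        # remaining input: keep matching lines
      ' "$1" "${2:--}"
}

# Example (file names are placeholders):
#   filter_scp_sketch train_ids.txt feats.scp > feats_train.scp
```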
@ -0,0 +1,69 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
$ignore_noninteger = 0;
|
||||
$ignore_first_field = 0;
|
||||
for($x = 0; $x < 2; $x++) {
|
||||
if($ARGV[0] eq "--ignore-noninteger") { $ignore_noninteger = 1; shift @ARGV; }
|
||||
if($ARGV[0] eq "--ignore-first-field") { $ignore_first_field = 1; shift @ARGV; }
|
||||
}
|
||||
|
||||
$symtab = shift @ARGV;
|
||||
if(!defined $symtab) {
|
||||
die "Usage: int2sym.pl symtab [input transcriptions] > output transcriptions\n";
|
||||
}
|
||||
open(F, "<$symtab") || die "Error opening symbol table file $symtab";
|
||||
while(<F>) {
|
||||
@A = split(" ", $_);
|
||||
@A == 2 || die "bad line in symbol table file: $_";
|
||||
$int2sym{$A[1]} = $A[0];
|
||||
}
|
||||
|
||||
$error = 0;
|
||||
while(<>) {
|
||||
@A = split(" ", $_);
|
||||
if(@A == 0) {
|
||||
die "Empty line in transcriptions input.";
|
||||
}
|
||||
if($ignore_first_field) {
|
||||
$key = shift @A;
|
||||
print $key . " ";
|
||||
}
|
||||
foreach $a (@A) {
|
||||
if($a !~ m:^\d+$:) { # not all digits..
|
||||
if($ignore_noninteger) {
|
||||
print $a . " ";
|
||||
next;
|
||||
} else {
|
||||
if($a eq $A[0]) {
|
||||
die "int2sym.pl: found noninteger token $a (try --ignore-first-field)\n";
|
||||
} else {
|
||||
die "int2sym.pl: found noninteger token $a (try --ignore-noninteger if valid input)\n";
|
||||
}
|
||||
}
|
||||
}
|
||||
$s = $int2sym{$a};
|
||||
if(!defined ($s)) {
|
||||
die "int2sym.pl: integer $a not in symbol table $symtab.";
|
||||
}
|
||||
print $s . " ";
|
||||
}
|
||||
print "\n";
|
||||
}
|
||||
|
||||
|
||||
|
|
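The core of the script above is a lookup from integer id to symbol. A minimal awk sketch of that lookup follows; the function name int2sym_sketch is hypothetical, the --ignore-noninteger and --ignore-first-field options are not handled, and a non-empty symbol table is assumed.

```shell
# Hypothetical awk sketch of the integer-to-symbol mapping (illustration only).
# Usage: int2sym_sketch <symtab> [transcriptions]   (reads stdin if omitted)
int2sym_sketch() {
  awk 'NR == FNR { sym[$2] = $1; next }      # symtab lines are "symbol id"
       {
         for (i = 1; i <= NF; i++) {
           if (!($i in sym)) {
             print "integer " $i " not in symbol table" > "/dev/stderr"
             exit 1
           }
           printf "%s ", sym[$i]             # trailing space, as in the Perl script
         }
         print ""
       }' "$1" "${2:--}"
}
```

Like the Perl version, the sketch writes a space after every symbol, so each output line ends with a trailing space.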
@ -0,0 +1,45 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# Usage: is_sorted.sh [script-file]
|
||||
# This script returns 0 (success) if the script file argument [or standard input]
|
||||
# is sorted and 1 otherwise.
|
||||
|
||||
export LC_ALL=C
|
||||
|
||||
if [ $# == 0 ]; then
|
||||
scp=-
|
||||
fi
|
||||
if [ $# == 1 ]; then
|
||||
scp=$1
|
||||
fi
|
||||
if [ $# -gt 1 -o "$1" == "--help" -o "$1" == "-h" ]; then
|
||||
echo "Usage: is_sorted.sh [script-file]"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
cat $scp > /tmp/tmp1.$$
|
||||
sort /tmp/tmp1.$$ > /tmp/tmp2.$$
|
||||
cmp /tmp/tmp1.$$ /tmp/tmp2.$$ >/dev/null
|
||||
ret=$?
|
||||
rm /tmp/tmp1.$$ /tmp/tmp2.$$
|
||||
if [ $ret == 0 ]; then
|
||||
exit 0;
|
||||
else
|
||||
echo "is_sorted.sh: script file $scp is not sorted";
|
||||
exit 1;
|
||||
fi
|
|
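The check above copies the input to /tmp, sorts it, and compares the two files. An alternative that avoids temporary files is the standard "sort -c" check mode. The sketch below assumes that approach; the function name is_sorted_sketch is hypothetical, and unlike the script it writes its message to standard error.

```shell
# Hypothetical variant of is_sorted.sh based on "sort -c" (illustration only).
# Returns 0 if the file (or standard input) is sorted in the C locale, 1 otherwise.
is_sorted_sketch() {
  if LC_ALL=C sort -c "${1:--}" 2>/dev/null; then
    return 0
  else
    echo "is_sorted_sketch: input ${1:--} is not sorted" >&2
    return 1
  fi
}
```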
@ -0,0 +1,112 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# makes lexicon FST (no pron-probs involved).
|
||||
|
||||
if(@ARGV != 1 && @ARGV != 3) {
|
||||
die "Usage: make_lexicon_fst.pl lexicon.txt [silprob silphone] > lexiconfst.txt"
|
||||
}
|
||||
|
||||
$lexfn = shift @ARGV;
|
||||
if(@ARGV == 0) {
|
||||
$silprob = 0.0;
|
||||
} else {
|
||||
($silprob,$silphone) = @ARGV;
|
||||
}
|
||||
if($silprob != 0.0) {
|
||||
$silprob < 1.0 || die "Sil prob cannot be >= 1.0";
|
||||
$silcost = -log($silprob);
|
||||
$nosilcost = -log(1.0 - $silprob);
|
||||
}
|
||||
|
||||
|
||||
open(L, "<$lexfn") || die "Error opening lexicon $lexfn";
|
||||
|
||||
|
||||
|
||||
if( $silprob == 0.0 ) { # No optional silences: just have one (loop+final) state which is numbered zero.
|
||||
$loopstate = 0;
|
||||
$nextstate = 1; # next unallocated state.
|
||||
while(<L>) {
|
||||
@A = split(" ", $_);
|
||||
$w = shift @A;
|
||||
if(@A == 0) { # For empty words (<s> and </s>) insert no optional
|
||||
# silence (not needed as adjacent words supply it)....
|
||||
# actually we only hit this case for the lexicon without disambig
|
||||
# symbols but doesn't ever matter as training transcripts don't have <s> or </s>.
|
||||
print "$loopstate\t$loopstate\t<eps>\t$w\n";
|
||||
} else {
|
||||
$s = $loopstate;
|
||||
$word_or_eps = $w;
|
||||
while (@A > 0) {
|
||||
$p = shift @A;
|
||||
if(@A > 0) {
|
||||
$ns = $nextstate++;
|
||||
} else {
|
||||
$ns = $loopstate;
|
||||
}
|
||||
print "$s\t$ns\t$p\t$word_or_eps\n";
|
||||
$word_or_eps = "<eps>";
|
||||
$s = $ns;
|
||||
}
|
||||
}
|
||||
}
|
||||
print "$loopstate\t0\n"; # final-cost.
|
||||
} else { # have silence probs.
|
||||
$startstate = 0;
|
||||
$loopstate = 1;
|
||||
$silstate = 2; # state from where we go to loopstate after emitting silence.
|
||||
$nextstate = 3;
|
||||
print "$startstate\t$loopstate\t<eps>\t<eps>\t$nosilcost\n"; # no silence.
|
||||
print "$startstate\t$loopstate\t$silphone\t<eps>\t$silcost\n"; # silence.
|
||||
print "$silstate\t$loopstate\t$silphone\t<eps>\n"; # no cost.
|
||||
while(<L>) {
|
||||
@A = split(" ", $_);
|
||||
$w = shift @A;
|
||||
if(@A == 0) { # For empty words (<s> and </s>) insert no optional
|
||||
# silence (not needed as adjacent words supply it)....
|
||||
# actually we only hit this case for the lexicon without disambig
|
||||
# symbols but doesn't ever matter as training transcripts don't have <s> or </s>.
|
||||
print "$loopstate\t$loopstate\t<eps>\t$w\n";
|
||||
} else {
|
||||
$is_silence_word = (@A == 1 && $A[0] eq $silphone); # boolean.
|
||||
$s = $loopstate;
|
||||
$word_or_eps = $w;
|
||||
while (@A > 0) {
|
||||
$p = shift @A;
|
||||
if(@A > 0) {
|
||||
$ns = $nextstate++;
|
||||
print "$s\t$ns\t$p\t$word_or_eps\n";
|
||||
$word_or_eps = "<eps>";
|
||||
$s = $ns;
|
||||
} else {
|
||||
if(! $is_silence_word) {
|
||||
# This is non-deterministic but relatively compact,
|
||||
# and avoids epsilons.
|
||||
print "$s\t$loopstate\t$p\t$word_or_eps\t$nosilcost\n";
|
||||
print "$s\t$silstate\t$p\t$word_or_eps\t$silcost\n";
|
||||
} else {
|
||||
# no point putting opt-sil after silence word.
|
||||
print "$s\t$loopstate\t$p\t$word_or_eps\n";
|
||||
}
|
||||
$word_or_eps = "<eps>";
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
print "$loopstate\t0\n"; # final-cost.
|
||||
}
|
|
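To make the state numbering concrete, the branch without optional silence can be sketched in awk: state 0 is both the loop state and the final state, intermediate states are allocated from 1 upwards, and the word label is emitted on the first arc of each pronunciation. The function name lexicon_fst_sketch is an assumption for this example, and silence probabilities are not handled.

```shell
# Hypothetical awk sketch of the no-silence branch of make_lexicon_fst.pl
# (illustration only). Reads "word phone1 phone2 ..." lines on stdin.
lexicon_fst_sketch() {
  awk 'BEGIN { loopstate = 0; nextstate = 1 }
       NF == 1 { print loopstate "\t" loopstate "\t<eps>\t" $1; next }  # empty pronunciation
       NF >= 2 {
         s = loopstate; label = $1
         for (i = 2; i <= NF; i++) {
           # the last phone returns to the loop state, others get a fresh state
           ns = (i < NF) ? nextstate++ : loopstate
           print s "\t" ns "\t" $i "\t" label
           label = "<eps>"; s = ns
         }
       }
       END { print loopstate "\t0" }'                                   # final cost
}

printf 'A ah\nB b iy\n' | lexicon_fst_sketch
```

For the two-word lexicon in the example, the arcs are 0→0 (ah:A), 0→1 (b:B) and 1→0 (iy:<eps>), followed by the final-state line for state 0.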
@ -0,0 +1,37 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# make_phones_symtab.pl < lexicon.txt > phones.txt
|
||||
|
||||
|
||||
while(<>) {
|
||||
@A = split(" ", $_);
|
||||
for ($i=2; $i<@A; $i++) {
|
||||
$P{$A[$i]} = 1; # seen it.
|
||||
}
|
||||
}
|
||||
|
||||
print "<eps>\t0\n";
|
||||
$n = 1;
|
||||
foreach $p (sort keys %P) {
|
||||
if($p ne "<eps>") {
|
||||
print "$p\t$n\n";
|
||||
$n++;
|
||||
}
|
||||
}
|
||||
|
||||
print "sil\t$n\n";
|
||||
|
|
@ -0,0 +1,130 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Yanmin Qian Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# This file takes as input the file pcdsril.txt that comes with the RM
|
||||
# distribution, and creates the dictionary used in RM training.
|
||||
|
||||
# make_rm_dct.pl pcdsril.txt > dct.txt
|
||||
|
||||
if (@ARGV != 1) {
|
||||
die "usage: make_rm_dct.pl pcdsril.txt > dct.txt\n";
|
||||
}
|
||||
unless (open(IN_FILE, "@ARGV[0]")) {
|
||||
die ("can't open @ARGV[0]");
|
||||
}
|
||||
|
||||
while ($line = <IN_FILE>)
|
||||
{
|
||||
chomp($line);
|
||||
if (($line =~ /^[a-z]/))
|
||||
{
|
||||
$line =~ s/\+1//g;
|
||||
@LineArray = split(/\s+/,$line);
|
||||
@LineArray[0] = uc(@LineArray[0]);
|
||||
|
||||
printf "%-16s", @LineArray[0];
|
||||
for ($i = 1; $i < @LineArray; $i ++)
|
||||
{
|
||||
if (@LineArray[$i] eq 'q')
|
||||
{}
|
||||
elsif (@LineArray[$i] eq 'zh')
|
||||
{
|
||||
printf "sh ";
|
||||
}
|
||||
elsif (@LineArray[$i] eq 'eng')
|
||||
{
|
||||
printf "ng ";
|
||||
}
|
||||
elsif (@LineArray[$i] eq 'hv')
|
||||
{
|
||||
printf "hh ";
|
||||
}
|
||||
elsif (@LineArray[$i] eq 'em')
|
||||
{
|
||||
printf "m ";
|
||||
}
|
||||
elsif (@LineArray[$i] eq 'axr')
|
||||
{
|
||||
printf "er ";
|
||||
}
|
||||
elsif (@LineArray[$i] eq 'tcl')
|
||||
{
|
||||
if (@LineArray[$i+1] ne 't')
|
||||
{
|
||||
printf "td ";
|
||||
}
|
||||
}
|
||||
elsif (@LineArray[$i] eq 'dcl')
|
||||
{
|
||||
if (@LineArray[$i+1] ne 'd')
|
||||
{
|
||||
printf "dd ";
|
||||
}
|
||||
}
|
||||
elsif (@LineArray[$i] eq 'kcl')
|
||||
{
|
||||
if (@LineArray[$i+1] ne 'k')
|
||||
{
|
||||
printf "kd ";
|
||||
}
|
||||
}
|
||||
elsif (@LineArray[$i] eq 'pcl')
|
||||
{
|
||||
if (@LineArray[$i+1] ne 'p')
|
||||
{
|
||||
printf "pd ";
|
||||
}
|
||||
}
|
||||
elsif (@LineArray[$i] eq 'bcl')
|
||||
{
|
||||
if (@LineArray[$i+1] ne 'b')
|
||||
{
|
||||
printf "b ";
|
||||
}
|
||||
}
|
||||
elsif (@LineArray[$i] eq 'gcl')
|
||||
{
|
||||
if (@LineArray[$i+1] ne 'g')
|
||||
{
|
||||
printf "g ";
|
||||
}
|
||||
}
|
||||
elsif (@LineArray[$i] eq 't')
|
||||
{
|
||||
if (@LineArray[$i+1] ne 's')
|
||||
{
|
||||
printf "@LineArray[$i] ";
|
||||
}
|
||||
else
|
||||
{
|
||||
printf "ts ";
|
||||
$i++;
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
printf "@LineArray[$i] ";
|
||||
}
|
||||
}
|
||||
printf "\n";
|
||||
}
|
||||
}
|
||||
|
||||
printf "!SIL sil\n";
|
||||
|
||||
close(IN_FILE);
|
||||
|
||||
|
|
@ -0,0 +1,119 @@
|
|||
#!/usr/bin/perl
|
||||
|
||||
# Copyright 2010-2011 Yanmin Qian Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# This file takes as input the file wp_gram.txt that comes with the RM
|
||||
# distribution, and creates the language model as an acceptor in FST form.
|
||||
|
||||
# make_rm_lm.pl wp_gram.txt > G.txt
|
||||
|
||||
if (@ARGV != 1) {
|
||||
print "usage: make_rm_lm.pl wp_gram.txt > G.txt\n";
|
||||
exit(1);
|
||||
}
|
||||
unless (open(IN_FILE, "@ARGV[0]")) {
|
||||
die ("can't open @ARGV[0]");
|
||||
}
|
||||
|
||||
|
||||
$flag = 0;
|
||||
$count_wrd = 0;
|
||||
$cnt_ends = 0;
|
||||
$init = "";
|
||||
|
||||
while ($line = <IN_FILE>)
|
||||
{
|
||||
chomp($line);
|
||||
|
||||
$line =~ s/ //g;
|
||||
|
||||
if(($line =~ /^>/))
|
||||
{
|
||||
if($flag == 0)
|
||||
{
|
||||
$flag = 1;
|
||||
}
|
||||
$line =~ s/>//g;
|
||||
$hashcnt{$init} = $i;
|
||||
$init = $line;
|
||||
$i = 0;
|
||||
$count_wrd++;
|
||||
@LineArray[$count_wrd - 1] = $init;
|
||||
$hashwrd{$init} = 0;
|
||||
}
|
||||
elsif($flag != 0)
|
||||
{
|
||||
|
||||
$hash{$init}[$i] = $line;
|
||||
$i++;
|
||||
if($line =~ /SENTENCE-END/)
|
||||
{
|
||||
$cnt_ends++;
|
||||
}
|
||||
}
|
||||
else
|
||||
{}
|
||||
}
|
||||
|
||||
$hashcnt{$init} = $i;
|
||||
|
||||
$num = 0;
|
||||
$weight = 0;
|
||||
$init_wrd = "SENTENCE-END";
|
||||
$hashwrd{$init_wrd} = @LineArray;
|
||||
for($i = 0; $i < $hashcnt{$init_wrd}; $i++)
|
||||
{
|
||||
$weight = -log(1/$hashcnt{$init_wrd});
|
||||
$hashwrd{$hash{$init_wrd}[$i]} = $i + 1;
|
||||
print "0 $hashwrd{$hash{$init_wrd}[$i]} $hash{$init_wrd}[$i] $hash{$init_wrd}[$i] $weight\n";
|
||||
}
|
||||
$num = $i;
|
||||
|
||||
for($i = 0; $i < @LineArray; $i++)
|
||||
{
|
||||
if(@LineArray[$i] eq 'SENTENCE-END')
|
||||
{}
|
||||
else
|
||||
{
|
||||
if($hashwrd{@LineArray[$i]} == 0)
|
||||
{
|
||||
$num++;
|
||||
$hashwrd{@LineArray[$i]} = $num;
|
||||
}
|
||||
for($j = 0; $j < $hashcnt{@LineArray[$i]}; $j++)
|
||||
{
|
||||
$weight = -log(1/$hashcnt{@LineArray[$i]});
|
||||
if($hashwrd{$hash{@LineArray[$i]}[$j]} == 0)
|
||||
{
|
||||
$num++;
|
||||
$hashwrd{$hash{@LineArray[$i]}[$j]} = $num;
|
||||
}
|
||||
if($hash{@LineArray[$i]}[$j] eq 'SENTENCE-END')
|
||||
{
|
||||
print "$hashwrd{@LineArray[$i]} $hashwrd{$hash{@LineArray[$i]}[$j]} <eps> <eps> $weight\n"
|
||||
}
|
||||
else
|
||||
{
|
||||
print "$hashwrd{@LineArray[$i]} $hashwrd{$hash{@LineArray[$i]}[$j]} $hash{@LineArray[$i]}[$j] $hash{@LineArray[$i]}[$j] $weight\n";
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
print "$hashwrd{$init_wrd} 0\n";
|
||||
close(IN_FILE);
|
||||
|
||||
|
|
@ -0,0 +1,102 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# Written by Dan Povey 9/21/2010. Apache 2.0 License.
|
||||
|
||||
# This version of make_roots.pl is specialized for RM.
|
||||
|
||||
# This script creates the file roots.txt which is an input to train-tree.cc. It
|
||||
# specifies how the trees are built. The input file phone-sets.txt is a partial
|
||||
# version of roots.txt in which phones are represented by their spelled form, not
|
||||
# their symbol id's. E.g. at input, phone-sets.txt might contain;
|
||||
# shared not-split sil
|
||||
# Any phones not specified in phone-sets.txt but present in phones.txt will
|
||||
# be given a default treatment. If the --separate option is given, we create
|
||||
# a separate tree root for each of them, otherwise they are all lumped in one set.
|
||||
# The arguments shared|not-shared and split|not-split are needed if any
|
||||
# phones are not specified in phone-sets.txt. What they mean is as follows:
|
||||
# if shared=="shared" then we share the tree-root between different HMM-positions
|
||||
# (0,1,2). If split=="split" then we actually do decision tree splitting on
|
||||
# that root, otherwise we forbid decision-tree splitting. (The main reason we might
|
||||
# set this to false is for silence when
|
||||
# we want to ensure that the HMM-positions will remain with a single PDF id.)
|
||||
|
||||
|
||||
$separate = 0;
|
||||
if($ARGV[0] eq "--separate") {
|
||||
$separate = 1;
|
||||
shift @ARGV;
|
||||
}
|
||||
|
||||
if(@ARGV != 4) {
|
||||
die "Usage: make_roots.pl [--separate] phones.txt silence-phone-list[integer,colon-separated] shared|not-shared split|not-split > roots.txt\n";
|
||||
}
|
||||
|
||||
|
||||
($phonesfile, $silphones, $shared, $split) = @ARGV;
|
||||
if($shared ne "shared" && $shared ne "not-shared") {
|
||||
die "Third argument must be \"shared\" or \"not-shared\"\n";
|
||||
}
|
||||
if($split ne "split" && $split ne "not-split") {
|
||||
die "Fourth argument must be \"split\" or \"not-split\"\n";
|
||||
}
|
||||
|
||||
|
||||
|
||||
open(F, "<$phonesfile") || die "Opening file $phonesfile";
|
||||
|
||||
while(<F>) {
|
||||
@A = split(" ", $_);
|
||||
if(@A != 2) {
|
||||
die "Bad line in phones symbol file: ".$_;
|
||||
}
|
||||
if($A[1] != 0) {
|
||||
$symbol2id{$A[0]} = $A[1];
|
||||
$id2symbol{$A[1]} = $A[0];
|
||||
}
|
||||
}
|
||||
|
||||
if($silphones eq ""){
|
||||
die "Empty silence phone list in make_roots.pl";
|
||||
}
|
||||
foreach $silphoneid (split(":", $silphones)) {
|
||||
defined $id2symbol{$silphoneid} || die "No such silence phone id $silphoneid";
|
||||
# Give each silence phone its own separate pdfs in each state, but
|
||||
# no sharing (in this recipe; WSJ is different.. in this recipe there
|
||||
#is only one silence phone anyway.)
|
||||
$issil{$silphoneid} = 1;
|
||||
print "not-shared not-split $silphoneid\n";
|
||||
}
|
||||
|
||||
$idlist = "";
|
||||
$remaining_phones = "";
|
||||
|
||||
if($separate){
|
||||
foreach $a (keys %id2symbol) {
|
||||
if(!defined $issil{$a}) {
|
||||
print "$shared $split $a\n";
|
||||
}
|
||||
}
|
||||
} else {
|
||||
print "$shared $split ";
|
||||
foreach $a (keys %id2symbol) {
|
||||
if(!defined $issil{$a}) {
|
||||
print "$a ";
|
||||
}
|
||||
}
|
||||
print "\n";
|
||||
}
|
|
@ -0,0 +1,39 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# make_words_symtab.pl < G.txt > words.txt
|
||||
|
||||
|
||||
|
||||
|
||||
while(<>) {
|
||||
@A = split(" ", $_);
|
||||
if(@A >= 3) {
|
||||
$W{$A[2]} = 1;
|
||||
}
|
||||
}
|
||||
|
||||
print "<eps>\t0\n";
|
||||
$n = 1;
|
||||
foreach $w (sort keys %W) {
|
||||
if($w ne "<eps>") {
|
||||
print "$w\t$n\n";
|
||||
$n++;
|
||||
}
|
||||
}
|
||||
|
||||
print "!SIL\t$n\n";
|
||||
|
|
@ -0,0 +1,107 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
reorder=true # Dan-style, make false for Mirko+Lukas's decoder.
|
||||
|
||||
for x in 1 2 3; do
|
||||
if [ "$1" == "--mono" ]; then
|
||||
monophone_opts="--context-size=1 --central-position=0"
|
||||
shift;
|
||||
fi
|
||||
|
||||
if [ "$1" == "--noreorder" ]; then
|
||||
reorder=false # we set this for the Kaldi decoder.
|
||||
shift;
|
||||
fi
|
||||
done
|
||||
|
||||
if [ $# != 3 ]; then
|
||||
echo "Usage: scripts/mkgraph.sh <tree> <model> <graphdir>"
|
||||
exit 1;
|
||||
fi
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
tree=$1
|
||||
model=$2
|
||||
dir=$3
|
||||
|
||||
mkdir -p $dir
|
||||
|
||||
tscale=1.0
|
||||
loopscale=0.1
|
||||
|
||||
fsttablecompose data/L_disambig.fst data/G.fst | fstdeterminizestar --use-log=true | \
|
||||
fstminimizeencoded > $dir/LG.fst
|
||||
|
||||
fstisstochastic $dir/LG.fst || echo "warning: LG not stochastic."
|
||||
|
||||
echo "Example string from LG.fst: "
|
||||
echo
|
||||
fstrandgen --select=log_prob $dir/LG.fst | fstprint --isymbols=data/phones_disambig.txt --osymbols=data/words.txt -
|
||||
|
||||
grep '#' data/phones_disambig.txt | awk '{print $2}' > $dir/disambig_phones.list
|
||||
|
||||
fstcomposecontext $monophone_opts \
|
||||
--read-disambig-syms=$dir/disambig_phones.list \
|
||||
--write-disambig-syms=$dir/disambig_ilabels.list \
|
||||
$dir/ilabels < $dir/LG.fst >$dir/CLG.fst
|
||||
|
||||
# for debugging:
|
||||
fstmakecontextsyms data/phones.txt $dir/ilabels > $dir/context_syms.txt
|
||||
echo "Example string from CLG.fst: "
|
||||
echo
|
||||
fstrandgen --select=log_prob $dir/CLG.fst | fstprint --isymbols=$dir/context_syms.txt --osymbols=data/words.txt -
|
||||
|
||||
fstisstochastic $dir/CLG.fst || echo "warning: CLG not stochastic."
|
||||
|
||||
make-ilabel-transducer --write-disambig-syms=$dir/disambig_ilabels_remapped.list $dir/ilabels $tree $model $dir/ilabels.remapped > $dir/ilabel_map.fst
|
||||
|
||||
# Reduce size of CLG by remapping symbols...
|
||||
fsttablecompose $dir/ilabel_map.fst $dir/CLG.fst | fstdeterminizestar --use-log=true \
|
||||
| fstminimizeencoded > $dir/CLG2.fst
|
||||
|
||||
|
||||
cat $dir/CLG2.fst | fstisstochastic || echo "warning: CLG2 is not stochastic."
|
||||
|
||||
make-h-transducer --disambig-syms-out=$dir/disambig_tstate.list \
|
||||
--transition-scale=$tscale $dir/ilabels.remapped $tree $model > $dir/Ha.fst
|
||||
|
||||
|
||||
fsttablecompose $dir/Ha.fst $dir/CLG2.fst | fstdeterminizestar --use-log=true \
|
||||
| fstrmsymbols $dir/disambig_tstate.list | fstrmepslocal | fstminimizeencoded > $dir/HCLGa.fst
|
||||
|
||||
fstisstochastic $dir/HCLGa.fst || echo "HCLGa is not stochastic"
|
||||
|
||||
add-self-loops --self-loop-scale=$loopscale --reorder=$reorder $model < $dir/HCLGa.fst > $dir/HCLG.fst
|
||||
|
||||
if [ $tscale == 1.0 -a $loopscale == 1.0 ]; then
|
||||
# No point doing this test if transition-scale not 1, as it is bound to fail.
|
||||
fstisstochastic $dir/HCLG.fst || echo "Final HCLG is not stochastic."
|
||||
fi
|
||||
|
||||
fstisstochastic $dir/HCLG.fst || echo "Final HCLG is not stochastic."
|
||||
|
||||
|
||||
#The next five lines are debug.
|
||||
# The last two lines of this block print out some alignment info.
|
||||
fstrandgen --select=log_prob $dir/HCLG.fst | fstprint --osymbols=data/words.txt > $dir/rand.txt
|
||||
cat $dir/rand.txt | awk 'BEGIN{printf("0 ");} {if(NF>=3 && $3 != 0){ printf ("%d ",$3); }} END {print ""; }' > $dir/rand_align.txt
|
||||
|
||||
show-alignments data/phones.txt $model ark:$dir/rand_align.txt
|
||||
cat $dir/rand.txt | awk ' {if(NF>=4 && $4 != "<eps>"){ printf ("%s ",$4); }} END {print ""; }'
|
||||
|
|
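The option handling at the top of the script above loops a fixed three times over the first argument. A more conventional pattern is a while/case loop that stops at the first non-option argument. The sketch below assumes that pattern; the function name parse_mkgraph_opts and the variable remaining are hypothetical, while the option names and values are taken from the script.

```shell
# Hypothetical option-parsing sketch for mkgraph.sh (illustration only).
parse_mkgraph_opts() {
  monophone_opts=""
  reorder=true
  while [ $# -gt 0 ]; do
    case "$1" in
      --mono)      monophone_opts="--context-size=1 --central-position=0"; shift ;;
      --noreorder) reorder=false; shift ;;
      *)           break ;;                 # first positional argument: stop
    esac
  done
  remaining=$#                              # the script expects 3: <tree> <model> <graphdir>
}

parse_mkgraph_opts --mono --noreorder tree model graphdir
echo "reorder=$reorder positional=$remaining"
```

Quoting "$1" in the case statement keeps the loop safe when no arguments are left.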
@ -0,0 +1,115 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# This version of mkgraph.sh creates the C fst explicitly.
|
||||
|
||||
reorder=true # Dan-style, make false for Mirko+Lukas's decoder.
|
||||
|
||||
for x in 1 2 3; do
|
||||
if [ "$1" == "--mono" ]; then
|
||||
monophone_opts="--context-size=1 --central-position=0"
|
||||
shift;
|
||||
fi
|
||||
|
||||
if [ "$1" == "--noreorder" ]; then
|
||||
reorder=false # we set this for the Kaldi decoder.
|
||||
shift;
|
||||
fi
|
||||
done
|
||||
|
||||
if [ $# != 3 ]; then
|
||||
echo "Usage: scripts/mkgraph.sh [--mono] [--noreorder] <tree> <model> <graphdir>"
|
||||
exit 1;
|
||||
fi
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
|
||||
tree=$1
|
||||
model=$2
|
||||
dir=$3
|
||||
|
||||
mkdir -p $dir
|
||||
|
||||
tscale=1.0
|
||||
loopscale=0.1
|
||||
|
||||
fsttablecompose data/L_disambig.fst data/G.fst | fstdeterminizestar --use-log=true | \
|
||||
fstminimizeencoded > $dir/LG.fst
|
||||
|
||||
fstisstochastic $dir/LG.fst || echo "warning: LG not stochastic."
|
||||
|
||||
echo "Example string from LG.fst: "
|
||||
echo
|
||||
fstrandgen --select=log_prob $dir/LG.fst | fstprint --isymbols=data/phones_disambig.txt --osymbols=data/words.txt -
|
||||
|
||||
grep '#' data/phones_disambig.txt | awk '{print $2}' > $dir/disambig_phones.list
|
||||
subseq_sym=`tail -1 data/phones_disambig.txt | awk '{print $2+1;}'`
|
||||
cp data/phones_disambig.txt $dir/phones_disambig_subseq.txt
|
||||
echo '$' $subseq_sym >> $dir/phones_disambig_subseq.txt
|
||||
|
||||
fstmakecontextfst --read-disambig-syms=$dir/disambig_phones.list \
|
||||
--write-disambig-syms=$dir/disambig_ilabels.list data/phones.txt $subseq_sym \
|
||||
$dir/ilabels | fstarcsort --sort_type=olabel > $dir/C.fst
|
||||
|
||||
fstaddsubsequentialloop $subseq_sym $dir/LG.fst | \
|
||||
fsttablecompose $dir/C.fst - > $dir/CLG.fst
|
||||
|
||||
|
||||
# for debugging:
|
||||
fstmakecontextsyms data/phones.txt $dir/ilabels > $dir/context_syms.txt
|
||||
echo "Example string from CLG.fst: "
|
||||
echo
|
||||
fstrandgen --select=log_prob $dir/CLG.fst | fstprint --isymbols=$dir/context_syms.txt --osymbols=data/words.txt -
|
||||
|
||||
fstisstochastic $dir/CLG.fst || echo "warning: CLG not stochastic."
|
||||
|
||||
make-ilabel-transducer --write-disambig-syms=$dir/disambig_ilabels_remapped.list $dir/ilabels $tree $model $dir/ilabels.remapped > $dir/ilabel_map.fst
|
||||
|
||||
# Reduce size of CLG by remapping symbols...
|
||||
fstcompose $dir/ilabel_map.fst $dir/CLG.fst | fstdeterminizestar --use-log=true \
|
||||
| fstminimizeencoded > $dir/CLG2.fst
|
||||
|
||||
|
||||
cat $dir/CLG2.fst | fstisstochastic || echo "warning: CLG2 is not stochastic."
|
||||
|
||||
make-h-transducer --disambig-syms-out=$dir/disambig_tstate.list \
|
||||
--transition-scale=$tscale $dir/ilabels.remapped $tree $model > $dir/Ha.fst
|
||||
|
||||
|
||||
fsttablecompose $dir/Ha.fst $dir/CLG2.fst | fstdeterminizestar --use-log=true \
|
||||
| fstrmsymbols $dir/disambig_tstate.list | fstrmepslocal | fstminimizeencoded > $dir/HCLGa.fst
|
||||
|
||||
fstisstochastic $dir/HCLGa.fst || echo "HCLGa is not stochastic"
|
||||
|
||||
add-self-loops --self-loop-scale=$loopscale --reorder=$reorder $model < $dir/HCLGa.fst > $dir/HCLG.fst
|
||||
|
||||
if [ $tscale == 1.0 -a $loopscale == 1.0 ]; then
|
||||
# No point doing this test if transition-scale not 1, as it is bound to fail.
|
||||
fstisstochastic $dir/HCLG.fst || echo "Final HCLG is not stochastic."
|
||||
fi
|
||||
|
||||
fstisstochastic $dir/HCLG.fst || echo "Final HCLG is not stochastic."
|
||||
|
||||
|
||||
#The next five lines are debug.
|
||||
# The last two lines of this block print out some alignment info.
|
||||
fstrandgen --select=log_prob $dir/HCLG.fst | fstprint --osymbols=data/words.txt > $dir/rand.txt
|
||||
cat $dir/rand.txt | awk 'BEGIN{printf("0 ");} {if(NF>=3 && $3 != 0){ printf ("%d ",$3); }} END {print ""; }' > $dir/rand_align.txt
|
||||
show-alignments data/phones.txt $model ark:$dir/rand_align.txt
|
||||
cat $dir/rand.txt | awk ' {if(NF>=4 && $4 != "<eps>"){ printf ("%s ",$4); }} END {print ""; }'
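The script derives the disambiguation-symbol list and the id of the subsequential symbol `$` from `data/phones_disambig.txt` with `grep`, `tail` and `awk`. A small sketch of those three steps as functions, assuming a symbol table with one `name id` pair per line, sorted by id. The function names are hypothetical, and `tail -n 1` is used as the portable spelling of `tail -1`:

```bash
# Sketch only: same grep/tail/awk steps as the script, on any symbol table.

# Integer ids of the disambiguation symbols (names containing '#').
list_disambig_ids() {
  grep '#' "$1" | awk '{ print $2 }'
}

# One more than the id on the last line; assumes the table is sorted by id.
next_symbol_id() {
  tail -n 1 "$1" | awk '{ print $2 + 1 }'
}

# Copy table $1 to $2 and append the symbol '$' with the next free id.
add_subseq_symbol() {
  cp "$1" "$2" && echo '$' "$(next_symbol_id "$1")" >> "$2"
}
```

For a table whose last entry is `#1 4`, `next_symbol_id` prints `5`, and `add_subseq_symbol` appends the line `$ 5` to the copy.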
|
||||
|
|
@ -0,0 +1,47 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# This script is part of a diagnostic step when using exponential transforms.
|
||||
|
||||
$map=$ARGV[0]; open(M,"<$map")||die "opening map file $map";
|
||||
while(<M>){ @A=split(" ",$_); $map{$A[0]} = $A[1]; }
|
||||
while(<STDIN>){
|
||||
($spk,$warp)=split(" ",$_);
|
||||
$class = int($class/2);
|
||||
defined $map{$spk} || die "No gender info for speaker $spk";
|
||||
$warps{$map{$spk}} = $warps{$map{$spk}} . "$warp ";
|
||||
}
|
||||
@K = sort keys %warps;
|
||||
@K==2||die "wrong number of keys [empty warps file?]";
|
||||
foreach $k ( @K ) {
|
||||
$s = join(" ", sort { $a <=> $b } ( split(" ", $warps{$k}) )) ;
|
||||
print "$k = [ $s ];\n";
|
||||
}
|
||||
# f,m may be reversed below; doesn't matter.
|
||||
foreach $w ( split(" ", $warps{$K[0]}) ) {
|
||||
$nf += 1; $sumf += $w; $sumf2 += $w*$w;
|
||||
}
|
||||
foreach $w ( split(" ", $warps{$K[1]}) ) {
|
||||
$nm += 1; $summ += $w; $summ2 += $w*$w;
|
||||
}
|
||||
$sumf /= $nf; $sumf2 /= $nf;
|
||||
$summ /= $nm; $summ2 /= $nm;
|
||||
$sumf2 -= $sumf*$sumf;
|
||||
$summ2 -= $summ*$summ;
|
||||
$avgwithin = 0.5*($sumf2+$summ2 );
|
||||
$diff = abs($sumf - $summ) / sqrt($avgwithin);
|
||||
print "% class separation is $diff\n";
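The value printed as "class separation" is the absolute difference between the two class means divided by the square root of the average within-class variance, with each variance computed as mean of squares minus squared mean. A hedged awk sketch of the same arithmetic, taking the two warp lists as space-separated strings. The function name and its interface are assumptions for illustration:

```bash
# Sketch only: |mean_f - mean_m| / sqrt(0.5 * (var_f + var_m)),
# using population variances as the Perl script does.
class_separation() {
  awk -v f="$1" -v m="$2" 'BEGIN {
    nf = split(f, F, " "); nm = split(m, M, " ")
    for (i = 1; i <= nf; i++) { sf += F[i]; sf2 += F[i] * F[i] }
    for (i = 1; i <= nm; i++) { sm += M[i]; sm2 += M[i] * M[i] }
    mf = sf / nf; mm = sm / nm
    vf = sf2 / nf - mf * mf          # variance of first class
    vm = sm2 / nm - mm * mm          # variance of second class
    within = 0.5 * (vf + vm)
    if (within <= 0) { print "undefined (zero within-class variance)"; exit 1 }
    d = mf - mm; if (d < 0) d = -d
    printf("%.4f\n", d / sqrt(within))
  }'
}
```

With the lists `"1 3"` and `"5 7"` the means are 2 and 6, both variances are 1, and the function prints `4.0000`.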
|
|
@ -0,0 +1,57 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# creates integer lists of silence and non-silence phones in files,
|
||||
# e.g. silphones.csl="1:2:3 \n"
|
||||
# and nonsilphones.csl="4:5:6:7:...:24\n";
|
||||
|
||||
if(@ARGV != 4) {
|
||||
die "Usage: silphones.pl phones.txt \"sil1 sil2 sil3\" silphones.csl nonsilphones.csl";
|
||||
}
|
||||
|
||||
($symtab, $sillist, $silphones, $nonsilphones) = @ARGV;
|
||||
open(S,"<$symtab") || die "Opening symbol table $symtab";
|
||||
|
||||
|
||||
foreach $s (split(" ", $sillist)) {
|
||||
$issil{$s} = 1;
|
||||
}
|
||||
|
||||
@sil = ();
|
||||
@nonsil = ();
|
||||
while(<S>){
|
||||
@A = split(" ", $_);
|
||||
@A == 2 || die "Bad line $_ in phone-symbol-table file $symtab";
|
||||
($sym, $int) = @A;
|
||||
if($int != 0) {
|
||||
if($issil{$sym}) { push @sil, $int; $seensil{$sym}=1; }
|
||||
else { push @nonsil, $int; }
|
||||
}
|
||||
}
|
||||
|
||||
foreach $k(keys %issil) {
|
||||
if(!$seensil{$k}) { die "No such silence phone $k"; }
|
||||
}
|
||||
open(F, ">$silphones") || die "opening silphones file $silphones";
|
||||
open(G, ">$nonsilphones") || die "opening nonsilphones file $nonsilphones";
|
||||
print F join(":", @sil) . "\n";
|
||||
print G join(":", @nonsil) . "\n";
|
||||
close(F);
|
||||
close(G);
|
||||
if(@sil == 0) { print STDERR "Warning: silphones.pl no silence phones.\n" }
|
||||
if(@nonsil == 0) { print STDERR "Warning: silphones.pl no non-silence phones.\n" }
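To make the output format concrete, here is a rough awk equivalent of the partitioning step: it reads a `symbol id` table, skips id 0, and prints the silence ids and the non-silence ids as two colon-separated lines. It is a sketch under assumptions: the function name is invented, it prints to standard output instead of writing two files, and it omits the check that every listed silence phone exists in the table.

```bash
# Sketch only: $1 = symbol table, $2 = space-separated silence phone names.
# Line 1 of the output: silence ids; line 2: non-silence ids.
csl_lists() {
  awk -v sil="$2" '
    BEGIN { n = split(sil, S, " "); for (i = 1; i <= n; i++) issil[S[i]] = 1 }
    $2 != 0 {
      if ($1 in issil) { s  = s  (s  == "" ? "" : ":") $2 }
      else             { ns = ns (ns == "" ? "" : ":") $2 }
    }
    END { print s; print ns }' "$1"
}
```

For a table containing `<eps> 0`, `sil 1`, `a 2`, `b 3` and the silence list `"sil"`, the output is `1` on the first line and `2:3` on the second.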
|
||||
|
|
@ -0,0 +1,27 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
while(<>){
|
||||
@A = split(" ", $_);
|
||||
@A > 1 || die "Invalid line in spk2utt file: $_";
|
||||
$s = shift @A;
|
||||
foreach $u ( @A ) {
|
||||
print "$u $s\n";
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,181 @@
|
|||
#!/usr/bin/perl -w
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
|
||||
# This program splits up any kind of .scp or archive-type file.
|
||||
# If there is no utt2spk option it will work on any text file and
|
||||
# will split it up with an approximately equal number of lines in
|
||||
# each output file.
|
||||
# With the --utt2spk option it will work on anything that has the
|
||||
# utterance-id as the first entry on each line; the utt2spk file is
|
||||
# of the form "utterance speaker" (on each line).
|
||||
# It splits it into equal size chunks as far as it can. If you use
|
||||
# the utt2spk option it will make sure these chunks coincide with
|
||||
# speaker boundaries. In this case, if there are more chunks
|
||||
# than speakers (and in some other circumstances), some of the
|
||||
# resulting chunks will be empty and it
|
||||
# will print a warning.
|
||||
# You will normally call this like:
|
||||
# split_scp.pl scp scp.1 scp.2 scp.3 ...
|
||||
# or
|
||||
# split_scp.pl --utt2spk=utt2spk scp scp.1 scp.2 scp.3 ...
|
||||
# Note that you can use this script to split the utt2spk file itself,
|
||||
# e.g. split_scp.pl --utt2spk=utt2spk utt2spk utt2spk.1 utt2spk.2 ...
|
||||
|
||||
if(@ARGV < 2 ) {
|
||||
die "Usage: split_scp.pl [--utt2spk=<utt2spk_file>] in.scp out1.scp out2.scp ... ";
|
||||
}
|
||||
|
||||
if($ARGV[0] =~ m:^-:) {
|
||||
# Everything inside this block
|
||||
# corresponds to what we do when the --utt2spk option is used.
|
||||
$opt = shift @ARGV;
|
||||
@A = split("=", $opt);
|
||||
if(@A != 2 || $A[0] ne "--utt2spk") {
|
||||
die "split_scp.pl: invalid option $opt";
|
||||
}
|
||||
$utt2spk_file = $A[1];
|
||||
open(U, "<$utt2spk_file") || die "Failed to open utt2spk file $utt2spk_file";
|
||||
while(<U>) {
|
||||
@A = split;
|
||||
@A == 2 || die "Bad line $_ in utt2spk file $utt2spk_file";
|
||||
($u,$s) = @A;
|
||||
$utt2spk{$u} = $s;
|
||||
}
|
||||
$inscp = shift @ARGV;
|
||||
open(I, "<$inscp") || die "Opening input scp file $inscp";
|
||||
@spkrs = ();
|
||||
while(<I>) {
|
||||
@A = split;
|
||||
if(@A == 0) { die "Empty or space-only line in scp file $inscp"; }
|
||||
$u = $A[0];
|
||||
$s = $utt2spk{$u};
|
||||
if(!defined $s) { die "No such utterance $u in utt2spk file $utt2spk_file"; }
|
||||
if(!defined $spk_count{$s}) {
|
||||
push @spkrs, $s;
|
||||
$spk_count{$s} = 0;
|
||||
$spk_data{$s} = "";
|
||||
}
|
||||
$spk_count{$s}++;
|
||||
$spk_data{$s} = $spk_data{$s} . $_;
|
||||
}
|
||||
# Now split as equally as possible ..
|
||||
# First allocate spks to files by giving each file approximately
|
||||
# equal #spks.
|
||||
$numspks = @spkrs; # number of speakers.
|
||||
$numscps = @ARGV; # number of output files.
|
||||
$spksperscp = int( ($numspks+($numscps-1)) / $numscps); # the +($numscps-1) forces rounding up.
|
||||
for($scpidx = 0; $scpidx < $numscps; $scpidx++) {
|
||||
$scparray[$scpidx] = []; # [] is array reference.
|
||||
for($n = $spksperscp * $scpidx;
|
||||
$n < $numspks && $n < $spksperscp*($scpidx+1);
|
||||
$n++) {
|
||||
$spk = $spkrs[$n];
|
||||
push @{$scparray[$scpidx]}, $spk;
|
||||
$scpcount[$scpidx] += $spk_count{$spk};
|
||||
}
|
||||
}
|
||||
# Now will try to reassign beginning + ending speakers
|
||||
# to different scp's and see if it gets more balanced.
|
||||
# Suppose objf we're minimizing is sum_i (num utts in scp[i] - average)^2.
|
||||
# We can show that if considering changing just 2 scp's, we minimize
|
||||
# this by minimizing the squared difference in sizes. This is
|
||||
# equivalent to minimizing the absolute difference in sizes. This
|
||||
# shows this method is bound to converge.
|
||||
|
||||
$changed = 1;
|
||||
while($changed) {
|
||||
$changed = 0;
|
||||
for($scpidx = 0; $scpidx < $numscps; $scpidx++) {
|
||||
# First try to reassign ending spk of this scp.
|
||||
if($scpidx < $numscps-1) {
|
||||
$sz = @{$scparray[$scpidx]};
|
||||
if($sz > 0) {
|
||||
$spk = $scparray[$scpidx]->[$sz-1];
|
||||
$count = $spk_count{$spk};
|
||||
$nutt1 = $scpcount[$scpidx];
|
||||
$nutt2 = $scpcount[$scpidx+1];
|
||||
if( abs( ($nutt2+$count) - ($nutt1-$count))
|
||||
< abs($nutt2 - $nutt1)) { # Would decrease
|
||||
# size-diff by reassigning spk...
|
||||
$scpcount[$scpidx+1] += $count;
|
||||
$scpcount[$scpidx] -= $count;
|
||||
pop @{$scparray[$scpidx]};
|
||||
unshift @{$scparray[$scpidx+1]}, $spk;
|
||||
$changed = 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
if($scpidx > 0 && @{$scparray[$scpidx]} > 0) {
|
||||
$spk = $scparray[$scpidx]->[0];
|
||||
$count = $spk_count{$spk};
|
||||
$nutt1 = $scpcount[$scpidx-1];
|
||||
$nutt2 = $scpcount[$scpidx];
|
||||
if( abs( ($nutt2-$count) - ($nutt1+$count))
|
||||
< abs($nutt2 - $nutt1)) { # Would decrease
|
||||
# size-diff by reassigning spk...
|
||||
$scpcount[$scpidx-1] += $count;
|
||||
$scpcount[$scpidx] -= $count;
|
||||
shift @{$scparray[$scpidx]};
|
||||
push @{$scparray[$scpidx-1]}, $spk;
|
||||
$changed = 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
# Now print out the files...
|
||||
for($scpidx = 0; $scpidx < $numscps; $scpidx++) {
|
||||
$scpfn = $ARGV[$scpidx];
|
||||
open(F, ">$scpfn") || die "Could not open scp file $scpfn for writing.";
|
||||
$count = 0;
|
||||
if(@{$scparray[$scpidx]} == 0) {
|
||||
print STDERR "Warning: split_scp.pl producing empty .scp file $scpfn (too many splits and too few speakers?)\n";
|
||||
}
|
||||
foreach $spk ( @{$scparray[$scpidx]} ) {
|
||||
print F $spk_data{$spk};
|
||||
$count += $spk_count{$spk};
|
||||
}
|
||||
if($count != $scpcount[$scpidx]) { die "Count mismatch [code error]"; }
|
||||
close(F);
|
||||
}
|
||||
} else {
|
||||
# This block is the "normal" case where there is no --utt2spk
|
||||
# option and we just break into equal size chunks.
|
||||
|
||||
$inscp = shift @ARGV;
|
||||
open(I, "<$inscp") || die "Opening input scp file $inscp";
|
||||
|
||||
$numscps = @ARGV; # size of array.
|
||||
@F = ();
|
||||
while(<I>) {
|
||||
push @F, $_;
|
||||
}
|
||||
$numlines = @F;
|
||||
if($numlines == 0) {
|
||||
print STDERR "split_scp.pl: warning: empty input scp file $inscp\n";
|
||||
}
|
||||
$linesperscp = int( ($numlines+($numscps-1)) / $numscps); # the +($numscps-1) forces rounding up.
|
||||
# [just doing int() rounds down].
|
||||
for($scpidx = 0; $scpidx < @ARGV; $scpidx++) {
|
||||
$scpfile = $ARGV[$scpidx];
|
||||
open(O, ">$scpfile") || die "Opening output scp file $scpfile";
|
||||
for($n = $linesperscp * $scpidx; $n < $numlines && $n < $linesperscp*($scpidx+1); $n++) {
|
||||
print O $F[$n];
|
||||
}
|
||||
close(O) || die "Closing scp file $scpfile";
|
||||
}
|
||||
}
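Two pieces of arithmetic carry this script: the ceiling division that sizes each chunk, and the test that moves a boundary speaker to a neighbouring file only when that shrinks the absolute size difference between the two files. A small shell sketch of both, with hypothetical function names, working on counts only (no files are read or written):

```bash
# Sketch only: half-open [start, end) line ranges for numscps output files.
chunk_bounds() {
  local numlines=$1 numscps=$2 per idx start end
  per=$(( (numlines + numscps - 1) / numscps ))   # ceiling division
  idx=0
  while [ "$idx" -lt "$numscps" ]; do
    start=$(( per * idx ))
    end=$(( per * (idx + 1) ))
    [ "$start" -gt "$numlines" ] && start=$numlines
    [ "$end" -gt "$numlines" ] && end=$numlines
    echo "$start $end"
    idx=$(( idx + 1 ))
  done
}

# Absolute value of an integer.
abs() {
  if [ "$1" -lt 0 ]; then echo $(( 0 - $1 )); else echo "$1"; fi
}

# Succeeds if moving a speaker with `count` utterances from a file holding
# `from` utterances to a file holding `to` utterances reduces |to - from|.
should_move() {
  local from=$1 to=$2 count=$3 before after
  before=$(( to - from ))
  after=$(( (to + count) - (from - count) ))
  [ "$(abs "$after")" -lt "$(abs "$before")" ]
}
```

`chunk_bounds 10 3` prints `0 4`, `4 8` and `8 10`; when there are more chunks than lines, the trailing ranges are empty, which corresponds to the empty-file case the script handles. `should_move 10 2 3` succeeds because the difference drops from 8 to 2.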
|
|
@ -0,0 +1,59 @@
|
|||
#!/usr/bin/perl -w
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# This program selects a subset of N elements in the scp.
|
||||
# It selects them evenly from throughout the scp, in order to
|
||||
# avoid selecting too many from the same speaker.
|
||||
# It prints them on the standard output.
|
||||
|
||||
if(@ARGV < 2 ) {
|
||||
die "Usage: subset_scp.pl N in.scp ";
|
||||
}
|
||||
|
||||
$N = shift @ARGV;
|
||||
if($N == 0) {
|
||||
die "First command-line parameter to subset_scp.pl must be an integer, got \"$N\"";
|
||||
}
|
||||
$inscp = shift @ARGV;
|
||||
open(I, "<$inscp") || die "Opening input scp file $inscp";
|
||||
|
||||
@F = ();
|
||||
while(<I>) {
|
||||
push @F, $_;
|
||||
}
|
||||
$numlines = @F;
|
||||
if($N > $numlines) {
|
||||
die "You requested from subset_scp.pl more elements than available: $N > $numlines";
|
||||
}
|
||||
|
||||
sub select_n {
|
||||
my ($start,$end,$num_needed) = @_;
|
||||
my $diff = $end - $start;
|
||||
if($num_needed > $diff) { die "select_n: code error"; }
|
||||
if($diff == 1 ) {
|
||||
if($num_needed > 0) {
|
||||
print $F[$start];
|
||||
}
|
||||
} else {
|
||||
my $halfdiff = int($diff/2);
|
||||
my $halfneeded = int($num_needed/2);
|
||||
select_n($start, $start+$halfdiff, $halfneeded);
|
||||
select_n($start+$halfdiff, $end, $num_needed - $halfneeded);
|
||||
}
|
||||
}
|
||||
select_n(0, $numlines, $N);
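The recursion in `select_n` halves the range and halves the number of items still needed, so the selected lines end up spread across the whole file. A shell sketch of the same recursion that prints 0-based line indices instead of the lines themselves. The guard for an empty range is an addition of this sketch, not part of the Perl code:

```bash
# Sketch only: print `needed` indices spread evenly over [start, end).
select_n() {
  local start=$1 end=$2 needed=$3 diff halfdiff halfneeded
  diff=$(( end - start ))
  if [ "$needed" -gt "$diff" ]; then
    echo "select_n: asked for more items than the range holds" >&2
    return 1
  fi
  [ "$diff" -le 0 ] && return 0          # empty range: nothing to select
  if [ "$diff" -eq 1 ]; then
    [ "$needed" -gt 0 ] && echo "$start"
    return 0
  fi
  halfdiff=$(( diff / 2 ))
  halfneeded=$(( needed / 2 ))
  # Left half gets the rounded-down share, right half gets the rest.
  select_n "$start" $(( start + halfdiff )) "$halfneeded"
  select_n $(( start + halfdiff )) "$end" $(( needed - halfneeded ))
}
```

`select_n 0 10 3` prints the indices 4, 6 and 9. Because the left half always receives the rounded-down share, the selection leans slightly toward the end of each range.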
|
||||
|
|
@ -0,0 +1,59 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
$ignore_oov = 0;
|
||||
$ignore_first_field = 0;
|
||||
for($x = 0; $x < 2; $x++) {
|
||||
if($ARGV[0] eq "--ignore-oov") { $ignore_oov = 1; shift @ARGV; }
|
||||
if($ARGV[0] eq "--ignore-first-field") { $ignore_first_field = 1; shift @ARGV; }
|
||||
}
|
||||
|
||||
$symtab = shift @ARGV;
|
||||
if(!defined $symtab) {
|
||||
die "Usage: sym2int.pl [--ignore-oov] [--ignore-first-field] symtab [input transcriptions] > output transcriptions\n";
|
||||
}
|
||||
open(F, "<$symtab") || die "Error opening symbol table file $symtab";
|
||||
while(<F>) {
|
||||
@A = split(" ", $_);
|
||||
@A == 2 || die "bad line in symbol table file: $_";
|
||||
$sym2int{$A[0]} = $A[1] + 0;
|
||||
}
|
||||
|
||||
while(<>) {
|
||||
@A = split(" ", $_);
|
||||
if(@A == 0) {
|
||||
die "Empty line in transcriptions input.";
|
||||
}
|
||||
if($ignore_first_field) {
|
||||
$key = shift @A;
|
||||
print $key . " ";
|
||||
}
|
||||
foreach $a (@A) {
|
||||
$i = $sym2int{$a};
|
||||
if(!defined ($i)) {
|
||||
if($ignore_oov) {
|
||||
print $a . " " ;
|
||||
} else {
|
||||
die "sym2int.pl: undefined symbol $a\n";
|
||||
}
|
||||
}
|
||||
if(defined($i)) { print $i . " "; } # with --ignore-oov the symbol itself was already printed above.
|
||||
}
|
||||
print "\n";
|
||||
}
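As an illustration of the mapping this script performs, here is a compact awk sketch of the `--ignore-first-field` behaviour: the first field of each transcript line is kept as the key and every remaining word is replaced by its integer id. The function name is hypothetical, and unlike the Perl script this sketch has no `--ignore-oov` mode; an unknown word is always an error.

```bash
# Sketch only: $1 = symbol table ("word id" per line); transcripts on stdin.
sym2int() {
  awk 'NR == FNR { id[$1] = $2 + 0; next }      # first file: load the table
       { out = $1                                # keep the utterance key
         for (i = 2; i <= NF; i++) {
           if (!($i in id)) {
             print "sym2int: undefined symbol " $i > "/dev/stderr"
             exit 1
           }
           out = out " " id[$i]
         }
         print out }' "$1" -
}
```

With a table that maps `hello` to 1 and `world` to 2, the input line `utt1 hello world` becomes `utt1 1 2`. A line containing a word that is missing from the table makes the function fail with a non-zero status.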
|
||||
|
||||
|
|
@ -0,0 +1,33 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
|
||||
while(<>){
|
||||
@A = split(" ", $_);
|
||||
@A == 2 || die "Invalid line in utt2spk file: $_";
|
||||
($u,$s) = @A;
|
||||
if(!$seen_spk{$s}) {
|
||||
$seen_spk{$s} = 1;
|
||||
push @spklist, $s;
|
||||
}
|
||||
$uttlist{$s} = $uttlist{$s} . "$u ";
|
||||
}
|
||||
foreach $s (@spklist) {
|
||||
$l = $uttlist{$s};
|
||||
$l =~ s: $::; # remove trailing space.
|
||||
print "$s $l\n";
|
||||
}
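The two conversion scripts are inverses of each other up to line order. A short awk sketch of both directions, useful for checking a round trip on a small file. The shell function names are chosen for illustration, and speakers are emitted in order of first appearance, as in the Perl version:

```bash
# Sketch only: "utt spk" lines on stdin -> "spk utt1 utt2 ..." lines.
utt2spk_to_spk2utt() {
  awk '{ if ($2 in utts) { utts[$2] = utts[$2] " " $1 }
         else { order[++n] = $2; utts[$2] = $1 } }
       END { for (i = 1; i <= n; i++) print order[i], utts[order[i]] }'
}

# Sketch only: "spk utt1 utt2 ..." lines on stdin -> "utt spk" lines.
spk2utt_to_utt2spk() {
  awk '{ for (i = 2; i <= NF; i++) print $i, $1 }'
}
```

Feeding `u1 s1`, `u2 s2`, `u3 s1` through `utt2spk_to_spk2utt` gives `s1 u1 u3` and `s2 u2`. Piping that through `spk2utt_to_utt2spk` returns the original pairs grouped by speaker.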
|
|
@ -0,0 +1,45 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Monophone decoding script.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_mono
|
||||
tree=exp/mono/tree
|
||||
mkdir -p $dir
|
||||
model=exp/mono/final.mdl
|
||||
graphdir=exp/graph_mono
|
||||
|
||||
scripts/mkgraph.sh --mono $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
||||
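The final awk command sums field 4 (errors) and field 6 (reference words) over all per-test-set WER files and reports the pooled percentage. A sketch of that aggregation as a function reading from standard input. The sample line layout `%WER 10.00 [ 10 / 100, ... ]` is an assumption about what the scoring tool prints, and restricting the sum to lines starting with `%WER` is a choice of this sketch; the script itself sums every line.

```bash
# Sketch only: pool error and word counts from "%WER p [ errs / words, ... ]" lines.
# Field 6 looks like "100," and awk converts its leading digits to a number.
average_wer() {
  awk '/^%WER/ { n += $4; d += $6 }
       END { if (d > 0) printf("Average WER is %f (%d / %d)\n", 100.0 * n / d, n, d) }'
}
```

Pooling counts this way weights each test set by its number of reference words, which differs from averaging the per-set percentages.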
|
|
@ -0,0 +1,45 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_sgmm
|
||||
tree=exp/sgmm/tree
|
||||
model=exp/sgmm/final.mdl
|
||||
graphdir=exp/graph_sgmm
|
||||
|
||||
mkdir -p $dir
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
sgmm-decode-faster --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
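The decoding scripts start one background subshell per test set and then call a bare `wait`, which does not report whether any job failed. A hedged sketch of the same pattern that records each job's process id and propagates failures. `run_parallel` and its calling convention are hypothetical, not something the scripts define:

```bash
# Sketch only: run "$cmd item" for every item in the background,
# wait for all of them, and return non-zero if any job failed.
run_parallel() {
  local cmd=$1 pids="" pid failed=0 item
  shift
  for item in "$@"; do
    ( "$cmd" "$item" ) &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  return "$failed"
}
```

A call such as `run_parallel decode_one mar87 oct87 feb89`, where `decode_one` is a placeholder for a function wrapping the per-test-set commands, would let the caller stop before the WER summary if any decode failed.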
@ -0,0 +1,54 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_sgmm2
|
||||
tree=exp/sgmm/tree
|
||||
model=exp/sgmm2/final.mdl
|
||||
graphdir=exp/graph_sgmm2
|
||||
|
||||
mkdir -p $dir
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
sgmm-gselect $model "$feats" ark,t:- 2>$dir/gselect_${test}.log | gzip -c > $dir/gselect_${test}.gz || exit 1;
|
||||
gselect_opt="--gselect-read=ark:gunzip -c $dir/gselect_${test}.gz|"
|
||||
sgmm-decode-faster-spkvecs --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt "$gselect_opt" $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log || exit 1;
|
||||
ali-to-post $dir/test_${test}.ali $dir/test_${test}.post 2> $dir/post_${test}.log || exit 1;
|
||||
|
||||
gselect_opt="--gselect=ark:gunzip -c $dir/gselect_${test}.gz|"
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
sgmm-est-spkvecs "$gselect_opt" $spk2utt_opt $model "$feats" $dir/test_${test}.post $dir/vecs_${test} 2> $dir/est_spkvecs_${test}.log || exit 1;
|
||||
sgmm-decode-faster-spkvecs --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt "$gselect_opt" --spkvecs-read=$dir/vecs_${test} $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_vecs_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,45 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_sgmma
|
||||
tree=exp/sgmma/tree
|
||||
model=exp/sgmma/final.mdl
|
||||
graphdir=exp/graph_sgmma
|
||||
|
||||
mkdir -p $dir
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
sgmm-decode-faster --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,69 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# SGMM decoding with adaptation.
|
||||
#
|
||||
# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
|
||||
# (1) decode with "alignment model"
|
||||
# (2) get GMM posteriors with "alignment model" and estimate speaker
|
||||
# vectors with final model
|
||||
# (3) decode with final model.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_sgmmb
|
||||
tree=exp/sgmmb/tree
|
||||
model=exp/sgmmb/final.mdl
|
||||
alimodel=exp/sgmmb/final.alimdl
|
||||
graphdir=exp/graph_sgmmb
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
|
||||
mkdir -p $dir
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
spk2utt_opt="--spk2utt=ark:data/test_${test}.spk2utt"
|
||||
utt2spk_opt="--utt2spk=ark:data/test_${test}.utt2spk"
|
||||
|
||||
sgmm-gselect $model "$feats" ark,t:- 2>$dir/gselect.log | \
|
||||
gzip -c > $dir/${test}_gselect.gz || exit 1;
|
||||
gselect_opt="--gselect=ark:gunzip -c $dir/${test}_gselect.gz|"
|
||||
|
||||
# Use smaller beam first time.
|
||||
sgmm-decode-faster "$gselect_opt" --beam=15.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $alimodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.pre_tra ark,t:$dir/test_${test}.pre_ali 2> $dir/predecode_${test}.log
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}.pre_ali ark:- | \
|
||||
weight-silence-post 0.01 $silphonelist $alimodel ark:- ark:- | \
|
||||
sgmm-post-to-gpost "$gselect_opt" $alimodel "$feats" ark,s,cs:- ark:- | \
|
||||
sgmm-est-spkvecs-gpost "$spk2utt_opt" $model "$feats" ark,s,cs:- \
|
||||
ark:$dir/test_${test}.vecs ) 2>$dir/vecs_${test}.log
|
||||
|
||||
|
||||
sgmm-decode-faster $utt2spk_opt --spk-vecs=ark:$dir/test_${test}.vecs --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,44 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri1
|
||||
tree=exp/tri1/tree
|
||||
model=exp/tri1/final.mdl
|
||||
graphdir=exp/graph_tri1
|
||||
|
||||
mkdir -p $dir
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
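Note on the final scoring step: each script concatenates the per-test-set wer_* files and sums fields 4 and 6 of every line to get an overall error rate. A minimal sketch of the same aggregation follows, assuming each file holds a line of the form `%WER 4.00 [ 4 / 100, 1 ins, 1 del, 2 sub ]` (that format is an assumption here, as is the function name `average_wer`). Unlike the one-liner in the scripts, it only counts lines starting with `%WER` and avoids dividing by zero.

```shell
# Hypothetical helper: pool error and word counts over several WER files.
# Field 4 is the error count and field 6 the reference word count; awk
# reads "100," as the number 100 when it is used arithmetically.
average_wer() {
  cat "$@" | awk '
    $1 == "%WER" { n += $4; d += $6 }
    END {
      if (d > 0)
        printf("Average WER is %f (%d / %d)\n", 100.0 * n / d, n, d)
    }'
}
```

It would be called as `average_wer $dir/wer_* > $dir/wer`. Because the counts are pooled before dividing, larger test sets weigh more than smaller ones, which matches what the original awk command computes.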
@ -0,0 +1,65 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# decode_tri_fmllr.sh is as decode_tri.sh but estimating fMLLR in test,
|
||||
# per speaker. There is no SAT.
|
||||
# To be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
srcdir=exp/decode_tri1
|
||||
dir=exp/decode_tri1_fmllr
|
||||
mkdir -p $dir
|
||||
model=exp/tri1/final.mdl
|
||||
tree=exp/tri1/tree
|
||||
graphdir=exp/graph_tri1
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
mincount=500 # mincount before we estimate a transform.
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
# Comment the two lines below to make this per-utterance.
|
||||
# This would only work if $srcdir was also per-utterance [otherwise
|
||||
# you'd have to mess with the script a bit].
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
|
||||
weight-silence-post 0.01 $silphones $model ark:- ark:- | \
|
||||
gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
|
||||
"$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
|
||||
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
||||
|
|
@ -0,0 +1,69 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# decode_tri_regtree_fmllr.sh is as ../decode_tri.sh but estimating fMLLR in test,
|
||||
# per speaker. There is no SAT. Use a regression-tree with top-level speech/sil
|
||||
# split (no silence weighting).
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
srcdir=exp/decode_tri1
|
||||
dir=exp/decode_tri1_regtree_fmllr
|
||||
mkdir -p $dir
|
||||
model=exp/tri1/final.mdl
|
||||
occs=exp/tri1/final.occs
|
||||
tree=exp/tri1/tree
|
||||
graphdir=exp/graph_tri1
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
regtree=$dir/regtree
|
||||
maxleaves=8 # max # of regression-tree leaves.
|
||||
mincount=5000 # mincount before we add new transform.
|
||||
gmm-make-regtree --sil-phones=$silphones --state-occs=$occs --max-leaves=$maxleaves $model $regtree 2>$dir/make_regtree.out
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
# Comment the two lines below to make this per-utterance.
|
||||
# This would only work if $srcdir was also per-utterance [otherwise
|
||||
# you'd have to mess with the script a bit].
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
# To deweight silence, would add the line
|
||||
# weight-silence-post 0.0 $silphones $model ark:- ark:- | \
|
||||
# after the line with ali-to-post
|
||||
# This is useful if we don't treat silence specially when building regression tree.
|
||||
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
|
||||
gmm-est-regtree-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model "$feats" ark:- $regtree ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
|
||||
|
||||
gmm-decode-faster-regtree-fmllr $utt2spk_opt --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst $regtree "$feats" ark:$dir/${test}.fmllr ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
||||
|
|
@ -0,0 +1,44 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2a
|
||||
mkdir -p $dir
|
||||
model=exp/tri2a/final.mdl
|
||||
tree=exp/tri2a/tree
|
||||
graphdir=exp/graph_tri2a
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
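Note on the parallel loop: every script starts one background subshell per test set and then calls a bare `wait`, which returns zero even if a decode failed. A minimal sketch of the same pattern that also counts failed jobs is shown below. `decode_one` and `run_all` are hypothetical names, and `decode_one` merely stands in for the decode-and-score body of the loop.

```shell
# Hypothetical stand-in for the per-test-set work; fails for the name "bad".
decode_one() {
  [ "$1" != "bad" ]
}

# Launch one background job per argument, then wait for each PID in turn
# so that individual exit statuses are not lost.
run_all() {
  pids=""
  for t in "$@"; do
    ( decode_one "$t" ) &
    pids="$pids $!"
  done
  failed=0
  for p in $pids; do
    wait "$p" || failed=$((failed + 1))
  done
  echo "$failed"
}
```

For example, `run_all mar87 oct87 feb89` prints the number of test sets whose job exited non-zero, which a caller could check before averaging the WER files.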
@ -0,0 +1,65 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# decode_tri_fmllr.sh is as decode_tri.sh but estimating fMLLR in test,
|
||||
# per speaker. There is no SAT.
|
||||
# To be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
srcdir=exp/decode_tri2a
|
||||
dir=exp/decode_tri2a_fmllr
|
||||
mkdir -p $dir
|
||||
model=exp/tri2a/final.mdl
|
||||
tree=exp/tri2a/tree
|
||||
graphdir=exp/graph_tri2a
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
mincount=500 # mincount before we estimate a transform.
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
# Comment the two lines below to make this per-utterance.
|
||||
# This would only work if $srcdir was also per-utterance [otherwise
|
||||
# you'd have to mess with the script a bit].
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
|
||||
weight-silence-post 0.01 $silphones $model ark:- ark:- | \
|
||||
gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
|
||||
"$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
|
||||
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
||||
|
|
@ -0,0 +1,65 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# decode_tri_fmllr.sh is as decode_tri.sh but estimating fMLLR in test,
|
||||
# per utterance (the spk2utt/utt2spk options below are commented out). There is no SAT.
|
||||
# To be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
srcdir=exp/decode_tri2a
|
||||
dir=exp/decode_tri2a_fmllr_utt
|
||||
mkdir -p $dir
|
||||
model=exp/tri2a/final.mdl
|
||||
tree=exp/tri2a/tree
|
||||
graphdir=exp/graph_tri2a
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
mincount=500 # mincount before we estimate a transform.
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
# The two lines below are commented out to make this per-utterance.
|
||||
# This would only work if $srcdir was also per-utterance [otherwise
|
||||
# you'd have to mess with the script a bit].
|
||||
#spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
#utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
|
||||
weight-silence-post 0.01 $silphones $model ark:- ark:- | \
|
||||
gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
|
||||
"$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
|
||||
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
||||
|
|
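Note on the optional speaker options: the per-utterance variant works because `$spk2utt_opt` and `$utt2spk_opt` are left unset and expanded without quotes, so they disappear from the command line. A quoted expansion, as in the `"$spk2utt_opt"` argument of the SGMM adaptation script, would instead pass an empty argument if the variable were empty. The sketch below illustrates the difference; `count_args` is a hypothetical helper.

```shell
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

spk2utt_opt=""   # per-utterance mode: option left empty

count_args $spk2utt_opt model feats     # unquoted: the empty value vanishes
count_args "$spk2utt_opt" model feats   # quoted: an empty argument is passed
```

The first call reports 2 arguments and the second reports 3, so the unquoted form is the one to keep wherever an option may legitimately be empty.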
@ -0,0 +1,67 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2b
|
||||
mkdir -p $dir
|
||||
model=exp/tri2b/final.mdl
|
||||
alignmodel=exp/tri2b/final.alimdl
|
||||
et=exp/tri2b/final.et
|
||||
defaultmat=exp/tri2b/default.mat
|
||||
tree=exp/tri2b/tree
|
||||
graphdir=exp/graph_tri2b
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
# Make the decoding graph (it may already exist from an earlier run).
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
defaultfeats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:-|"
|
||||
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
# First do SI decoding with alignment model.
|
||||
# Use smaller beam for this, as less critical.
|
||||
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
|
||||
|
||||
# Comment the two lines below to make this per-utterance.
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
|
||||
gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1 $model $et \
|
||||
"$sifeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
|
||||
2>$dir/et_${test}.log || exit 1;
|
||||
|
||||
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,62 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Decode the testing data.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2c
|
||||
mkdir -p $dir
|
||||
model=exp/tri2c/final.mdl
|
||||
tree=exp/tri2c/tree
|
||||
graphdir=exp/graph_tri2c
|
||||
# Note, the following 3 options must match the same options in train_tri2c.sh
|
||||
norm_vars=false
|
||||
after_deltas=false
|
||||
per_spk=true
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
if [ "$per_spk" = "true" ]; then
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
fi # else empty.
|
||||
|
||||
echo "Computing cepstral mean and variance stats."
|
||||
# compute mean and variance stats.
|
||||
if [ "$after_deltas" = "true" ]; then
|
||||
add-deltas --print-args=false scp:data/test_${test}.scp ark:- | compute-cmvn-stats $spk2utt_opt ark:- ark:$dir/cmvn_${test}.ark 2>$dir/cmvn_${test}.log
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn_${test}.ark ark:- ark:- |"
|
||||
else
|
||||
compute-cmvn-stats $spk2utt_opt scp:data/test_${test}.scp ark:$dir/cmvn_${test} 2>$dir/cmvn_${test}.log
|
||||
feats="ark:apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn_${test} scp:data/test_${test}.scp ark:- | add-deltas --print-args=false ark:- ark:- |"
|
||||
fi
|
||||
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
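Note on the CMVN options: the tri2c decoding script builds one of two feature pipelines depending on whether normalization is applied after or before add-deltas. The sketch below assembles the same two pipeline strings in a function so the choice can be inspected without running any Kaldi binary. `build_feats` and its argument order are hypothetical; the command names and flags are the ones that appear in the script.

```shell
# Hypothetical helper: echo the feature rspecifier for one test set.
# Arguments: test-set name, CMVN stats archive, after_deltas, norm_vars,
# and an optional --utt2spk=... option (may be empty).
build_feats() {
  test_set=$1
  cmvn=$2
  after_deltas=$3
  norm_vars=$4
  utt2spk_opt=$5
  scp="scp:data/test_${test_set}.scp"
  if [ "$after_deltas" = "true" ]; then
    # Normalize the delta-augmented features.
    echo "ark:add-deltas --print-args=false $scp ark:- | apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$cmvn ark:- ark:- |"
  else
    # Normalize the raw features, then add deltas.
    echo "ark:apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$cmvn $scp ark:- | add-deltas --print-args=false ark:- ark:- |"
  fi
}
```

Whichever branch is taken, the statistics must have been computed on the same kind of features (raw or with deltas) and with the same per-speaker or per-utterance grouping as is used when applying them.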
@ -0,0 +1,45 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2d
|
||||
mkdir -p $dir
|
||||
model=exp/tri2d/final.mdl
|
||||
tree=exp/tri2d/tree
|
||||
graphdir=exp/graph_tri2d
|
||||
transform=exp/tri2d/final.mat
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,45 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2e
|
||||
mkdir -p $dir
|
||||
model=exp/tri2e/final.mdl
|
||||
tree=exp/tri2e/tree
|
||||
graphdir=exp/graph_tri2e
|
||||
transform=exp/tri2e/final.mat
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,45 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2f
|
||||
mkdir -p $dir
|
||||
model=exp/tri2f/final.mdl
|
||||
tree=exp/tri2f/tree
|
||||
graphdir=exp/graph_tri2f
|
||||
transform=exp/tri2f/final.mat
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,65 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2g
|
||||
mkdir -p $dir
|
||||
model=exp/tri2g/final.mdl
|
||||
alignmodel=exp/tri2g/final.alimdl
|
||||
lvtln=exp/tri2g/final.lvtln
|
||||
tree=exp/tri2g/tree
|
||||
graphdir=exp/graph_tri2g
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
# Make the decoding graph (it may already exist from an earlier run).
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
# First do SI decoding with alignment model.
|
||||
# Use smaller beam for this, as less critical.
|
||||
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
|
||||
|
||||
# Comment out the two lines below to make this per-utterance.
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
|
||||
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $model $lvtln \
|
||||
"$sifeats" ark:- ark:$dir/lvtln_${test}.trans ark,t:$dir/lvtln_${test}.warp ) \
|
||||
2>$dir/lvtln_${test}.log || exit 1;
|
||||
|
||||
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# The ,p (permissive) option lets compute-wer score partial output without dying.
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
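As a rough illustration of the final aggregation step, the sketch below feeds the same awk program two made-up lines. The line format (`%WER <pct> [ <errors> / <words>, ... ]`) is an assumption about what compute-wer writes, and `avg_wer` is a hypothetical helper name that is not part of the script.

```shell
# Hypothetical helper wrapping the awk program used by the script:
# field 4 is taken as the error count, field 6 as the reference word count.
avg_wer() {
  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }'
}

# Two invented per-test-set lines in the assumed compute-wer format.
printf '%s\n' \
  '%WER 12.50 [ 2 / 16, 0 ins, 1 del, 1 sub ]' \
  '%WER 25.00 [ 4 / 16, 1 ins, 1 del, 2 sub ]' | avg_wer
```

With these two lines the totals are 6 errors over 32 words, so the reported average is 18.75%. awk reads the numeric prefix of a field such as `16,`, so the trailing comma is harmless, and lines whose fields 4 and 6 are not numeric add zero to both sums.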
|
|
@ -0,0 +1,65 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2g_diag
|
||||
mkdir -p $dir
|
||||
model=exp/tri2g/final.mdl
|
||||
alignmodel=exp/tri2g/final.alimdl
|
||||
lvtln=exp/tri2g/final.lvtln
|
||||
tree=exp/tri2g/tree
|
||||
graphdir=exp/graph_tri2g
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
# already made the graph.
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
# First do SI decoding with alignment model.
|
||||
# Use smaller beam for this, as less critical.
|
||||
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
|
||||
|
||||
# Comment out the two lines below to make this per-utterance.
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
|
||||
gmm-est-lvtln-trans --norm-type=diag --verbose=1 $spk2utt_opt $model $lvtln \
|
||||
"$sifeats" ark:- ark:$dir/lvtln_${test}.trans ark,t:$dir/lvtln_${test}.warp ) \
|
||||
2>$dir/lvtln_${test}.log || exit 1;
|
||||
|
||||
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# The ,p (permissive) option lets compute-wer score partial output without dying.
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
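The script's comment about switching to per-utterance adaptation relies on ordinary shell behaviour: an unset or empty variable that is expanded without quotes contributes no argument at all. The sketch below shows only that mechanism; `count_args` is a hypothetical helper, and the tool name and `model` argument are placeholders that are never executed.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo $#; }

# With the option set, the unquoted expansion contributes one argument.
spk2utt_opt=--spk2utt=ark:data/test_mar87.spk2utt
count_args gmm-est-lvtln-trans $spk2utt_opt model

# With the assignment commented out (variable unset), the unquoted
# expansion disappears, so the tool would never see the option.
unset spk2utt_opt
count_args gmm-est-lvtln-trans $spk2utt_opt model
```

The first call reports 3 arguments and the second reports 2. This is why the expansions in the script must stay unquoted: quoting `"$spk2utt_opt"` would pass an empty string as an argument instead of dropping it.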
|
|
@ -0,0 +1,79 @@
|
|||
# as decode_tri2g but using the feature-level VTLN
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Feature-level VTLN is used when decoding, as opposed to the linear VTLN.
|
||||
# Also computing a maximum-likelihood mean offset,
|
||||
# for better comparability with LVTLN.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2g_vtln
|
||||
mkdir -p $dir
|
||||
vtlnmodel=exp/tri2g/final.vtlnmdl
|
||||
lvtlnmodel=exp/tri2g/final.mdl
|
||||
alignmodel=exp/tri2g/final.alimdl
|
||||
lvtln=exp/tri2g/final.lvtln
|
||||
tree=exp/tri2g/tree
|
||||
graphdir=exp/graph_tri2g
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
# Doesn't matter which model we use when making the graph
|
||||
# (only the transitions and structure are used).
|
||||
scripts/mkgraph.sh $tree $vtlnmodel $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
# First do SI decoding with alignment model.
|
||||
# Use smaller beam for this, as less critical.
|
||||
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
|
||||
|
||||
# Comment out the two lines below to make this per-utterance.
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
|
||||
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $lvtlnmodel $lvtln \
|
||||
"$sifeats" ark:- ark:/dev/null ark,t:$dir/lvtln_${test}.warp ) \
|
||||
2>$dir/lvtln_${test}.log || exit 1;
|
||||
|
||||
awk '{print $1, (0.85+0.01*$2);}' $dir/lvtln_${test}.warp > $dir/${test}.factor
|
||||
|
||||
feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- |"
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-est-fmllr --fmllr-update-type=offset $spk2utt_opt $vtlnmodel "$feats" ark,o:- ark:$dir/${test}.trans ) 2>$dir/fmllr_${test}.log || exit 1;
|
||||
|
||||
feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.trans ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $vtlnmodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# The ,p (permissive) option lets compute-wer score partial output without dying.
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
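The awk step in the script above turns each LVTLN warp index into a VTLN warp factor via factor = 0.85 + 0.01 * index. A minimal sketch of that mapping on made-up input follows; the speaker names and the indices 0, 15 and 30 are assumptions chosen for illustration, and `warp_to_factor` is a hypothetical helper name.

```shell
# Hypothetical helper wrapping the awk mapping used by the script:
# input lines are "<key> <warp-index>", output lines are "<key> <factor>".
warp_to_factor() {
  awk '{print $1, (0.85+0.01*$2);}'
}

# Invented warp indices for three speakers.
printf '%s\n' 'spk1 0' 'spk2 15' 'spk3 30' | warp_to_factor
```

Under this mapping an index of 0 gives a factor of 0.85, an index of 15 gives 1 (no warping), and an index of 30 gives 1.15.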
|
|
@ -0,0 +1,79 @@
|
|||
# as decode_tri2g but using the feature-level VTLN
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Feature-level VTLN is used when decoding, as opposed to the linear VTLN.
|
||||
# Also computing a diagonal fMLLR transform for
|
||||
# comparison with ET.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2g_vtln_diag
|
||||
mkdir -p $dir
|
||||
vtlnmodel=exp/tri2g/final.vtlnmdl
|
||||
lvtlnmodel=exp/tri2g/final.mdl
|
||||
alignmodel=exp/tri2g/final.alimdl
|
||||
lvtln=exp/tri2g/final.lvtln
|
||||
tree=exp/tri2g/tree
|
||||
graphdir=exp/graph_tri2g
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
# Doesn't matter which model we use when making the graph
|
||||
# (only the transitions and structure are used).
|
||||
scripts/mkgraph.sh $tree $vtlnmodel $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
# First do SI decoding with alignment model.
|
||||
# Use smaller beam for this, as less critical.
|
||||
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
|
||||
|
||||
# Comment out the two lines below to make this per-utterance.
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
|
||||
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $lvtlnmodel $lvtln \
|
||||
"$sifeats" ark:- ark:/dev/null ark,t:$dir/lvtln_${test}.warp ) \
|
||||
2>$dir/lvtln_${test}.log || exit 1;
|
||||
|
||||
awk '{print $1, (0.85+0.01*$2);}' $dir/lvtln_${test}.warp > $dir/${test}.factor
|
||||
|
||||
feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- |"
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-est-fmllr --fmllr-update-type=diag $spk2utt_opt $vtlnmodel "$feats" ark,o:- ark:$dir/${test}.trans ) 2>$dir/fmllr_${test}.log || exit 1;
|
||||
|
||||
feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.trans ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $vtlnmodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# The ,p (permissive) option lets compute-wer score partial output without dying.
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,71 @@
|
|||
# as decode_tri2g but using the feature-level VTLN
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Feature-level VTLN is used when decoding, as opposed to the linear VTLN.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2g_vtln_nofmllr
|
||||
mkdir -p $dir
|
||||
vtlnmodel=exp/tri2g/final.vtlnmdl
|
||||
lvtlnmodel=exp/tri2g/final.mdl
|
||||
alignmodel=exp/tri2g/final.alimdl
|
||||
lvtln=exp/tri2g/final.lvtln
|
||||
tree=exp/tri2g/tree
|
||||
graphdir=exp/graph_tri2g
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
# Doesn't matter which model we use when making the graph
|
||||
# (only the transitions and structure are used).
|
||||
scripts/mkgraph.sh $tree $vtlnmodel $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
|
||||
|
||||
# First do SI decoding with alignment model.
|
||||
# Use smaller beam for this, as less critical.
|
||||
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
|
||||
|
||||
# Comment out the two lines below to make this per-utterance.
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
|
||||
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $lvtlnmodel $lvtln \
|
||||
"$sifeats" ark:- ark:/dev/null ark,t:$dir/lvtln_${test}.warp ) \
|
||||
2>$dir/lvtln_${test}.log || exit 1;
|
||||
|
||||
awk '{print $1, (0.85+0.01*$2);}' $dir/lvtln_${test}.warp > $dir/${test}.factor
|
||||
|
||||
feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $vtlnmodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# The ,p (permissive) option lets compute-wer score partial output without dying.
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,45 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2h
|
||||
mkdir -p $dir
|
||||
model=exp/tri2h/final.mdl
|
||||
tree=exp/tri2h/tree
|
||||
graphdir=exp/graph_tri2h
|
||||
transform=exp/tri2h/final.mat
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# The ,p (permissive) option lets compute-wer score partial output without dying.
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
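The script above decodes each test set in a background subshell and only aggregates after `wait`. The sketch below reproduces just that control flow with placeholder work; the scratch directory and the `echo` standing in for decoding and scoring are assumptions made for illustration.

```shell
# Scratch directory standing in for $dir (an assumption for this sketch).
tmp=$(mktemp -d)

for test in mar87 oct87 feb89; do
  (
    # Placeholder for the decode-and-score commands of one test set.
    echo "scored $test" > "$tmp/wer_$test"
  ) &
done

# Block until every background subshell has finished.
wait

# Only now is it safe to aggregate the per-set results.
cat "$tmp"/wer_* | wc -l
```

One consequence of this layout is that an `|| exit 1` inside the parentheses ends only that background subshell; the loop and the other test sets carry on, so a failed set shows up as a missing or incomplete result file, not as an aborted run.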
|
|
@ -0,0 +1,45 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2i
|
||||
mkdir -p $dir
|
||||
model=exp/tri2i/final.mdl
|
||||
tree=exp/tri2i/tree
|
||||
graphdir=exp/graph_tri2i
|
||||
transform=exp/tri2i/final.mat
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:add-deltas --delta-order=3 scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# The ,p (permissive) option lets compute-wer score partial output without dying.
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,44 @@
|
|||
# to be run from ..
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2j
|
||||
mkdir -p $dir
|
||||
model=exp/tri2j/final.mdl
|
||||
tree=exp/tri2j/tree
|
||||
graphdir=exp/graph_tri2j
|
||||
transform=exp/tri2j/final.mat
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
feats="ark:add-deltas --delta-order=3 scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# The ,p (permissive) option lets compute-wer score partial output without dying.
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,68 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2k
|
||||
mkdir -p $dir
|
||||
model=exp/tri2k/final.mdl
|
||||
alignmodel=exp/tri2k/final.alimdl
|
||||
et=exp/tri2k/final.et
|
||||
tree=exp/tri2k/tree
|
||||
graphdir=exp/graph_tri2k
|
||||
ldamat=exp/tri2k/lda.mat
|
||||
defaultmat=exp/tri2k/default.mat
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
# already made the graph.
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
|
||||
sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
|
||||
|
||||
# First do SI decoding with alignment model.
|
||||
# Use smaller beam for this, as less critical.
|
||||
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
|
||||
|
||||
# Comment out the two lines below to make this per-utterance.
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
|
||||
gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1 $model $et \
|
||||
"$sifeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
|
||||
2>$dir/et_${test}.log || exit 1;
|
||||
|
||||
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# The ,p (permissive) option lets compute-wer score partial output (cut off mid-line) without dying.
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,77 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2k_fmllr
|
||||
mkdir -p $dir
|
||||
model=exp/tri2k/final.mdl
|
||||
alignmodel=exp/tri2k/final.alimdl
|
||||
et=exp/tri2k/final.et
|
||||
tree=exp/tri2k/tree
|
||||
graphdir=exp/graph_tri2k
|
||||
ldamat=exp/tri2k/lda.mat
|
||||
defaultmat=exp/tri2k/default.mat
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
# already made the graph.
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
|
||||
basefeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
|
||||
|
||||
# First do SI decoding with alignment model.
|
||||
# Use smaller beam for this, as less critical.
|
||||
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pass1.tra ark,t:$dir/test_${test}_pass1.ali 2> $dir/pass1decode_${test}.log
|
||||
|
||||
# Comment out the two lines below to make this per-utterance.
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pass1.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
|
||||
gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1 $model $et \
|
||||
"$basefeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
|
||||
2>$dir/et_${test}.log || exit 1;
|
||||
|
||||
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}_pass2.tra ark,t:$dir/test_${test}_pass2.ali 2> $dir/pass2decode_${test}.log
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pass2.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $model ark:- ark:- | \
|
||||
gmm-est-fmllr $spk2utt_opt $model "$feats" ark:- ark:$dir/fmllr_${test}.trans ) \
|
||||
2>$dir/fmllr_${test}.log || exit 1;
|
||||
|
||||
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/fmllr_${test}.trans ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# The ,p (permissive) option lets compute-wer score partial output without dying.
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,80 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2k_regtree_fmllr
|
||||
mkdir -p $dir
|
||||
model=exp/tri2k/final.mdl
|
||||
alignmodel=exp/tri2k/final.alimdl
|
||||
et=exp/tri2k/final.et
|
||||
tree=exp/tri2k/tree
|
||||
graphdir=exp/graph_tri2k
|
||||
ldamat=exp/tri2k/lda.mat
|
||||
defaultmat=exp/tri2k/default.mat
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
occs=exp/tri2k/final.occs
|
||||
regtree=$dir/regtree
|
||||
maxleaves=8 # max # of regression-tree leaves.
|
||||
mincount=5000 # minimum count required before a new transform is added.
|
||||
gmm-make-regtree --sil-phones=$silphones --state-occs=$occs --max-leaves=$maxleaves $model $regtree 2>$dir/make_regtree.out
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
|
||||
basefeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
|
||||
|
||||
# First do SI decoding with alignment model.
|
||||
# Use smaller beam for this, as less critical.
|
||||
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pass1.tra ark,t:$dir/test_${test}_pass1.ali 2> $dir/pass1decode_${test}.log
|
||||
|
||||
# Comment out the two lines below to adapt per utterance instead of per speaker.
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pass1.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
|
||||
gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1 $model $et \
|
||||
"$basefeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
|
||||
2>$dir/et_${test}.log || exit 1;
|
||||
|
||||
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}_pass2.tra ark,t:$dir/test_${test}_pass2.ali 2> $dir/pass2decode_${test}.log
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pass2.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $model ark:- ark:- | \
|
||||
gmm-est-regtree-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model "$feats" ark:- $regtree ark:$dir/${test}.fmllr ) \
|
||||
2>$dir/fmllr_${test}.log || exit 1;
|
||||
|
||||
gmm-decode-faster-regtree-fmllr $utt2spk_opt --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst $regtree "$feats" ark:$dir/${test}.fmllr ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
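# Illustration (not part of the original script): the final awk command pools
# the error and word counts over all wer_* files, so the figure it prints is a
# pooled rate, not the mean of the per-set percentages. The sketch below
# assumes each file holds a line shaped like "%WER 4.40 [ 44 / 1000, ... ]".
# That line format, the name average_wer, the sample numbers and the
# '$1 == "%WER"' guard are assumptions added here.

```shell
#!/bin/bash
# average_wer <file>...: pool error counts (field 4) and word counts (field 6).
average_wer() {
  cat "$@" | awk '$1 == "%WER" { n += $4; d += $6 }
    END { if (d > 0) printf("Average WER is %f (%d / %d)\n", 100.0*n/d, n, d) }'
}

tmp=$(mktemp -d)
echo "%WER 4.40 [ 44 / 1000, 5 ins, 10 del, 29 sub ]" > "$tmp/wer_mar87"
echo "%WER 1.20 [ 6 / 500, 1 ins, 2 del, 3 sub ]"     > "$tmp/wer_oct87"
# Pooled: 100 * (44 + 6) / (1000 + 500); the mean of 4.40 and 1.20 would be 2.80.
average_wer "$tmp"/wer_*
rm -r "$tmp"
```

# awk converts the field "1000," to the number 1000 by reading its leading
# digits, which is why the trailing comma does no harm.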
|
|
@ -0,0 +1,68 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2k_utt
|
||||
mkdir -p $dir
|
||||
model=exp/tri2k/final.mdl
|
||||
alignmodel=exp/tri2k/final.alimdl
|
||||
et=exp/tri2k/final.et
|
||||
tree=exp/tri2k/tree
|
||||
graphdir=exp/graph_tri2k
|
||||
ldamat=exp/tri2k/lda.mat
|
||||
defaultmat=exp/tri2k/default.mat
|
||||
silphones=`cat data/silphones.csl`
|
||||
|
||||
# The graph may already have been made; mkgraph.sh is called again anyway.
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
|
||||
sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
|
||||
|
||||
# First do SI decoding with alignment model.
|
||||
# Use smaller beam for this, as less critical.
|
||||
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
|
||||
|
||||
# The two lines below are commented out, which makes this per-utterance.
|
||||
#spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
#utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
|
||||
gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1 $model $et \
|
||||
"$sifeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
|
||||
2>$dir/et_${test}.log || exit 1;
|
||||
|
||||
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output (cut off in mid-line) without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
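# Illustration (not part of the original script): per-utterance mode works
# because $spk2utt_opt and $utt2spk_opt are left unset and are expanded
# unquoted, so they add no argument to the command line. The sketch below
# shows this with count_args, a hypothetical stand-in for a Kaldi binary that
# only reports how many arguments it received.

```shell
#!/bin/bash
count_args() { echo $#; }

# Per-speaker mode: the variable holds exactly one option.
spk2utt_opt=--spk2utt=ark:data/test_mar87.spk2utt
count_args $spk2utt_opt --verbose=1      # prints 2

# Per-utterance mode: the variable is unset, so the unquoted expansion vanishes.
unset spk2utt_opt
count_args $spk2utt_opt --verbose=1      # prints 1

# Quoting the expansion would pass an empty string as an extra argument.
count_args "$spk2utt_opt" --verbose=1    # prints 2
```

# This is why the option variables are deliberately left unquoted, while the
# feature strings, which contain spaces, are quoted.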
|
|
@ -0,0 +1,61 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2l
|
||||
mkdir -p $dir
|
||||
model=exp/tri2l/final.mdl
|
||||
alignmodel=exp/tri2l/final.alimdl
|
||||
tree=exp/tri2l/tree
|
||||
graphdir=exp/graph_tri2l
|
||||
transform=exp/tri2l/final.mat
|
||||
silphones=`cat data/silphones.csl`
|
||||
mincount=500
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
|
||||
|
||||
# Use smaller beam for 1st pass.
|
||||
gmm-decode-faster --beam=17.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}.pre_tra ark,t:$dir/test_${test}.pre_ali 2> $dir/predecode_${test}.log
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}.pre_ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $model ark:- ark:- | \
|
||||
gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
|
||||
"$sifeats" ark,o:- ark:$dir/${test}.fmllr ) 2>$dir/fmllr_${test}.log
|
||||
|
||||
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
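# Illustration (not part of the original script): the loop above runs each
# test set in a background subshell and then calls a bare "wait". In bash a
# bare "wait" returns 0 whatever the jobs did, and an "exit 1" inside
# "( ... ) &" ends only that subshell. The sketch below is one way to notice
# failures by waiting on each PID. run_all and its arguments are hypothetical,
# and the echo stands in for decoding.

```shell
#!/bin/bash
# run_all <outdir> <name-to-fail>: start one dummy job per test set.
run_all() {
  local out=$1 fail=$2 test pid pids="" failed=0
  for test in mar87 oct87 feb89; do
    (
      # A failure here ends only this subshell, not the parent script.
      if [ "$test" = "$fail" ]; then exit 1; fi
      echo "decoded $test" > "$out/result_$test"
    ) &
    pids="$pids $!"
  done
  # Wait on each PID so that every job's exit status is seen.
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed job(s) failed"
  [ "$failed" -eq 0 ]
}

tmp=$(mktemp -d)
run_all "$tmp" oct87 || echo "at least one test set failed"
rm -r "$tmp"
```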
|
|
@ -0,0 +1,61 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# to be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/decode_tri2l_utt
|
||||
mkdir -p $dir
|
||||
model=exp/tri2l/final.mdl
|
||||
alignmodel=exp/tri2l/final.alimdl
|
||||
tree=exp/tri2l/tree
|
||||
graphdir=exp/graph_tri2l
|
||||
transform=exp/tri2l/final.mat
|
||||
silphones=`cat data/silphones.csl`
|
||||
mincount=300
|
||||
|
||||
scripts/mkgraph.sh $tree $model $graphdir
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
(
|
||||
#spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
|
||||
#utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
|
||||
|
||||
sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
|
||||
|
||||
# Use smaller beam for 1st pass.
|
||||
gmm-decode-faster --beam=17.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}.pre_tra ark,t:$dir/test_${test}.pre_ali 2> $dir/predecode_${test}.log
|
||||
|
||||
( ali-to-post ark:$dir/test_${test}.pre_ali ark:- | \
|
||||
weight-silence-post 0.0 $silphones $model ark:- ark:- | \
|
||||
gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
|
||||
"$sifeats" ark,o:- ark:$dir/${test}.fmllr ) 2>$dir/fmllr_${test}.log
|
||||
|
||||
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
|
||||
|
||||
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
|
||||
|
||||
# the ,p option lets it score partial output without dying..
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
|
||||
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
cat $dir/wer_* | \
|
||||
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
|
||||
> $dir/wer
|
|
@ -0,0 +1,32 @@
|
|||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Decode the testing data.
|
||||
|
||||
# this is the hardest test set, see
|
||||
# http://www.itl.nist.gov/iad/mig/tests/rt/ASRhistory/pdf/resource_management_92eval.pdf
|
||||
|
||||
dir=exp/decode_tri_mixup
|
||||
mkdir -p $dir
|
||||
srcdir=exp/tri_mixup
|
||||
model=$srcdir/25.mdl
|
||||
graphdir=exp/graph_tri_mixup
|
||||
|
||||
|
||||
../src/bin/faster-decode-gmm --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst data/test_sep92.scp $dir/word_transcripts.txt $dir/alignments.txt > $dir/decode.out
|
||||
|
||||
../src/bin/compute-wer --symbol-table=data/words.txt data_prep/test_sep92_trans.txt $dir/word_transcripts.txt > $dir/wer
|
||||
|
|
@ -0,0 +1,48 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# Initialize SGMM from a trained HMM/GMM system.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
dir=exp/sgmm/init
|
||||
mkdir -p $dir
|
||||
srcdir=exp/tri1
|
||||
model=exp/sgmm/0.mdl
|
||||
|
||||
init-ubm --intermediate-numcomps=2000 --ubm-numcomps=400 --verbose=2 \
|
||||
--fullcov-ubm=true $srcdir/final.mdl $srcdir/final.occs \
|
||||
$dir/ubm0 2> $dir/cluster.log
|
||||
|
||||
|
||||
subset[0]=1000
|
||||
subset[1]=1500
|
||||
subset[2]=2000
|
||||
subset[3]=2500
|
||||
|
||||
for x in 0 1 2 3; do
|
||||
echo "Pass $x"
|
||||
feats="ark:scripts/subset_scp.pl ${subset[$x]} data/train.scp | add-deltas --print-args=false scp:- ark:- |"
|
||||
fgmm-global-acc-stats --diag-gmm-nbest=15 --binary=false --verbose=2 $dir/ubm$x "$feats" $dir/$x.acc \
|
||||
2> $dir/acc.$x.log || exit 1;
|
||||
fgmm-global-est --verbose=2 $dir/ubm$x $dir/$x.acc \
|
||||
$dir/ubm$[$x+1] 2> $dir/update.$x.log || exit 1;
|
||||
rm $dir/$x.acc
|
||||
done
|
||||
|
||||
sgmm-init $srcdir/final.mdl $dir/ubm4 $model 2> $dir/sgmm_init.log
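# Illustration (not part of the original script): a sketch of the bookkeeping
# in the loop above, showing why sgmm-init reads ubm4. Each of the four passes
# reads ubm$x and writes ubm$[$x+1]. No Kaldi tool is run, and the function
# name plan_ubm_passes is an assumption added here.

```shell
#!/bin/bash
subset=(1000 1500 2000 2500)   # utterances used on passes 0..3

plan_ubm_passes() {
  local x
  for x in 0 1 2 3; do
    # $((x + 1)) is the current spelling of the older $[$x+1] form.
    echo "pass $x: ${subset[$x]} utts, ubm$x -> ubm$((x + 1))"
  done
}

plan_ubm_passes
```

# The last line printed is "pass 3: 2500 utts, ubm3 -> ubm4".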
|
||||
|
|
@ -0,0 +1,44 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# To be run from .. (one directory up from here)
|
||||
|
||||
if [ $# != 1 ]; then
|
||||
echo "usage: make_mfcc_test.sh <abs-path-to-tmpdir>"
|
||||
exit 1;
|
||||
fi
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
dir=exp/make_mfcc
|
||||
mkdir -p $dir
|
||||
root_out=$1
|
||||
mkdir -p $root_out
|
||||
|
||||
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
|
||||
scpin=data_prep/test_${test}_wav.scp
|
||||
# Making it like this so it works for others on the BUT filesystem.
|
||||
# It will generate the correct scp file without running the feature extraction.
|
||||
log=$dir/make_mfcc_test_${test}.log
|
||||
(
|
||||
compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf scp:$scpin ark,scp:$root_out/test_${test}_raw_mfcc.ark,$root_out/test_${test}_raw_mfcc.scp 2> $log || tail $log
|
||||
cp $root_out/test_${test}_raw_mfcc.scp data/test_${test}.scp
|
||||
) &
|
||||
done
|
||||
|
||||
wait
|
||||
|
||||
echo "If the above produced no output on the screen, it succeeded."
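# Illustration (not part of the original script): the "2> $log || tail $log"
# idiom keeps the screen silent on success and shows the end of the log on
# failure, which is what the closing message relies on. run_logged is a
# hypothetical wrapper, and the sh -c command stands in for compute-mfcc-feats.

```shell
#!/bin/bash
run_logged() {
  # usage: run_logged <logfile> <command> [args...]
  local log=$1
  shift
  "$@" 2> "$log" || tail "$log"
}

tmp=$(mktemp -d)
run_logged "$tmp/ok.log" true                                          # prints nothing
run_logged "$tmp/bad.log" sh -c 'echo "cannot open wav" >&2; exit 1'   # prints the log tail
rm -r "$tmp"
```

# After a failure the status of the whole list is that of tail, so whatever
# comes next (the cp in the script above) still runs.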
|
|
@ -0,0 +1,43 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# To be run from .. (one directory up from here)
|
||||
|
||||
if [ $# != 1 ]; then
|
||||
echo "usage: make_mfcc_train.sh <abs-path-to-tmpdir>";
|
||||
exit 1;
|
||||
fi
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
scpin=data_prep/train_wav.scp
|
||||
dir=exp/make_mfcc
|
||||
mkdir -p $dir
|
||||
root_out=$1
|
||||
mkdir -p $root_out
|
||||
|
||||
scripts/split_scp.pl $scpin $dir/train_wav{1,2,3,4}.scp
|
||||
|
||||
for n in 1 2 3 4; do # Use 4 CPUs
|
||||
log=$dir/make_mfcc_train.$n.log
|
||||
compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf scp:$dir/train_wav${n}.scp ark,scp:$root_out/train_raw_mfcc${n}.ark,$root_out/train_raw_mfcc${n}.scp 2> $log || tail $log &
|
||||
done
|
||||
|
||||
wait;
|
||||
|
||||
cat $root_out/train_raw_mfcc{1,2,3,4}.scp > data/train.scp
|
||||
|
||||
echo "If the above produced no output on the screen, it succeeded."
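# Illustration (not part of the original script): the script splits the wav
# list into four parts, extracts features for each part in parallel, and
# concatenates the resulting scp files. The real scripts/split_scp.pl is not
# shown here, so the sketch below is a hypothetical re-implementation that
# assumes contiguous, near-equal parts. It may differ from the Perl script.

```shell
#!/bin/bash
split_scp() {
  # usage: split_scp <in.scp> <out1.scp> [<out2.scp> ...]
  local src=$1 total n i=0 start=1 end out
  shift
  total=$(($(wc -l < "$src")))   # arithmetic expansion strips any padding from wc
  n=$#
  for out in "$@"; do
    i=$((i + 1))
    end=$((total * i / n))       # last line belonging to part i
    if [ "$end" -ge "$start" ]; then
      sed -n "${start},${end}p" "$src" > "$out"
    else
      : > "$out"                 # more parts than lines: leave this part empty
    fi
    start=$((end + 1))
  done
}

tmp=$(mktemp -d)
printf '%s\n' u01 u02 u03 u04 u05 u06 u07 u08 u09 u10 > "$tmp/all.scp"
split_scp "$tmp/all.scp" "$tmp/p1.scp" "$tmp/p2.scp" "$tmp/p3.scp" "$tmp/p4.scp"
# Concatenating the parts in order reproduces the original list.
cat "$tmp/p1.scp" "$tmp/p2.scp" "$tmp/p3.scp" "$tmp/p4.scp" | cmp - "$tmp/all.scp" && echo "parts cover the list"
rm -r "$tmp"
```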
|
|
@ -0,0 +1,66 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# The output of this script is the symbol tables data/{words.txt,phones.txt},
|
||||
# and the grammars and lexicons data/{L,G}{,_disambig}.fst
|
||||
|
||||
# To be run from ..
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
cp data_prep/G.txt data/
|
||||
scripts/make_words_symtab.pl < data/G.txt > data/words.txt
|
||||
cp data_prep/lexicon.txt data/
|
||||
|
||||
|
||||
scripts/make_phones_symtab.pl < data/lexicon.txt > data/phones.txt
|
||||
|
||||
silphones="sil"; # This would in general be a space-separated list of all silence phones. E.g. "sil vn"
|
||||
# Generate colon-separated lists of silence and non-silence phones.
|
||||
scripts/silphones.pl data/phones.txt "$silphones" data/silphones.csl data/nonsilphones.csl
|
||||
|
||||
ndisambig=`scripts/add_lex_disambig.pl data/lexicon.txt data/lexicon_disambig.txt`
|
||||
scripts/add_disambig.pl data/phones.txt $ndisambig > data/phones_disambig.txt
|
||||
|
||||
|
||||
# Create train transcripts in integer format:
|
||||
cat data_prep/train_trans.txt | \
|
||||
scripts/sym2int.pl --ignore-first-field data/words.txt > data/train.tra
|
||||
|
||||
|
||||
# Get lexicon in FST format.
|
||||
|
||||
# silprob = 0.5: same prob as word.
|
||||
scripts/make_lexicon_fst.pl data/lexicon.txt 0.5 sil | fstcompile --isymbols=data/phones.txt --osymbols=data/words.txt --keep_isymbols=false --keep_osymbols=false | fstarcsort --sort_type=olabel > data/L.fst
|
||||
|
||||
scripts/make_lexicon_fst.pl data/lexicon_disambig.txt 0.5 sil | fstcompile --isymbols=data/phones_disambig.txt --osymbols=data/words.txt --keep_isymbols=false --keep_osymbols=false | fstarcsort --sort_type=olabel > data/L_disambig.fst
|
||||
|
||||
fstcompile --isymbols=data/words.txt --osymbols=data/words.txt --keep_isymbols=false --keep_osymbols=false data/G.txt > data/G.fst
|
||||
|
||||
# Checking that G is stochastic [note: it would not be for an ARPA-format LM]
|
||||
fstisstochastic data/G.fst || echo Error
|
||||
|
||||
|
||||
# Checking that disambiguated lexicon times G is determinizable
|
||||
fsttablecompose data/L_disambig.fst data/G.fst | fstdeterminize >/dev/null || echo Error
|
||||
|
||||
# Checking that LG is stochastic:
|
||||
fsttablecompose data/L.fst data/G.fst | fstisstochastic || echo Error
|
||||
|
||||
## Check lexicon.
|
||||
## just have a look and make sure it seems sane.
|
||||
fstprint --isymbols=data/phones.txt --osymbols=data/words.txt data/L.fst | head
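# Illustration (not part of the original script): the training transcripts are
# turned into integer form by looking each word up in data/words.txt while
# leaving the utterance id (first field) alone. The real scripts/sym2int.pl is
# not shown here; the awk sketch below only approximates that behaviour. It
# assumes a symbol table with "<word> <id>" per line, and the function name
# sym2int_ignore_first is hypothetical.

```shell
#!/bin/bash
sym2int_ignore_first() {
  # $1 = symbol table; transcripts ("<utt-id> word word ...") come from stdin.
  awk 'NR == FNR { id[$1] = $2; next }      # first file: load the table
       { out = $1                           # keep the utterance id as it is
         for (i = 2; i <= NF; i++) {
           if (!($i in id)) { print "undefined symbol " $i > "/dev/stderr"; exit 1 }
           out = out " " id[$i]
         }
         print out }' "$1" -
}

tmp=$(mktemp -d)
printf '%s\n' '<eps> 0' 'SHOW 1' 'SHIPS 2' > "$tmp/words.txt"
echo 'utt1 SHOW SHIPS' | sym2int_ignore_first "$tmp/words.txt"   # prints: utt1 1 2
rm -r "$tmp"
```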
|
||||
|
|
@ -0,0 +1,121 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# train_et2.sh is like train_et.sh, but uses an adapt model with
|
||||
# fewer Gaussians, to see whether this makes the warp distribution more
|
||||
# bimodal.
|
||||
|
||||
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
srcdir=exp/adapt2
|
||||
dir=exp/et2
|
||||
srcmodel=$srcdir/20.mdl
|
||||
|
||||
normtype=mean # could be mean or none or mean-and-var
|
||||
|
||||
spk2utt_opt=--spk2utt=ark:$dir/spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:$dir/utt2spk
|
||||
# for per-utterance, uncomment the following [this would make it worse]:
|
||||
# spk2utt_opt=
|
||||
# utt2spk_opt=
|
||||
feats="ark:add-deltas scp:$dir/train.scp ark:- |"
|
||||
|
||||
mkdir -p $dir
|
||||
|
||||
nspk=109 # Use all 109 RM training speakers.
|
||||
nutt=15 # Use at most 15 utterances from each speaker.
|
||||
|
||||
head -$nspk data/train.spk2utt | \
|
||||
awk '{ printf("%s ",$1); for(x=2; x<=NF&&x<='$nutt'+1;x++)
|
||||
{ printf("%s ", $x); } printf("\n"); }' > $dir/spk2utt
|
||||
|
||||
scripts/spk2utt_to_utt2spk.pl < $dir/spk2utt > $dir/utt2spk
|
||||
cat $dir/utt2spk | awk '{print $1}' > $dir/uttlist
|
||||
scripts/filter_scp.pl $dir/uttlist <data/train.scp >$dir/train.scp
|
||||
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
|
||||
cp $srcdir/tree $dir
|
||||
cp $srcdir/phone_map $dir
|
||||
|
||||
# Use a subset of the training utterances from srcdir, so we reuse the alignments from there:
|
||||
# link them (and the source model) into $dir.
|
||||
(
|
||||
cd $dir
|
||||
ln -s ../../$srcdir/cur.ali .
|
||||
ln -s ../../$srcmodel 0.mdl
|
||||
)
|
||||
|
||||
# Init the transform:
|
||||
|
||||
gmm-init-et --normalize-type=$normtype --binary=false --dim=39 $dir/0.et 2>$dir/init_et.log || exit 1
|
||||
|
||||
|
||||
for x in 0 1 2 3 4 5 6 7 8 9 10 11; do
|
||||
x1=$[$x+1];
|
||||
|
||||
# Work out current transforms:
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $srcmodel "$feats" ark:- ark:- | \
|
||||
gmm-est-et $spk2utt_opt --verbose=1 $dir/$x.mdl $dir/$x.et "$feats" ark:- ark:$dir/$x.trans ark,t:$dir/$x.warp ) 2> $dir/trans.$x.log || exit 1;
|
||||
|
||||
# Accumulate stats to update model:
|
||||
( transform-feats $utt2spk_opt ark:$dir/$x.trans "$feats" ark:- 2>$dir/apply_fmllr.$x.log | \
|
||||
gmm-acc-stats-twofeats $srcmodel "$feats" ark:- "ark:cat $dir/cur.ali | ali-to-post ark:- ark:- |" $dir/$x.acc ) 2>$dir/gmm_acc.$x.log || exit 1;
|
||||
|
||||
|
||||
# Check likelihoods (must add the fMLLR determinants from apply_fmllr.$x.log, to get meaningful
|
||||
# figures.)
|
||||
( transform-feats $utt2spk_opt ark:$dir/$x.trans "$feats" ark:- | \
|
||||
gmm-acc-stats $dir/$x.mdl ark:- "ark:cat $dir/cur.ali | ali-to-post ark:- ark:- |" /dev/null ) 2>$dir/gmm_getlike.$x.log || exit 1;
|
||||
|
||||
|
||||
gmm-est --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc $dir/$x1.mdl 2>$dir/gmm_est.$x.log || exit 1;
|
||||
|
||||
# Next estimate either A or B, depending on iteration:
|
||||
if [ $[$x%2] == 0 ]; then # Estimate A:
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $srcmodel "$feats" ark:- ark:- | \
|
||||
gmm-et-acc-a $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$feats" ark:- $dir/$x.et_acc_a ) 2> $dir/acc_a.$x.log || exit 1;
|
||||
gmm-et-est-a --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.et_acc_a 2> $dir/update_a.$x.log || exit 1;
|
||||
rm $dir/$x.et_acc_a
|
||||
else
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $srcmodel "$feats" ark:- ark:- | \
|
||||
gmm-et-acc-b $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$feats" ark:- ark:$dir/$x.trans ark:$dir/$x.warp $dir/$x.et_acc_b ) 2> $dir/acc_b.$x.log || exit 1;
|
||||
gmm-et-est-b --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.mat $dir/$x.et_acc_b 2> $dir/update_b.$x.log || exit 1;
|
||||
rm $dir/$x.et_acc_b
|
||||
# Careful!: gmm-transform-means here changes $x1.mdl in-place.
|
||||
gmm-transform-means $dir/$x.mat $dir/$x1.mdl $dir/$x1.mdl 2> $dir/transform_means.$x.log
|
||||
fi
|
||||
rm $dir/$x.trans
|
||||
if [ $x != 0 ]; then
|
||||
rm $dir/$x.mdl # keep 0.mdl as it's the alignment model.
|
||||
fi
|
||||
rm $dir/$x.acc
|
||||
x=$[$x+1];
|
||||
done
|
||||
|
||||
for n in 0 1 2 3 4 5 6 7 8 9 10 11; do
|
||||
cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
|
||||
done
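# Illustration (not part of the original script): near the top, the script
# keeps the first $nspk speakers, caps each at $nutt utterances, and inverts
# the result into utt2spk. The sketch below reproduces that step with awk
# only. limit_spk2utt and spk2utt_to_utt2spk are hypothetical names, and the
# real scripts/spk2utt_to_utt2spk.pl may differ in detail.

```shell
#!/bin/bash
limit_spk2utt() {
  # $1 = max speakers, $2 = max utterances per speaker; spk2utt comes from stdin.
  head -n "$1" | awk -v nutt="$2" '{
    printf("%s", $1)
    for (x = 2; x <= NF && x <= nutt + 1; x++) printf(" %s", $x)
    printf("\n") }'
}

spk2utt_to_utt2spk() {
  # One "<utt> <spk>" line per utterance.
  awk '{ for (x = 2; x <= NF; x++) print $x, $1 }'
}

printf '%s\n' 'spkA a1 a2 a3' 'spkB b1' 'spkC c1 c2' |
  limit_spk2utt 2 2 | spk2utt_to_utt2spk
# prints:
#   a1 spkA
#   a2 spkA
#   b1 spkB
```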
|
|
@ -0,0 +1,92 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
# Train the monophone on a subset-- no point using all the data.
|
||||
dir=exp/mono
|
||||
n=1000
|
||||
feats="ark:add-deltas --print-args=false scp:$dir/train.scp ark:- |"
|
||||
# need to quote when passing as an argument, as in "$feats",
|
||||
# since it has spaces in it.
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numgauss=250 # Initial num-Gauss (must be more than #states=3*phones).
|
||||
totgauss=1000 # Target #Gaussians.
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
realign_iters="1 2 3 4 5 6 7 8 9 10 12 15 20 25";
|
||||
|
||||
|
||||
mkdir -p $dir
|
||||
scripts/subset_scp.pl $n data/train.scp > $dir/train.scp
|
||||
|
||||
|
||||
silphones=`cat data/silphones.csl | sed 's/:/ /g'`
|
||||
nonsilphones=`cat data/nonsilphones.csl | sed 's/:/ /g'`
|
||||
cat conf/topo.proto | sed "s:NONSILENCEPHONES:$nonsilphones:" | sed "s:SILENCEPHONES:$silphones:" > $dir/topo
|
||||
|
||||
gmm-init-mono '--train-feats=ark:head -10 data/train.scp | add-deltas scp:- ark:- |' $dir/topo 39 $dir/0.mdl $dir/tree 2> $dir/init.out || exit 1;
|
||||
|
||||
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/0.mdl data/L.fst \
|
||||
"ark:scripts/subset_scp.pl $n data/train.tra|" \
|
||||
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
echo Pass 0
|
||||
|
||||
align-equal-compiled "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark,t,f:- 2>$dir/align.0.log | \
|
||||
gmm-acc-stats-ali --binary=true $dir/0.mdl "$feats" ark:- \
|
||||
$dir/0.acc 2> $dir/acc.0.log || exit 1;
|
||||
|
||||
# In the following steps, the --min-gaussian-occupancy=3 option is important; otherwise
|
||||
# we fail to estimate "rare" phones, and later on they never align properly.
|
||||
gmm-est --min-gaussian-occupancy=3 --mix-up=$numgauss \
|
||||
$dir/0.mdl $dir/0.acc $dir/1.mdl 2> $dir/update.0.log || exit 1;
|
||||
|
||||
rm $dir/0.acc
|
||||
|
||||
|
||||
beam=4 # will change to 8 below after 1st pass
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo "Pass $x"
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$[$beam*4] $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" t,ark:$dir/cur.ali \
|
||||
2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
rm $dir/$x.mdl $dir/$x.acc
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
beam=8
|
||||
x=$[$x+1]
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl )
|
||||
|
||||
# example of showing the alignments:
|
||||
# show-alignments data/phones.txt $dir/30.mdl ark:$dir/cur.ali | head -4
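# Illustration (not part of the original script): a dry run of the --mix-up
# schedule used in the training loop above, with the same constants and no
# Kaldi tools. Because incgauss comes from integer division, (1000 - 250) / 20
# gives 37, and the schedule stops at 990 rather than at totgauss=1000. The
# function name mixup_schedule is an assumption added here.

```shell
#!/bin/bash
mixup_schedule() {
  local numiters=30 maxiterinc=20 numgauss=250 totgauss=1000 incgauss x=1
  incgauss=$(( (totgauss - numgauss) / maxiterinc ))   # integer division
  while [ $x -lt $numiters ]; do
    echo "iter $x mix-up $numgauss"
    if [ $x -le $maxiterinc ]; then
      numgauss=$((numgauss + incgauss))
    fi
    x=$((x + 1))
  done
  echo "final $numgauss"
}

mixup_schedule | tail -n 3
```

# The last three lines printed are "iter 28 mix-up 990", "iter 29 mix-up 990"
# and "final 990".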
|
||||
|
|
@ -0,0 +1,87 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
# To be run from ..
|
||||
|
||||
dir=exp/sgmm
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
|
||||
numiters=25 # Total number of iterations
|
||||
|
||||
realign_iters="5 10 15";
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
numsubstates=1500 # Initial #-substates.
|
||||
totsubstates=5000 # Target #-substates.
|
||||
maxiterinc=15 # Last iter to increase #substates on.
|
||||
incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
|
||||
gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
|
||||
randprune=0.1
|
||||
mkdir -p $dir
|
||||
|
||||
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
|
||||
cp $srcdir/tree $dir
|
||||
|
||||
echo "aligning all training data"
|
||||
if [ ! -f $dir/0.ali ]; then
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
|
||||
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
fi
|
||||
|
||||
if [ ! -f $dir/0.mdl ]; then
|
||||
echo "you must run init_sgmm.sh before train_sgmm1.sh"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ ! -f $dir/gselect.gz ]; then
|
||||
sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
|
||||
fi
|
||||
|
||||
cp $dir/0.ali $dir/cur.ali || exit 1;
|
||||
|
||||
iter=0
|
||||
while [ $iter -lt $numiters ]; do
|
||||
echo "Pass $iter ... "
|
||||
if echo $realign_iters | grep -w $iter >/dev/null; then
|
||||
echo "Aligning data"
|
||||
sgmm-align-compiled $scale_opts "$gselect_opt" --beam=8 --retry-beam=40 $dir/$iter.mdl \
|
||||
"$srcgraphs" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
|
||||
fi
|
||||
if [ $iter -gt 0 ]; then
|
||||
flags=vMwcS
|
||||
else
|
||||
flags=vwcS
|
||||
fi
|
||||
if [ ! -f $dir/$[$iter+1].mdl ]; then
|
||||
sgmm-acc-stats-ali --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" ark:$dir/cur.ali $dir/$iter.acc 2> $dir/acc.$iter.log || exit 1;
|
||||
sgmm-est --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
|
||||
fi
|
||||
# rm $dir/$iter.mdl $dir/$iter.acc
|
||||
# rm $dir/$iter.occs
|
||||
if [ $iter -lt $maxiterinc ]; then
|
||||
numsubstates=$[$numsubstates+$incsubstates]
|
||||
fi
|
||||
iter=$[$iter+1];
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $iter.mdl final.mdl; ln -s $iter.occs final.occs )
|
|
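The sub-state schedule in the script above can be traced without running any Kaldi binaries. The following is a minimal sketch that copies the script's settings and replays only the loop arithmetic; the variable `final_substates` is introduced here for illustration and is not part of the script. Because the increment uses integer division, the count appears to level off slightly below `totsubstates`.

```shell
# Settings copied from the script above.
numiters=25
numsubstates=1500
totsubstates=5000
maxiterinc=15
# Integer division: (5000 - 1500) / 15 truncates to 233.
incsubstates=$(( (totsubstates - numsubstates) / maxiterinc ))

iter=0
while [ "$iter" -lt "$numiters" ]; do
  # This is the value the script would pass as --split-substates on this pass.
  echo "iter $iter: --split-substates=$numsubstates"
  if [ "$iter" -lt "$maxiterinc" ]; then
    numsubstates=$(( numsubstates + incsubstates ))
  fi
  iter=$(( iter + 1 ))
done

# Illustrative name, not used by the script.
final_substates=$numsubstates
echo "final: $final_substates"
```

With these settings the increment is applied on passes 0 to 14, so the value stops changing at 1500 + 15 * 233 = 4995.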
@ -0,0 +1,103 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# This is SGMM training with speaker vectors.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
# To be run from ..
|
||||
|
||||
dir=exp/sgmm2
|
||||
srcdir=exp/sgmm
|
||||
gmmtridir=exp/tri1
|
||||
trimodel=$gmmtridir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $gmmtridir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
|
||||
numiters=25 # Total number of iterations
|
||||
|
||||
realign_iters="5 10 15";
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
numsubstates=1500 # Initial #-substates.
|
||||
totsubstates=5000 # Target #-substates.
|
||||
maxiterinc=15 # Last iter to increase #substates on.
|
||||
incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
|
||||
gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
|
||||
randprune=0.1
|
||||
spkdim=39
|
||||
mkdir -p $dir
|
||||
|
||||
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
|
||||
cp $gmmtridir/tree $srcdir/{0.ali,0.mdl,gselect.gz} $dir
|
||||
|
||||
if [ ! -f $dir/0.ali ]; then
|
||||
echo "aligning all training data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $trimodel "$srcgraphs" \
|
||||
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
fi
|
||||
|
||||
if [ ! -f $dir/0.mdl ]; then
|
||||
echo "you must run init_sgmm.sh before train_sgmm2.sh"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ ! -f $dir/gselect.gz ]; then
|
||||
sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
|
||||
fi
|
||||
|
||||
cp $dir/0.ali $dir/cur.ali || exit 1;
|
||||
|
||||
iter=0
|
||||
while [ $iter -lt $numiters ]; do
|
||||
echo "Pass $iter ... "
|
||||
if [ $iter -gt 0 ]; then
|
||||
if [ $iter -le 5 ]; then # only train phonetic subspace
|
||||
flags=vMwcS
|
||||
elif [ $(( $iter % 2 )) -eq 1 ]; then # odd iterations
|
||||
flags=vMwcS
|
||||
else # even iterations, update N and not M
|
||||
flags=vwcSN
|
||||
fi
|
||||
else
|
||||
flags=vwcS
|
||||
fi
|
||||
|
||||
if [ ! -f $dir/$[$iter+1].mdl ]; then
|
||||
if echo $realign_iters | grep -w $iter >/dev/null; then
|
||||
echo "Aligning data"
|
||||
sgmm-align-compiled $scale_opts "$gselect_opt" --beam=8 --retry-beam=40 $dir/$iter.mdl \
|
||||
"$srcgraphs" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
|
||||
fi
|
||||
sgmm-acc-stats-ali --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" ark:$dir/cur.ali $dir/$iter.acc 2> $dir/acc.$iter.log || exit 1;
|
||||
if [ $iter -eq 5 ]; then # increase spk dimension from 0 to 39
|
||||
sgmm-estimate --update-flags=$flags --increase-spk-dim=$spkdim --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
|
||||
else
|
||||
sgmm-estimate --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
|
||||
fi
|
||||
fi
|
||||
|
||||
rm $dir/$iter.acc # $dir/$iter.mdl
|
||||
# rm $dir/$iter.occs
|
||||
if [ $iter -lt $maxiterinc ]; then
|
||||
numsubstates=$[$numsubstates+$incsubstates]
|
||||
fi
|
||||
iter=$[$iter+1];
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $iter.mdl final.mdl; ln -s $iter.occs final.occs )
|
|
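The nested `if` that picks `--update-flags` in the script above can be read as a small function of the iteration number. The sketch below restates that logic; `sgmm2_update_flags` is a hypothetical helper name, not something the script or Kaldi defines, and the flag strings are copied verbatim from the script.

```shell
# Hypothetical helper: echoes the --update-flags value that the loop above
# would choose for iteration $1.
sgmm2_update_flags() {
  if [ "$1" -eq 0 ]; then
    echo vwcS                      # first pass
  elif [ "$1" -le 5 ] || [ $(( $1 % 2 )) -eq 1 ]; then
    echo vMwcS                     # passes 1..5, and odd passes after that
  else
    echo vwcSN                     # even passes after 5: update N and not M
  fi
}

for i in 0 1 5 6 7 8; do
  echo "iter $i: $(sgmm2_update_flags "$i")"
done
```

Folding the two branches that both yield `vMwcS` into one condition is a design choice of this sketch; the script keeps them separate so that each branch can carry its own comment.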
@ -0,0 +1,88 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
dir=exp/sgmma
|
||||
ubm=exp/ubma/4.ubm
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
|
||||
numiters=25 # Total number of iterations
|
||||
|
||||
realign_iters="5 10 15";
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
numsubstates=1500 # Initial #-substates.
|
||||
totsubstates=5000 # Target #-substates.
|
||||
maxiterinc=15 # Last iter to increase #substates on.
|
||||
incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
|
||||
gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
|
||||
randprune=0.1
|
||||
mkdir -p $dir
|
||||
|
||||
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
|
||||
cp $srcdir/tree $dir
|
||||
|
||||
if [ ! -f $ubm ]; then
|
||||
echo "No UBM in $ubm"; exit 1
|
||||
fi
|
||||
sgmm-init $srcdir/final.mdl $ubm $dir/0.mdl 2> $dir/sgmm_init.log || exit 1;
|
||||
|
||||
echo "aligning all training data"
|
||||
if [ ! -f $dir/0.ali ]; then
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
|
||||
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
fi
|
||||
|
||||
if [ ! -f $dir/gselect.gz ]; then
|
||||
sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
|
||||
fi
|
||||
|
||||
cp $dir/0.ali $dir/cur.ali || exit 1;
|
||||
|
||||
iter=0
|
||||
while [ $iter -lt $numiters ]; do
|
||||
echo "Pass $iter ... "
|
||||
if echo $realign_iters | grep -w $iter >/dev/null; then
|
||||
echo "Aligning data"
|
||||
sgmm-align-compiled $spkvecs_opt $scale_opts "$gselect_opt" --beam=8 \
|
||||
--retry-beam=40 $dir/$iter.mdl "$srcgraphs" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
|
||||
fi
|
||||
if [ $iter -gt 0 ]; then
|
||||
flags=vMwcS
|
||||
else
|
||||
flags=vwcS
|
||||
fi
|
||||
if [ ! -f $dir/$[$iter+1].mdl ]; then
|
||||
sgmm-acc-stats-ali --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" ark:$dir/cur.ali $dir/$iter.acc 2> $dir/acc.$iter.log || exit 1;
|
||||
sgmm-est --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
|
||||
fi
|
||||
# TEMP: will restore these statements later.
|
||||
# rm $dir/$iter.mdl $dir/$iter.acc
|
||||
# rm $dir/$iter.occs
|
||||
if [ $iter -lt $maxiterinc ]; then
|
||||
numsubstates=$[$numsubstates+$incsubstates]
|
||||
fi
|
||||
iter=$[$iter+1];
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $iter.mdl final.mdl; ln -s $iter.occs final.occs )
|
|
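The realignment test in the loop above relies on `grep -w`, which matches whole words only. The sketch below isolates that check to show why the flag matters: without `-w`, iteration 1 would match the `1` inside `10` and `15`. The function name `is_realign_iter` is hypothetical and is used only for this illustration.

```shell
realign_iters="5 10 15"

# Hypothetical helper: succeeds only when $1 occurs as a whole word in the list.
is_realign_iter() {
  echo "$realign_iters" | grep -w -- "$1" >/dev/null
}

for iter in 1 5 10 11; do
  if is_realign_iter "$iter"; then
    echo "iter $iter: realign"
  else
    echo "iter $iter: keep current alignments"
  fi
done
```

Only 5 and 10 are reported as realignment passes here; 1 and 11 are not, even though their digits occur inside other entries of the list.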
@ -0,0 +1,131 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
# To be run from ..
|
||||
# You must run init_sgmma.sh first.
|
||||
# We rely on the initial model exp/sgmma/0.mdl being there
|
||||
|
||||
dir=exp/sgmmb
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
|
||||
numiters=25 # Total number of iterations
|
||||
|
||||
ubm=exp/ubma/4.ubm
|
||||
realign_iters="5 10 15";
|
||||
spkvec_iters="5 8 12 17 22"
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
numsubstates=1500 # Initial #-substates.
|
||||
totsubstates=5000 # Target #-substates.
|
||||
maxiterinc=15 # Last iter to increase #substates on.
|
||||
incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
|
||||
gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
|
||||
# Initially don't have speaker vectors, but change this after
|
||||
# we estimate them.
|
||||
spkvecs_opt=
|
||||
randprune=0.1
|
||||
mkdir -p $dir
|
||||
|
||||
utt2spk_opt="--utt2spk=ark:data/train.utt2spk"
|
||||
spk2utt_opt="--spk2utt=ark:data/train.spk2utt"
|
||||
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
|
||||
if [ ! -f $ubm ]; then
|
||||
echo "No UBM in $ubm"; exit 1
|
||||
fi
|
||||
|
||||
sgmm-init --spk-space-dim=39 $srcdir/final.mdl $ubm $dir/0.mdl 2> $dir/sgmm_init.log || exit 1;
|
||||
|
||||
cp $srcdir/tree $dir
|
||||
|
||||
echo "aligning all training data"
|
||||
if [ ! -f $dir/0.ali ]; then
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
|
||||
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
fi
|
||||
|
||||
if [ ! -f $dir/0.mdl ]; then
|
||||
echo "$dir/0.mdl not found; sgmm-init above must have failed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ ! -f $dir/gselect.gz ]; then
|
||||
sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
|
||||
fi
|
||||
|
||||
cp $dir/0.ali $dir/cur.ali || exit 1;
|
||||
|
||||
iter=0
|
||||
while [ $iter -lt $numiters ]; do
|
||||
echo "Pass $iter ... "
|
||||
if echo $realign_iters | grep -w $iter >/dev/null; then
|
||||
echo "Aligning data"
|
||||
sgmm-align-compiled $spkvecs_opt $utt2spk_opt $scale_opts "$gselect_opt" \
|
||||
--beam=8 --retry-beam=40 $dir/$iter.mdl "$srcgraphs" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
|
||||
fi
|
||||
if echo $spkvec_iters | grep -w $iter >/dev/null; then
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.01 $silphonelist $dir/$iter.mdl ark:- ark:- | \
|
||||
sgmm-est-spkvecs $spk2utt_opt $spkvecs_opt "$gselect_opt" \
|
||||
--rand-prune=$randprune $dir/$iter.mdl \
|
||||
"$feats" ark:- ark:$dir/cur.vecs 2>$dir/spkvecs.$iter.log ) || exit 1;
|
||||
spkvecs_opt="--spk-vecs=ark:$dir/cur.vecs"
|
||||
fi
|
||||
if [ $iter -eq 0 ]; then
|
||||
flags=vwcS
|
||||
elif [ $[$iter%2] -eq 1 -a $iter -gt 4 ]; then # odd iters after 4: update N instead of M
|
||||
flags=vNwcS
|
||||
else
|
||||
flags=vMwcS
|
||||
fi
|
||||
if [ ! -f $dir/$[$iter+1].mdl ]; then
|
||||
sgmm-acc-stats $spkvecs_opt $utt2spk_opt --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" "ark:ali-to-post ark:$dir/cur.ali ark:-|" $dir/$iter.acc 2> $dir/acc.$iter.log || exit 1;
|
||||
sgmm-est --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
|
||||
fi
|
||||
rm $dir/$iter.mdl $dir/$iter.acc
|
||||
rm $dir/$iter.occs
|
||||
if [ $iter -lt $maxiterinc ]; then
|
||||
numsubstates=$[$numsubstates+$incsubstates]
|
||||
fi
|
||||
iter=$[$iter+1];
|
||||
done
|
||||
|
||||
|
||||
# The point of this last phase of accumulation is to get Gaussian-level
|
||||
# alignments with the speaker vectors but accumulate stats without
|
||||
# any speaker vectors; we re-estimate M, w, c and S to get a model
|
||||
# that's compatible with not having speaker vectors.
|
||||
|
||||
|
||||
flags=MwcS
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
sgmm-post-to-gpost $spkvecs_opt $utt2spk_opt "$gselect_opt" \
|
||||
$dir/$iter.mdl "$feats" ark,s,cs:- ark:- | \
|
||||
sgmm-acc-stats-gpost --update-flags=$flags $dir/$iter.mdl "$feats" \
|
||||
ark,s,cs:- $dir/$iter.aliacc ) 2> $dir/acc_ali.$iter.log || exit 1;
|
||||
sgmm-est --update-flags=$flags --remove-speaker-space=true $dir/$iter.mdl \
|
||||
$dir/$iter.aliacc $dir/$iter.alimdl 2>$dir/update_ali.$iter.log || exit 1;
|
||||
|
||||
|
||||
( cd $dir; rm final.mdl final.alimdl final.occs 2>/dev/null;
|
||||
ln -s $iter.mdl final.mdl; ln -s $iter.alimdl final.alimdl;
|
||||
ln -s $iter.occs final.occs )
|
|
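The closing subshell of the script above points the `final.*` names at the files from the last iteration. As a sketch of the same idea, the snippet below uses `ln -sf` in a scratch directory, which replaces an existing link instead of relying on a preceding `rm`. This is a swapped-in variant, not what the script does, and `demo_dir`, `link_target` and the empty placeholder files are assumptions made for the demonstration.

```shell
# Scratch directory standing in for $dir; the files are empty placeholders.
demo_dir=$(mktemp -d)
iter=25
touch "$demo_dir/$iter.mdl" "$demo_dir/$iter.alimdl" "$demo_dir/$iter.occs"

(
  cd "$demo_dir" || exit 1
  for ext in mdl alimdl occs; do
    # -f replaces a link left behind by an earlier run.
    ln -sf "$iter.$ext" "final.$ext"
  done
)

link_target=$(readlink "$demo_dir/final.alimdl")
echo "final.alimdl -> $link_target"

rm -rf "$demo_dir"
```

The links are relative, as in the script, so the experiment directory can be moved without breaking them.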
@ -0,0 +1,109 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri1
|
||||
srcdir=exp/mono
|
||||
srcmodel=$srcdir/final.mdl
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
realign_iters="5 10 15 20";
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
|
||||
numiters=25 # Number of iterations of training
|
||||
maxiterinc=15 # Last iter to increase #Gauss on.
|
||||
numleaves=1500
|
||||
numgauss=$[$numleaves + $numleaves/2];
|
||||
# Initially mix up to avg. 1.5 Gauss/state (a bit more
# than this, due to state clustering), then slowly mix
|
||||
# up to final amount.
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
|
||||
# Align all training data using old model. Since we have more data for this pass,
|
||||
# we use the version of gmm-align that compiles the graphs itself.
|
||||
|
||||
echo "aligning all training data"
|
||||
gmm-align $scale_opts --beam=8 --retry-beam=40 $srcdir/tree $srcmodel data/L.fst \
|
||||
"$feats" ark:data/train.tra ark:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
# Have to make silence root not-shared because we will not split it.
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
gmm-mixup --mix-up=$numgauss $dir/1.mdl $dir/1.occs $dir/1.mdl \
|
||||
2>$dir/mixup.log || exit 1;
|
||||
|
||||
rm $dir/treeacc
|
||||
|
||||
# Convert alignments generated from monophone model, to use as initial alignments.
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
rm $dir/0.ali
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1;
|
||||
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --write-occs=$dir/$[$x+1].occs --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
rm $dir/$x.mdl $dir/$x.acc
|
||||
rm $dir/$x.occs
|
||||
if [[ $x -le $maxiterinc ]]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1];
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $x.mdl final.mdl; ln -s $x.occs final.occs )
|
|
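The `phones.list` step in the script above takes the last field of every line of the phone symbol table and drops the entry with id 0. The sketch below runs the same `awk | grep` pipeline on a made-up table; the "symbol id" layout and the sample phones are assumptions for illustration and are not taken from `data/phones.txt`.

```shell
# Made-up symbol table in the "symbol integer-id" layout assumed here.
phones_txt=$(printf '%s\n' '<eps> 0' 'sil 1' 'aa 2' 'zh 10' 'b 20')

# Same pipeline as the script: last field of each line, minus the id 0.
phones_list=$(printf '%s\n' "$phones_txt" | awk '{print $NF}' | grep -v -w 0)
echo "$phones_list"
```

The `-w` flag is what keeps ids such as 10 and 20: a plain `grep -v 0` would discard every id that contains the digit 0.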
@ -0,0 +1,101 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2a) is a basic triphone training starting from tri1/,
|
||||
# to serve as a baseline for the other train_tri2? scripts.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2a
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=15 # Last iter to increase #Gauss on.
|
||||
numleaves=1500
|
||||
numgauss=$[$numleaves*2]; # Initially mix up to avg. 2 Gauss/state.
|
||||
# Then slowly mix up to final amount.
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
realign_iters="10 15 20"; # Because last model was reasonable, don't
|
||||
# realign too soon (i.e., on 5th iter).
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
|
||||
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
echo "aligning all training data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
|
||||
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali \
|
||||
$dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
rm $dir/treeacc
|
||||
|
||||
# Convert alignments generated from previous model, to use as initial alignments.
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
rm $dir/0.ali
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
rm $dir/$x.mdl $dir/$x.acc
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1]
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl )
|
||||
|
|
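The mix-up schedule in the script above raises the `--mix-up` target by a fixed step on each early pass. The sketch below replays only that arithmetic; `final_numgauss` is a hypothetical helper, and its four arguments (initial count, target count, last pass with an increase, number of passes) mirror `numgauss`, `totgauss`, `maxiterinc` and `numiters` from the script.

```shell
# Hypothetical helper: prints the --mix-up target that remains once the
# increments have stopped.
# Args: initial #Gauss, target #Gauss, last pass with an increase, #passes.
final_numgauss() {
  numgauss=$1
  incgauss=$(( ($2 - $1) / $3 ))   # integer division, as in the script
  x=1
  while [ "$x" -lt "$4" ]; do
    if [ "$x" -le "$3" ]; then
      numgauss=$(( numgauss + incgauss ))
    fi
    x=$(( x + 1 ))
  done
  echo "$numgauss"
}

numleaves=1500
# Settings of the script above: start at 2 Gauss/state, 30 passes.
final_numgauss $(( numleaves * 2 )) 7000 15 30
```

With these settings the step is 266 and the target settles at 6990 rather than 7000, because the division truncates. Calling the helper with 2250 as the initial count and 25 passes gives the same final value with a step of 316.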
@ -0,0 +1,191 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2b.sh) is training the exponential transform,
|
||||
# on top of standard double-delta features.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2b
|
||||
srcdir=exp/tri1
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
srcmodel=$srcdir/final.mdl
|
||||
dim=39
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
# The spk2utt_opt uses a subset of utterances that we create; this is only
|
||||
# needed by programs that use the subset.
|
||||
spk2utt_opt=--spk2utt=ark:$dir/spk2utt
|
||||
# the utt2spk opt is used by programs that use all the data so give
|
||||
# it the original utt2spk file.
|
||||
utt2spk_opt=--utt2spk=ark:data/train.utt2spk
|
||||
normtype=mean # et option; could be mean, or none
|
||||
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numiters_et=15 # Before this, update et.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
realign_iters="10 15 20 25";
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
|
||||
nutt=15 # Use at most 15 utterances from each speaker for
|
||||
# estimating transforms, and A and B (will use all the data
|
||||
# for estimating the model though, so be careful: we're
|
||||
# not always using the lists in $dir).
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
|
||||
awk '{ printf("%s ",$1); for(x=2; x<=NF&&x<='$nutt'+1;x++)
|
||||
{ printf("%s ", $x); } printf("\n"); }' <data/train.spk2utt >$dir/spk2utt
|
||||
scripts/spk2utt_to_utt2spk.pl < $dir/spk2utt > $dir/utt2spk
|
||||
cat $dir/utt2spk | awk '{print $1}' > $dir/uttlist
|
||||
scripts/filter_scp.pl $dir/uttlist <data/train.scp >$dir/train.scp
|
||||
|
||||
|
||||
origfeats="ark,s,cs:add-deltas scp:data/train.scp ark:- |"
|
||||
# The following variable will get changed in the script.
|
||||
feats="$origfeats"
|
||||
|
||||
|
||||
echo "aligning all training data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
|
||||
"$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali \
|
||||
$dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
# Convert alignments generated from previous model, to use as initial alignments.
|
||||
|
||||
rm $dir/treeacc
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali \
2>$dir/convert.log || exit 1
|
||||
|
||||
rm $dir/0.ali
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
gmm-init-et --normalize-type=$normtype --binary=false --dim=$dim $dir/1.et 2>$dir/init_et.log || exit 1
|
||||
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
x1=$[$x+1];
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
|
||||
if [ $x -lt $numiters_et ]; then
|
||||
# Work out current transforms:
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
|
||||
gmm-est-et $spk2utt_opt --verbose=1 $dir/$x.mdl $dir/$x.et "$origfeats" \
|
||||
ark,s,cs:- ark:$dir/$x.trans ark,t:$dir/$x.warp ) 2> $dir/trans.$x.log || exit 1;
|
||||
|
||||
# Remove previous transforms, if present.
|
||||
if [ $x -gt 1 ]; then rm $dir/$[$x-1].trans; fi
|
||||
|
||||
# Now change $feats to correspond to the transformed features.
|
||||
feats="ark:add-deltas scp:data/train.scp ark:- | transform-feats $utt2spk_opt ark:$dir/$x.trans ark:- ark:- |"
|
||||
fi
|
||||
|
||||
# Accumulate stats to update model:
|
||||
gmm-acc-stats-ali $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2>$dir/gmm_acc.$x.log || exit 1;
|
||||
# Update model.
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$x1.mdl \
|
||||
2>$dir/gmm_est.$x.log || exit 1;
|
||||
|
||||
rm $dir/$x.acc $dir/$x.mdl
|
||||
|
||||
|
||||
if [ $x -lt $numiters_et ]; then
|
||||
# Alternately estimate either A or B.
|
||||
if [ $[$x%2] == 0 ]; then # Estimate A:
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
|
||||
gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
|
||||
gmm-et-acc-a $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$origfeats" ark,s,cs:- $dir/$x.et_acc_a ) 2> $dir/acc_a.$x.log || exit 1;
|
||||
gmm-et-est-a --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.et_acc_a 2> $dir/update_a.$x.log || exit 1;
|
||||
rm $dir/$x.et_acc_a
|
||||
else
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
|
||||
gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
|
||||
gmm-et-acc-b $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$origfeats" ark,s,cs:- ark:$dir/$x.trans ark:$dir/$x.warp $dir/$x.et_acc_b ) 2> $dir/acc_b.$x.log || exit 1;
|
||||
gmm-et-est-b --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.mat $dir/$x.et_acc_b 2> $dir/update_b.$x.log || exit 1;
|
||||
rm $dir/$x.et_acc_b
|
||||
# Careful!: gmm-transform-means here changes $x1.mdl in-place.
|
||||
gmm-transform-means $dir/$x.mat $dir/$x1.mdl $dir/$x1.mdl 2> $dir/transform_means.$x.log || exit 1;
|
||||
fi
|
||||
fi
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1];
|
||||
done
|
||||
|
||||
|
||||
# Accumulate stats for the "alignment model", which is the same as the model but with
|
||||
# the baseline features (shares Gaussian-level alignments).
|
||||
|
||||
gmm-et-get-b $dir/$numiters_et.et $dir/default.mat 2>$dir/get_b.log || exit 1
|
||||
|
||||
defaultfeats="ark,s,cs:add-deltas scp:data/train.scp ark:- | transform-feats $dir/default.mat ark:- ark:- |"
|
||||
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$defaultfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
|
||||
# Update model.
|
||||
gmm-est --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
|
||||
2>$dir/est_alimdl.log || exit 1;
|
||||
rm $dir/$x.acc2
|
||||
|
||||
|
||||
# The following files may be useful for display purposes.
|
||||
for n in 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
|
||||
cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl 2>/dev/null;
|
||||
ln -s $x.mdl final.mdl; ln -s $x.alimdl final.alimdl;
|
||||
ln -s $numiters_et.et final.et )
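
# Illustration only (not part of the committed script): a minimal sketch of the
# alternating A/B update schedule used in the loop above, where even passes
# estimate A and odd passes estimate B. The helper name et_update_kind is
# hypothetical; numiters_et=15 is assumed to match the other scripts in this commit.

```shell
# Hypothetical helper: report which part of the transform is updated on pass $1.
et_update_kind() {
  if [ $(( $1 % 2 )) -eq 0 ]; then echo A; else echo B; fi
}

numiters_et=15   # assumed value, as set in the sibling training scripts
x=1
schedule=""
while [ "$x" -lt "$numiters_et" ]; do
  schedule="$schedule$(et_update_kind "$x")"
  x=$(( x + 1 ))
done
echo "$schedule"
```

# Each letter corresponds to one pass x = 1 .. numiters_et-1, in order.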
|
||||
|
|
@ -0,0 +1,123 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2c) is training with mean normalization (you could
|
||||
# modify options in this script to do variance normalization).
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2c
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
norm_vars=false
|
||||
after_deltas=false
|
||||
per_spk=true
|
||||
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numiters_et=15 # Before this, update et.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
realign_iters="10 15 20 25";
|
||||
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
|
||||
if [ $per_spk == "true" ]; then
|
||||
spk2utt_opt=--spk2utt=ark:data/train.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/train.utt2spk
|
||||
else
|
||||
spk2utt_opt=
|
||||
utt2spk_opt=
|
||||
fi
|
||||
|
||||
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
|
||||
echo "Computing cepstral mean and variance stats."
|
||||
# compute mean and variance stats.
|
||||
if [ $after_deltas == true ]; then
|
||||
compute-cmvn-stats $spk2utt_opt "$srcfeats" ark:$dir/cmvn.ark 2>$dir/cmvn.log
|
||||
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- | apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn.ark ark:- ark:- |"
|
||||
else
|
||||
compute-cmvn-stats $spk2utt_opt scp:data/train.scp \
|
||||
ark:$dir/cmvn.ark 2>$dir/cmvn.log
|
||||
feats="ark:apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn.ark scp:data/train.scp ark:- | add-deltas --print-args=false ark:- ark:- |"
|
||||
fi
|
||||
|
||||
|
||||
echo "aligning all training data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
|
||||
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
|
||||
# Convert alignments generated from previous model, to use as initial alignments.
|
||||
|
||||
rm $dir/treeacc
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
rm $dir/0.ali
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra ark:$dir/graphs.fsts \
|
||||
2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl ark:$dir/graphs.fsts "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
rm $dir/$x.mdl $dir/$x.acc
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1];
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl )
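
# Illustration only (not part of the committed script): a minimal sketch of the
# Gaussian mix-up schedule implied by numleaves, totgauss and maxiterinc above.
# It uses the POSIX $(( )) arithmetic form in place of the older $[ ] form and
# calls no Kaldi binaries; the echo line is only for inspection.

```shell
numiters=30       # number of training passes, as in the script
maxiterinc=20     # last pass on which the Gaussian target is increased
numleaves=1500
totgauss=7000
numgauss=$numleaves
incgauss=$(( (totgauss - numgauss) / maxiterinc ))

x=1
while [ "$x" -lt "$numiters" ]; do
  # Pass x would run: gmm-est --mix-up=$numgauss ... (omitted in this sketch)
  if [ "$x" -le "$maxiterinc" ]; then
    numgauss=$(( numgauss + incgauss ))
  fi
  x=$(( x + 1 ))
done
echo "incgauss=$incgauss final_numgauss=$numgauss"
```

# With these values the target grows by the same amount on each of the first
# maxiterinc passes and then stays fixed for the remaining passes.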
|
|
@ -0,0 +1,120 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2d) is training with standard delta+delta-delta features
|
||||
# plus MLLT.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2d
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
realign_iters="10 15 20 25";
|
||||
mllt_iters="2 4 6 12";
|
||||
|
||||
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
# Subset of features used to train MLLT transform.
|
||||
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --print-args=false scp:- ark:- |"
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
echo "aligning all training data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
|
||||
"$srcgraphs" "$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
rm $dir/treeacc
|
||||
|
||||
# Convert alignments generated from previous model, to use as initial alignments.
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
rm $dir/0.ali
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
cur_mllt=""
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log || exit 1;
|
||||
if [ "$cur_mllt" != "" ]; then
|
||||
est-mllt $dir/$x.mat.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
|
||||
gmm-transform-means --binary=false $dir/$x.mat.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
|
||||
compose-transforms --print-args=false $dir/$x.mat.new $cur_mllt $dir/$x.mat || exit 1;
|
||||
else
|
||||
est-mllt $dir/$x.mat $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
|
||||
gmm-transform-means --binary=false $dir/$x.mat $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
|
||||
fi
|
||||
cur_mllt=$dir/$x.mat
|
||||
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- | transform-feats $cur_mllt ark:- ark:- |"
|
||||
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --print-args=false scp:- ark:- | transform-feats $cur_mllt ark:- ark:- |"
|
||||
else # do GMM update.
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
fi
|
||||
rm $dir/$x.mdl $dir/$x.acc
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1]
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
|
||||
ln -s `basename $cur_mllt` final.mat )
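
# Illustration only (not part of the committed script): a minimal sketch of how
# the whole-word test "echo $list | grep -w $x" selects passes for realignment
# and for MLLT updates. The helper name in_list is hypothetical. The point of
# -w is that pass 1 does not match "10" or "12", and pass 2 does not match "20".

```shell
realign_iters="10 15 20 25"
mllt_iters="2 4 6 12"

# Hypothetical helper: succeed if pass number $1 appears as a whole word in list $2.
in_list() {
  echo "$2" | grep -w "$1" >/dev/null
}

plan=""
for x in 1 2 10 12; do
  if in_list "$x" "$mllt_iters"; then kind=mllt; else kind=gmm; fi
  if in_list "$x" "$realign_iters"; then kind="align+$kind"; fi
  plan="$plan $x:$kind"
done
echo "$plan"
```

# In the training loop, an MLLT pass replaces the GMM update for that pass,
# while realignment (when scheduled) happens before either kind of update.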
|
|
@ -0,0 +1,111 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2e) is training with splice-9-frames+LDA features.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2e
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
realign_iters="10 15 20 25";
|
||||
|
||||
# feats corresponding to original model
|
||||
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/lda.mat ark:- ark:-|"
|
||||
# Subset of features used to train LDA transforms.
|
||||
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $dir/lda.mat ark:- ark:-|"
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
echo "aligning all training data"
|
||||
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
|
||||
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
(ali-to-post ark:$dir/0.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
|
||||
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
|
||||
est-lda $dir/lda.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
|
||||
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
rm $dir/treeacc
|
||||
|
||||
# Convert alignments generated from previous model, to use as initial alignments.
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
rm $dir/0.ali
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
rm $dir/$x.mdl $dir/$x.acc
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1]
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
|
||||
ln -s lda.mat final.mat )
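
# Illustration only (not part of the committed script): a minimal sketch of the
# "convert, convert back, and compare" debug check used above, where the
# round-tripped stream is piped into "cmp - FILE". The two tr calls are
# stand-ins for the pair of convert-ali invocations; the file contents are
# made-up sample data, not real alignments.

```shell
tmp=$(mktemp -d)
printf 'utt1 1 2 3\nutt2 4 5 6\n' > "$tmp/0.ali"

# Forward mapping (stand-in for converting to the new model's alignments).
tr '1-6' 'a-f' < "$tmp/0.ali" > "$tmp/cur.ali"

# Inverse mapping piped to cmp, which reads the first input from stdin ("-").
if tr 'a-f' '1-6' < "$tmp/cur.ali" | cmp - "$tmp/0.ali"; then
  roundtrip=ok
else
  roundtrip=mismatch
fi
echo "$roundtrip"
rm -r "$tmp"
```

# cmp exits non-zero on the first differing byte, which is why the script can
# chain it with "|| exit 1" to stop training when the round trip is not exact.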
|
|
@ -0,0 +1,130 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2f) is training with splice-9-frames+LDA features,
|
||||
# plus MLLT.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2f
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
realign_iters="10 15 20 25";
|
||||
mllt_iters="2 4 6 12";
|
||||
|
||||
# feats corresponding to original model
|
||||
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
|
||||
# Subset of features used to train LDA and MLLT transforms.
|
||||
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $dir/0.mat ark:- ark:-|"
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
echo "aligning all training data"
|
||||
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
|
||||
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
( ali-to-post ark:$dir/0.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
|
||||
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
|
||||
est-lda $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
|
||||
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
rm $dir/treeacc
|
||||
|
||||
# Convert alignments generated from previous model, to use as initial alignments.
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
rm $dir/0.ali
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
cur_lda=$dir/0.mat
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log || exit 1;
|
||||
|
||||
est-mllt $dir/$x.mat.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
|
||||
gmm-transform-means --binary=false $dir/$x.mat.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
|
||||
compose-transforms --print-args=false $dir/$x.mat.new $cur_lda $dir/$x.mat || exit 1;
|
||||
cur_lda=$dir/$x.mat
|
||||
|
||||
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
|
||||
# Subset of features used to train MLLT transforms.
|
||||
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $cur_lda ark:- ark:-|"
|
||||
else # do GMM update.
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
fi
|
||||
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1]
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
|
||||
ln -s `basename $cur_lda` final.mat )
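
# Illustration only (not part of the committed script): a minimal sketch of why
# the final symlink is created from inside $dir with the basename of the current
# transform. The link target then stays relative, so the experiment directory
# can be moved without breaking final.mat. The directory name tri2f_demo and the
# empty 12.mat file are placeholders; readlink is assumed to be available.

```shell
tmp=$(mktemp -d)
dir=$tmp/tri2f_demo          # placeholder experiment directory
mkdir -p "$dir"
: > "$dir/12.mat"            # stand-in for the last composed LDA+MLLT matrix
cur_lda=$dir/12.mat

# Same pattern as the script: cd into the directory, link by basename.
( cd "$dir" && ln -s "$(basename "$cur_lda")" final.mat )

target=$(readlink "$dir/final.mat")
if [ -e "$dir/final.mat" ]; then resolves=yes; else resolves=no; fi
echo "$target $resolves"
rm -r "$tmp"
```

# The stored link target is just the file name, not the full path in $cur_lda.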
|
||||
|
|
@ -0,0 +1,205 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2g) is training with linear-VTLN (lvtln)
|
||||
# which is a linear approximation to VTLN.
|
||||
# At the end, it also converts this in a single-pass retraining
|
||||
# manner to a normal feature-level VTLN model.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2g
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
realign_iters="10 15 20 25";
|
||||
lvtln_iters="2 4 6 8 12"; # Recompute LVTLN transforms on these iters.
|
||||
per_spk=true
|
||||
compute_vtlnmdl=true # If true, at the end compute a model with actual feature-space
|
||||
# VTLN features. You can decode with this as an alternative to
|
||||
# final.mdl which takes the LVTLN features.
|
||||
|
||||
numfiles=40 # Number of feature files for computing LVTLN transforms.
|
||||
numclass=31; # Can't really change this without changing the script below
|
||||
defaultclass=15; # Corresponds to no warping.
|
||||
# RE "vtln_warp"
|
||||
|
||||
|
||||
if [ $per_spk == "true" ]; then
|
||||
spk2utt_opt=--spk2utt=ark:data/train.spk2utt
|
||||
utt2spk_opt=--utt2spk=ark:data/train.utt2spk
|
||||
else
|
||||
spk2utt_opt=
|
||||
utt2spk_opt=
|
||||
fi
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
|
||||
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
# Will create cur.trans below...
|
||||
feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- | transform-feats $utt2spk_opt ark:$dir/cur.trans ark:- ark:- |"
|
||||
|
||||
gmm-init-lvtln --dim=39 --num-classes=$numclass --default-class=$defaultclass \
|
||||
$dir/0.lvtln 2>$dir/init_lvtln.log || exit 1
|
||||
|
||||
featsub="ark:scripts/subset_scp.pl $numfiles data/train.scp | add-deltas scp:- ark:- |"
|
||||
|
||||
echo "Initializing lvtln transforms."
|
||||
c=0
|
||||
while [ $c -lt $numclass ]; do
|
||||
warp=`perl -e 'print 0.85 + 0.01*$ARGV[0];' $c`
|
||||
featsub_warp="ark:scripts/subset_scp.pl $numfiles data_prep/train_wav.scp | compute-mfcc-feats --vtln-low=100 --vtln-high=-600 --vtln-warp=$warp --config=conf/mfcc.conf scp:- ark:- | add-deltas ark:- ark:- |"
|
||||
gmm-train-lvtln-special --normalize-var=true $c $dir/0.lvtln $dir/0.lvtln \
|
||||
"$featsub" "$featsub_warp" 2> $dir/train_special.$c.log || exit 1;
|
||||
c=$[$c+1]
|
||||
done
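
# Illustration only (not part of the committed script): a minimal sketch of the
# class-to-warp mapping used by the loop above (warp = 0.85 + 0.01 * class),
# showing that the default class corresponds to no warping. The helper name
# class_to_warp is hypothetical, and awk is used here instead of perl.

```shell
numclass=31
defaultclass=15

# Hypothetical helper: print the VTLN warp factor for LVTLN class index $1.
class_to_warp() {
  awk -v c="$1" 'BEGIN { printf "%.2f\n", 0.85 + 0.01 * c }'
}

first=$(class_to_warp 0)
default=$(class_to_warp "$defaultclass")
last=$(class_to_warp $(( numclass - 1 )))
echo "$first $default $last"
```

# The same formula is what a later step needs in order to turn per-speaker
# class indices back into warp factors.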
|
||||
|
||||
|
||||
|
||||
# just a single element. :-separated integer list of context-independent
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
# script below tells it not to cluster, but here we avoid accumulating
|
||||
# CD-stats for silence.
|
||||
|
||||
echo "aligning all training data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
|
||||
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
|
||||
echo "Computing LVTLN transforms (iter 0)"
|
||||
( ali-to-post ark:$dir/0.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
gmm-post-to-gpost $srcmodel "$srcfeats" ark:- ark:- | \
|
||||
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $srcmodel $dir/0.lvtln \
|
||||
"$srcfeats" ark:- ark:$dir/cur.trans ark,t:$dir/0.warp ) 2>$dir/lvtln.0.log || exit 1
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
rm $dir/treeacc
|
||||
|
||||
# Convert alignments generated from previous model, to use as initial alignments.
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
rm $dir/0.ali
|
||||
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:|gzip -c > $dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
cur_lvtln=$dir/0.lvtln
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $lvtln_iters | grep -w $x >/dev/null; then
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
|
||||
gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $dir/$x.mdl $dir/0.lvtln \
|
||||
"$srcfeats" ark:- ark:$dir/tmp.trans ark,t:$dir/$x.warp ) 2>$dir/lvtln.$x.log || exit 1
|
||||
cp $dir/$x.warp $dir/cur.warp
|
||||
mv $dir/tmp.trans $dir/cur.trans
|
||||
fi
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --write-occs=$dir/$[$x+1].occs --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
rm $dir/$x.mdl $dir/$x.acc
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1]
|
||||
done
|
||||
|
||||
# Accumulate stats for the "alignment model", which is the same as the model but with
|
||||
# the baseline features (shares Gaussian-level alignments).
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$srcfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
|
||||
# Update model.
|
||||
gmm-est --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
|
||||
2>$dir/est_alimdl.log || exit 1;
|
||||
rm $dir/$x.acc2
|
||||
|
||||
|
||||
# The following files contain information that may be useful for display purposes
|
||||
|
||||
for n in 0 $lvtln_iters; do
|
||||
cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
|
||||
done
|
||||
|
||||
if [ "$compute_vtlnmdl" == "true" ]; then
|
||||
cat $dir/cur.warp | awk '{print $1, (0.85+0.01*$2);}' > $dir/cur.factor
|
||||
compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/cur.factor --config=conf/mfcc.conf scp:data_prep/train_wav.scp ark:$dir/tmp.ark 2>$dir/mfcc.log
|
||||
vtlnfeats="ark:add-deltas ark:$dir/tmp.ark ark:- |"
|
||||
|
||||
# Compute diagonal fMLLR transform to normalize VTLN feats.
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-est-fmllr --fmllr-update-type=diag $spk2utt_opt $dir/$x.mdl "$vtlnfeats" ark,o:- ark:$dir/vtln.trans ) 2>$dir/vtln_fmllr.log || exit 1;
|
||||
|
||||
vtlnfeats="ark:add-deltas ark:$dir/tmp.ark ark:- | transform-feats $utt2spk_opt ark:$dir/vtln.trans ark:- ark:- |"
|
||||
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$vtlnfeats" ark:- $dir/$x.acc3 ) 2>$dir/acc_vtlnmdl.log || exit 1;
|
||||
# Update model.
|
||||
gmm-est $dir/$x.mdl $dir/$x.acc3 $dir/$x.vtlnmdl \
|
||||
2>$dir/est_vtlnmdl.log || exit 1;
|
||||
rm $dir/$x.acc3
|
||||
ln -s $x.vtlnmdl $dir/final.vtlnmdl
|
||||
rm $dir/tmp.ark
|
||||
fi
|
||||
|
||||
|
||||
( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
|
||||
ln -s $x.alimdl final.alimdl;
|
||||
ln -s 0.lvtln final.lvtln;
|
||||
ln -s cur.trans final.trans )
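
The `cur.factor` step in the script above converts each speaker's integer warp-class index into a VTLN warp factor using `0.85 + 0.01 * index`. The following is a minimal sketch of that arithmetic only; the speaker names and indices are made up for illustration, and `warp_to_factor` is a hypothetical helper that uses `printf` (rather than the script's `print`) so the output has a fixed number of decimals.

```shell
#!/bin/bash
# Hypothetical warp data: "<speaker> <warp-class index>" per line.
warps="spkA 0
spkB 15
spkC 30"

# Same arithmetic as the script: factor = 0.85 + 0.01 * index.
# warp_to_factor is an illustrative helper, not a Kaldi tool.
warp_to_factor() {
  awk '{ printf("%s %.2f\n", $1, 0.85 + 0.01 * $2); }'
}

factors=$(printf '%s\n' "$warps" | warp_to_factor)
echo "$factors"
```

Under this formula, index 15 corresponds to a factor of 1.00 (no warping), while indices 0 and 30 give 0.85 and 1.15.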
|
|
@ -0,0 +1,123 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2h) is training with splice-9-frames+HLDA features.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2h
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
realign_iters="10 15 20 25";
|
||||
hlda_iters="2 4 6 12";
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
# feats corresponding to original model
|
||||
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
|
||||
rawfeats="ark:splice-feats scp:data/train.scp ark:- |"
|
||||
# The "speedup" parameter controls how much of the data to use
|
||||
# in the most intensive part of the HLDA transform computation.
|
||||
speedup=0.1
|
||||
|
||||
echo "aligning all training data"
|
||||
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
|
||||
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
ali-to-post ark:$dir/0.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
|
||||
ark:- $dir/lda.acc 2>$dir/lda_acc.log
|
||||
est-lda --write-full-matrix=$dir/0.fullmat $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
# Convert alignments generated from the previous (source) model, to use as initial alignments.
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:| gzip -c > $dir/graphs.fsts.gz" \
|
||||
2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
cur_mat_iter=0
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
if echo $hlda_iters | grep -w $x >/dev/null; then # Do HLDA update.
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.01 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-acc-hlda --speedup=$speedup --binary=false $dir/$x.mdl $dir/$cur_mat_iter.mat "$rawfeats" ark:- $dir/$x.hacc ) 2> $dir/hacc.$x.log || exit 1;
|
||||
|
||||
gmm-est-hlda $dir/$x.mdl $dir/$cur_mat_iter.fullmat $dir/$[$x+1].mdl $dir/$x.fullmat $dir/$x.mat $dir/$x.hacc 2> $dir/hupdate.$x.log || exit 1;
|
||||
cur_mat_iter=$x
|
||||
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/$cur_mat_iter.mat ark:- ark:-|"
|
||||
else # do GMM update.
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
fi
|
||||
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1]
|
||||
done
|
||||
|
||||
( cd $dir;
|
||||
rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
|
||||
rm final.mat 2>/dev/null; ln -s $cur_mat_iter.mat final.mat )
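
The Gaussian-count schedule used by the script above can be checked in isolation. The sketch below reproduces only the counter arithmetic, with the same settings (`numleaves=1500`, `totgauss=7000`, `maxiterinc=20`, `numiters=30`); it calls no Kaldi binaries, and it uses `$(( ))` in place of the older `$[ ]` arithmetic syntax, which is an editorial substitution rather than what the script does.

```shell
#!/bin/bash
# Settings copied from the training script.
numleaves=1500
totgauss=7000
maxiterinc=20
numiters=30

numgauss=$numleaves
# Per-iteration increment for the mix-up target.
incgauss=$(( (totgauss - numgauss) / maxiterinc ))

x=1
while [ $x -lt $numiters ]; do
  # (the model update with --mix-up=$numgauss would run here)
  if [ $x -le $maxiterinc ]; then
    numgauss=$(( numgauss + incgauss ))
  fi
  x=$(( x + 1 ))
done

echo "incgauss=$incgauss final_numgauss=$numgauss last_model=$x.mdl"
```

With these values the increment is 275 per iteration, the target reaches 7000 after iteration 20, and the loop exits with `x=30`, which is why `final.mdl` is linked to `$x.mdl`.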
|
||||
|
|
@ -0,0 +1,127 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2i) is training with triple-deltas+HLDA features.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2i
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
realign_iters="10 15 20 25";
|
||||
hlda_iters="2 4 6 12";
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
# feats corresponding to original model
|
||||
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
|
||||
rawfeats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- |"
|
||||
# The "speedup" parameter controls how much of the data to use
|
||||
# in the most intensive part of the HLDA transform computation.
|
||||
speedup=0.1
|
||||
|
||||
echo "aligning all training data"
|
||||
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
|
||||
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
( ali-to-post ark:$dir/0.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | \
|
||||
add-deltas --delta-order=3 scp:- ark:- |" \
|
||||
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log
|
||||
est-lda --write-full-matrix=$dir/0.fullmat $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log
|
||||
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
# Convert alignments generated from the previous (source) model, to use as initial alignments.
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:| gzip -c > $dir/graphs.fsts.gz" \
|
||||
2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
cur_mat_iter=0
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
if echo $hlda_iters | grep -w $x >/dev/null; then # Do HLDA update.
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.01 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-acc-hlda --speedup=$speedup --binary=false $dir/$x.mdl $dir/$cur_mat_iter.mat "$rawfeats" ark:- $dir/$x.hacc ) 2> $dir/hacc.$x.log || exit 1;
|
||||
|
||||
gmm-est-hlda $dir/$x.mdl $dir/$cur_mat_iter.fullmat $dir/$[$x+1].mdl $dir/$x.fullmat $dir/$x.mat $dir/$x.hacc 2> $dir/hupdate.$x.log || exit 1;
|
||||
cur_mat_iter=$x
|
||||
|
||||
feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $dir/$cur_mat_iter.mat ark:- ark:-|"
|
||||
else # do GMM update.
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
fi
|
||||
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1]
|
||||
done
|
||||
|
||||
( cd $dir;
|
||||
rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
|
||||
rm final.mat 2>/dev/null; ln -s $cur_mat_iter.mat final.mat )
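
The scripts above decide what to do on each pass by testing whether the iteration number appears in a space-separated list with `grep -w`. The sketch below isolates that test; `in_list` is a hypothetical helper name, and the loop only records which branch would be taken instead of running any Kaldi tools.

```shell
#!/bin/bash
realign_iters="10 15 20 25"
hlda_iters="2 4 6 12"

# Succeeds if $1 occurs as a whole word in the list $2.
# -w is what keeps "1" from matching inside "10" or "12".
in_list() {
  echo "$2" | grep -qw -- "$1"
}

schedule=""
for x in 1 2 5 10 12; do
  if in_list "$x" "$realign_iters"; then step="realign+"; else step=""; fi
  if in_list "$x" "$hlda_iters"; then step="${step}hlda"; else step="${step}gmm"; fi
  schedule="$schedule $x:$step"
done

echo "$schedule"
```

Whole-word matching matters here: a plain `grep 1` would treat iteration 1 as a realignment pass because `1` is a substring of `10` and `15`.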
|
||||
|
|
@ -0,0 +1,231 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2j) is training with triple-deltas+LDA+MLLT.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2j
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
realign_iters="10 15 20 25";
|
||||
mllt_iters="2 4 6 12";
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
# feats corresponding to original model
|
||||
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
|
||||
# Subset of features used to train LDA and MLLT transforms.
|
||||
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --delta-order=3 scp:- ark:- | transform-feats $dir/0.mat ark:- ark:-|"
|
||||
|
||||
echo "aligning all training data"
|
||||
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
|
||||
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
( ali-to-post ark:$dir/0.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --delta-order=3 scp:- ark:- |" \
|
||||
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log
|
||||
est-lda $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log
|
||||
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
# Convert alignments generated from the previous (source) model, to use as initial alignments.
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:| gzip -c > $dir/graphs.fsts.gz" \
|
||||
2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
cur_lda=$dir/0.mat
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log || exit 1;
|
||||
|
||||
est-mllt $dir/$x.mllt.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
|
||||
gmm-transform-means --binary=false $dir/$x.mllt.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
|
||||
compose-transforms $dir/$x.mllt.new $cur_lda $dir/$x.mat || exit 1;
|
||||
cur_lda=$dir/$x.mat
|
||||
|
||||
|
||||
feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
|
||||
# Subset of features used to train MLLT transforms.
|
||||
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --delta-order=3 scp:- ark:- | transform-feats $cur_lda ark:- ark:-|"
|
||||
else # do GMM update.
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
fi
|
||||
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
|
||||
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1]
|
||||
done
|
||||
|
||||
|
||||
( cd $dir;
|
||||
rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
|
||||
rm final.mat 2>/dev/null; ln -s `basename $cur_lda` final.mat )
|
||||
=======
|
||||
#!/bin/bash
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2j) is training with splice-9-frames+HLDA features.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2j
|
||||
srcdir=exp/tri
|
||||
srcmodel=$srcdir/30.mdl
|
||||
srcgraphs=ark:$srcdir/graphs.fsts
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
numgauss=1500
|
||||
incgauss=275 # Inc by 275 per iter for 20 iters; 1500 + 275*20 = 7000 which is
|
||||
# similar to the HTK baseline.
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
realign_iters="10 15 20 25";
|
||||
hlda_iters="2 4 6 12";
|
||||
|
||||
# feats corresponding to original model
|
||||
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
|
||||
rawfeats="ark:splice-feats scp:data/train.scp ark:- |"
|
||||
# The "speedup" parameter controls how much of the data to use
|
||||
# in the most intensive part of the HLDA transform computation.
|
||||
speedup=0.1
|
||||
|
||||
if false; then
|
||||
|
||||
echo "aligning all training data"
|
||||
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel $srcgraphs "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
ali-to-post ark:$dir/0.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
acc-lda $srcmodel "ark:head -800 data/train.scp | splice-feats scp:- ark:- |" \
|
||||
ark:- $dir/lda.acc 2>$dir/lda_acc.log
|
||||
est-lda --write-full-matrix=$dir/0.fullmat $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log
|
||||
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
# Convert alignments generated from monophone model, to use as initial alignments.
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra ark:$dir/graphs.fsts \
|
||||
2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
cur_mat_iter=0
|
||||
for x in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl ark:$dir/graphs.fsts "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
if echo $hlda_iters | grep -w $x >/dev/null; then # Do HLDA update.
|
||||
ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.01 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-acc-hlda --speedup=$speedup --binary=false $dir/$x.mdl $dir/$cur_mat_iter.mat "$rawfeats" ark:- $dir/$x.hacc 2> $dir/hacc.$x.log || exit 1;
|
||||
|
||||
gmm-est-hlda $dir/$x.mdl $dir/$cur_mat_iter.fullmat $dir/$[$x+1].mdl $dir/$x.fullmat $dir/$x.mat $dir/$x.hacc 2> $dir/hupdate.$x.log || exit 1;
|
||||
cur_mat_iter=$x
|
||||
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/$cur_mat_iter.mat ark:- ark:-|"
|
||||
else # do GMM update.
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
fi
|
||||
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
|
||||
if [[ $x -lt 21 ]]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
done
|
|
@ -0,0 +1,209 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2k.sh) is training the exponential transform
|
||||
# after LDA (so the same as LDA+MLLT+ET, since ET includes
|
||||
# MLLT).
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2k
|
||||
srcdir=exp/tri1
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
srcmodel=$srcdir/final.mdl
|
||||
dim=40
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
# The spk2utt_opt uses a subset of utterances that we create; this is only
|
||||
# needed by programs that use the subset.
|
||||
spk2utt_opt=--spk2utt=ark:$dir/spk2utt
|
||||
# the utt2spk opt is used by programs that use all the data so give
|
||||
# it the original utt2spk file.
|
||||
utt2spk_opt=--utt2spk=ark:data/train.utt2spk
|
||||
normtype=mean # et option; could be mean, or none
|
||||
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numiters_et=15 # Before this, update et.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
realign_iters="10 15 20 25";
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
|
||||
nutt=15 # Use at most 15 utterances from each speaker for
|
||||
# estimating transforms, and A and B (will use all the data
|
||||
# for estimating the model though, so be careful: we're
|
||||
# not always using the lists in $dir).
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
|
||||
awk '{ printf("%s ",$1); for(x=2; x<=NF&&x<='$nutt'+1;x++)
|
||||
{ printf("%s ", $x); } printf("\n"); }' <data/train.spk2utt >$dir/spk2utt
|
||||
scripts/spk2utt_to_utt2spk.pl < $dir/spk2utt > $dir/utt2spk
|
||||
cat $dir/utt2spk | awk '{print $1}' > $dir/uttlist
|
||||
scripts/filter_scp.pl $dir/uttlist <data/train.scp >$dir/train.scp
|
||||
|
||||
|
||||
srcfeats="ark,s,cs:add-deltas scp:data/train.scp ark:- |"
|
||||
|
||||
# For now, there is no subsetting.
|
||||
basefeats="ark,s,cs:splice-feats scp:data/train.scp ark:- | transform-feats $dir/lda.mat ark:- ark:- |"
|
||||
## The following variable will get changed in the script.
|
||||
feats="$basefeats"
|
||||
|
||||
|
||||
|
||||
echo "aligning all training data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel "$srcgraphs" \
|
||||
"$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
|
||||
echo "computing LDA transform"
|
||||
( ali-to-post ark:$dir/0.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
|
||||
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
|
||||
|
||||
est-lda --dim=$dim $dir/lda.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali \
|
||||
$dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
# Convert alignments generated from previous model, to use as initial alignments.
|
||||
|
||||
rm $dir/treeacc
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali \
|
||||
2>$dir/convert.log || exit 1
|
||||
|
||||
rm $dir/0.ali
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
gmm-init-et --normalize-type=$normtype --binary=false --dim=$dim $dir/1.et 2>$dir/init_et.log || exit 1
|
||||
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
x1=$[$x+1];
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
|
||||
if [ $x -lt $numiters_et ]; then
|
||||
# Work out current transforms:
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
|
||||
gmm-est-et $spk2utt_opt --verbose=1 $dir/$x.mdl $dir/$x.et "$basefeats" \
|
||||
ark,s,cs:- ark:$dir/$x.trans ark,t:$dir/$x.warp ) 2> $dir/trans.$x.log || exit 1;
|
||||
|
||||
# Remove previous transforms, if present.
|
||||
if [ $x -gt 1 ]; then rm $dir/$[$x-1].trans; fi
|
||||
|
||||
# Now change $feats to correspond to the transformed features. We compose the
|
||||
# transforms themselves (it's more efficient than transforming the features
|
||||
# twice).
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/lda.mat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/$x.trans ark:- ark:- |"
|
||||
fi
|
||||
|
||||
# Accumulate stats to update model:
|
||||
gmm-acc-stats-ali $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2>$dir/gmm_acc.$x.log || exit 1;
|
||||
# Update model.
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$x1.mdl \
|
||||
2>$dir/gmm_est.$x.log || exit 1;
|
||||
|
||||
rm $dir/$x.acc $dir/$x.mdl
|
||||
|
||||
|
||||
if [ $x -lt $numiters_et ]; then
|
||||
# Alternately estimate either A or B.
|
||||
if [ $[$x%2] == 0 ]; then # Estimate A:
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
|
||||
gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
|
||||
gmm-et-acc-a $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$basefeats" ark,s,cs:- $dir/$x.et_acc_a ) 2> $dir/acc_a.$x.log || exit 1;
|
||||
gmm-et-est-a --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.et_acc_a 2> $dir/update_a.$x.log || exit 1;
|
||||
rm $dir/$x.et_acc_a
|
||||
else
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
|
||||
gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
|
||||
gmm-et-acc-b $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$basefeats" ark,s,cs:- ark:$dir/$x.trans ark:$dir/$x.warp $dir/$x.et_acc_b ) 2> $dir/acc_b.$x.log || exit 1;
|
||||
gmm-et-est-b --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.mat $dir/$x.et_acc_b 2> $dir/update_b.$x.log || exit 1;
|
||||
rm $dir/$x.et_acc_b
|
||||
# Careful!: gmm-transform-means here changes $x1.mdl in-place.
|
||||
gmm-transform-means $dir/$x.mat $dir/$x1.mdl $dir/$x1.mdl 2> $dir/transform_means.$x.log || exit 1;
|
||||
fi
|
||||
fi
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1];
|
||||
done
|
||||
|
||||
|
||||
gmm-et-get-b $dir/$numiters_et.et $dir/B.mat 2>$dir/get_b.log || exit 1
|
||||
compose-transforms $dir/B.mat $dir/lda.mat $dir/default.mat 2>>$dir/get_b.log || exit 1
|
||||
defaultfeats="ark,s,cs:splice-feats scp:data/train.scp ark:- | transform-feats $dir/default.mat ark:- ark:- |"
|
||||
|
||||
# Accumulate stats for the "alignment model", which is the same as the model but
# trained on the default features (it shares Gaussian-level alignments).
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$defaultfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
|
||||
# Update model.
|
||||
gmm-est --write-occs=$dir/final.occs --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
|
||||
2>$dir/est_alimdl.log || exit 1;
|
||||
rm $dir/$x.acc2
|
||||
|
||||
|
||||
# The following files may be useful for display purposes.
|
||||
for n in 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
|
||||
cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
|
||||
done
|
||||
|
||||
( cd $dir; rm final.mdl 2>/dev/null;
|
||||
ln -s $x.mdl final.mdl; ln -s $x.alimdl final.alimdl;
|
||||
ln -s $numiters_et.et final.et
|
||||
ln -s $[$numiters_et-1].trans final.trans )
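The training loop above decides whether to realign on a given pass by testing whether the pass number appears as a whole word in $realign_iters. A minimal sketch of that membership test follows. The helper name is_listed is hypothetical, and the list value is an assumption chosen for illustration, since this script's own setting is not shown here.

```bash
# Hypothetical helper (not part of the recipe): succeeds if $1 occurs
# as a whole word in the space-separated list $2.
is_listed() {
  echo "$2" | grep -w -- "$1" >/dev/null
}

realign_iters="10 15 20 25"   # assumed sample value, for illustration only

for x in 1 5 10 15; do
  if is_listed "$x" "$realign_iters"; then
    echo "pass $x: realign"
  else
    echo "pass $x: keep current alignments"
  fi
done
```

The -w flag matters here: without it, pass 1 would match inside "10" and "15" and trigger an unwanted realignment.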
|
||||
|
||||
|
|
@ -0,0 +1,156 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation Arnab Ghoshal
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# To be run from ..
|
||||
|
||||
# This (train_tri2l) is training with splice-9-frames+LDA features,
|
||||
# plus MLLT plus CMLLR/fMLLR (i.e. speaker adapted training).
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
dir=exp/tri2l
|
||||
srcdir=exp/tri1
|
||||
srcmodel=$srcdir/final.mdl
|
||||
srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
|
||||
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
|
||||
numiters=30 # Number of iterations of training
|
||||
maxiterinc=20 # Last iter to increase #Gauss on.
|
||||
numleaves=1500
|
||||
numgauss=$numleaves
|
||||
totgauss=7000 # Target #Gaussians
|
||||
incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
|
||||
silphonelist=`cat data/silphones.csl`
|
||||
realign_iters="10 15 20 25";
|
||||
mllt_iters="2 4 6 8";
|
||||
fmllr_iters="9 14 19"
|
||||
spk2utt_opt="--spk2utt=ark:data/train.spk2utt"
|
||||
utt2spk_opt="--utt2spk=ark:data/train.utt2spk"
|
||||
|
||||
# feats corresponding to original model
|
||||
srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
|
||||
# Subset of features used to train LDA and MLLT transforms.
|
||||
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $dir/0.mat ark:- ark:-|"
|
||||
|
||||
mkdir -p $dir
|
||||
cp $srcdir/topo $dir
|
||||
|
||||
echo "aligning all training data"
|
||||
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $srcmodel \
|
||||
"$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
|
||||
|
||||
( ali-to-post ark:$dir/0.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
|
||||
acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
|
||||
ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1;
est-lda $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1;
|
||||
|
||||
|
||||
acc-tree-stats --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log || exit 1;
|
||||
|
||||
|
||||
cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
|
||||
cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
|
||||
scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
|
||||
compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
|
||||
|
||||
scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
|
||||
|
||||
build-tree --verbose=1 --max-leaves=$numleaves \
|
||||
$dir/treeacc $dir/roots.txt \
|
||||
$dir/questions.qst $dir/topo $dir/tree 2> $dir/train_tree.log || exit 1;
|
||||
|
||||
gmm-init-model --write-occs=$dir/1.occs \
|
||||
$dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
|
||||
|
||||
rm $dir/treeacc
|
||||
|
||||
# Convert alignments generated from the source model ($srcmodel), to use as initial alignments.
|
||||
|
||||
convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log
|
||||
# Debug step only: convert back and check they're the same.
|
||||
convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
|
||||
2>/dev/null | cmp - $dir/0.ali || exit 1;
|
||||
|
||||
rm $dir/0.ali
|
||||
|
||||
# Make training graphs
|
||||
echo "Compiling training graphs"
|
||||
compile-train-graphs $dir/tree $dir/1.mdl data/L.fst ark:data/train.tra \
|
||||
"ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1
|
||||
|
||||
cur_lda=$dir/0.mat
|
||||
x=1
|
||||
while [ $x -lt $numiters ]; do
|
||||
echo pass $x
|
||||
if echo $realign_iters | grep -w $x >/dev/null; then
|
||||
echo "Aligning data"
|
||||
gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
|
||||
"ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
|
||||
ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
|
||||
fi
|
||||
if echo $fmllr_iters | grep -w $x >/dev/null; then # Compute CMLLR transforms.
|
||||
sifeats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
|
||||
gmm-est-fmllr-gpost $spk2utt_opt $dir/$x.mdl "$sifeats" ark,s,cs:- ark:$dir/tmp.trans ) \
|
||||
2> $dir/trans.$x.log || exit 1;
|
||||
mv $dir/tmp.trans $dir/cur.trans
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/cur.trans ark:- ark:- |"
|
||||
fi
|
||||
if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
|
||||
gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log || exit 1;
|
||||
|
||||
est-mllt $dir/$x.mat.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
|
||||
gmm-transform-means --binary=false $dir/$x.mat.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
|
||||
compose-transforms --print-args=false $dir/$x.mat.new $cur_lda $dir/$x.mat || exit 1;
|
||||
cur_lda=$dir/$x.mat
|
||||
|
||||
|
||||
feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
|
||||
# Subset of features used to train MLLT transforms.
|
||||
featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $cur_lda ark:- ark:-|"
|
||||
else # do GMM update.
|
||||
gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log || exit 1;
|
||||
gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
|
||||
fi
|
||||
rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
|
||||
if [ $x -le $maxiterinc ]; then
|
||||
numgauss=$[$numgauss+$incgauss];
|
||||
fi
|
||||
x=$[$x+1]
|
||||
done
|
||||
|
||||
defaultfeats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
|
||||
|
||||
# Accumulate stats for the "alignment model", which is the same as the model but
# trained on the unadapted, default features (it shares Gaussian-level alignments).
|
||||
( ali-to-post ark:$dir/cur.ali ark:- | \
|
||||
gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$defaultfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
|
||||
# Update model.
|
||||
gmm-est --write-occs=$dir/final.occs --remove-low-count-gaussians=false \
|
||||
$dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
|
||||
2>$dir/est_alimdl.log || exit 1;
|
||||
rm $dir/$x.acc2
|
||||
|
||||
( cd $dir; rm final.mdl final.alimdl 2>/dev/null;
|
||||
ln -s $x.mdl final.mdl; ln -s $x.alimdl final.alimdl
|
||||
ln -s `basename $cur_lda` final.mat )
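The mixing-up schedule set near the top of this script (numleaves, totgauss, maxiterinc, numiters) can be traced with a few lines of arithmetic. The sketch below only replays the counter logic of the loop above and runs no Kaldi tools. It uses $(( )) in place of the script's $[ ] syntax, and the variable final_numgauss is introduced here purely for illustration.

```bash
# Values copied from the script header above.
numleaves=1500
totgauss=7000
maxiterinc=20
numiters=30

numgauss=$numleaves
# Integer division: (7000 - 1500) / 20
incgauss=$(( (totgauss - numgauss) / maxiterinc ))

x=1
while [ $x -lt $numiters ]; do
  # $numgauss is the value that would be passed as --mix-up on pass $x.
  echo "pass $x: mix-up target $numgauss"
  if [ $x -le $maxiterinc ]; then
    numgauss=$(( numgauss + incgauss ))
  fi
  x=$(( x + 1 ))
done

# Hypothetical name, for illustration: the target once the increases stop.
final_numgauss=$numgauss
echo "increment per pass: $incgauss, final target: $final_numgauss"
```

Note that the counter grows on every pass up to maxiterinc, including the MLLT passes on which gmm-est is not run.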
|
||||
|
|
@ -0,0 +1,48 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# Train UBM from a trained HMM/GMM system.
|
||||
|
||||
if [ -f path.sh ]; then . path.sh; fi
|
||||
|
||||
dir=exp/ubma
|
||||
mkdir -p $dir
|
||||
srcdir=exp/tri1
|
||||
|
||||
init-ubm --intermediate-numcomps=2000 --ubm-numcomps=400 --verbose=2 \
|
||||
--fullcov-ubm=true $srcdir/final.mdl $srcdir/final.occs \
|
||||
$dir/0.ubm 2> $dir/cluster.log
|
||||
|
||||
|
||||
|
||||
subset[0]=1000
|
||||
subset[1]=1500
|
||||
subset[2]=2000
|
||||
subset[3]=2500
|
||||
|
||||
for x in 0 1 2 3; do
|
||||
echo "Pass $x"
|
||||
feats="ark:scripts/subset_scp.pl ${subset[$x]} data/train.scp | add-deltas --print-args=false scp:- ark:- |"
|
||||
fgmm-acc-stats --diag-gmm-nbest=15 --binary=false --verbose=2 $dir/$x.ubm "$feats" $dir/$x.acc \
|
||||
2> $dir/acc.$x.log || exit 1;
|
||||
fgmm-est --verbose=2 $dir/$x.ubm $dir/$x.acc \
|
||||
$dir/$[$x+1].ubm 2> $dir/update.$x.log || exit 1;
|
||||
rm $dir/$x.acc $dir/$x.ubm
|
||||
done
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,8 @@
|
|||
|
||||
Each subdirectory of this directory contains the
|
||||
scripts for a sequence of experiments.
|
||||
|
||||
s1: This setup contains experiments with GMM-based systems using various
Maximum Likelihood techniques, including global and speaker-specific transforms.
|
||||
See a parallel setup in ../rm/s1
|
|
@ -0,0 +1 @@
|
|||
--use-energy=false # only non-default option.
|
|
@ -0,0 +1,22 @@
|
|||
<Topology>
|
||||
<TopologyEntry>
|
||||
<ForPhones>
|
||||
NONSILENCEPHONES
|
||||
</ForPhones>
|
||||
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
|
||||
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
|
||||
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
|
||||
<State> 3 </State>
|
||||
</TopologyEntry>
|
||||
<TopologyEntry>
|
||||
<ForPhones>
|
||||
SILENCEPHONES
|
||||
</ForPhones>
|
||||
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
|
||||
<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
|
||||
<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
|
||||
<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
|
||||
<State> 4 <PdfClass> 4 <Transition> 4 0.25 <Transition> 5 0.75 </State>
|
||||
<State> 5 </State>
|
||||
</TopologyEntry>
|
||||
</Topology>
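In each emitting state of the topology above, the probabilities listed after the Transition tags should sum to 1. A minimal sketch of a per-line check, assuming the whitespace-separated layout shown above; the function name check_state_line is hypothetical and not a Kaldi tool.

```bash
# Hypothetical helper: print the sum of the transition probabilities
# found on one <State> line, to two decimal places.
check_state_line() {
  printf '%s\n' "$1" | awk '{
    s = 0
    # Each <Transition> tag is followed by a destination state and a probability.
    for (i = 1; i <= NF; i++) if ($i == "<Transition>") s += $(i + 2)
    printf("%.2f\n", s)
  }'
}

check_state_line '<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>'
check_state_line '<State> 4 <PdfClass> 4 <Transition> 4 0.25 <Transition> 5 0.75 </State>'
```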
|
|
@ -0,0 +1,64 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
|
||||
# This program takes on its standard input a list of utterance
# ids, one per line (e.g. 4k0c030a is an utterance id).
# It takes as its command-line argument a file list of dot files,
# and extracts from those dot files the transcripts for the
# given utterance ids.
|
||||
#
|
||||
|
||||
@ARGV == 1 || die "find_transcripts.pl dot_files_flist < utterance_ids > transcripts";
|
||||
$dot_flist = shift @ARGV;
|
||||
|
||||
open(L, "<$dot_flist") || die "Opening file list of dot files: $dot_flist\n";
|
||||
while(<L>){
|
||||
chomp;
|
||||
m:\S+/(\w{6})00.dot: || die "Bad line in dot file list: $_";
|
||||
$spk = $1;
|
||||
$spk2dot{$spk} = $_;
|
||||
}
|
||||
|
||||
|
||||
|
||||
while(<STDIN>){
|
||||
chomp;
|
||||
$uttid = $_;
|
||||
$uttid =~ m:(\w{6})\w\w: || die "Bad utterance id $_";
|
||||
$spk = $1;
|
||||
if($spk ne $curspk) {
|
||||
%utt2trans = ( ); # Don't keep all the transcripts in memory...
|
||||
$curspk = $spk;
|
||||
$dotfile = $spk2dot{$spk};
|
||||
defined $dotfile || die "No dot file for speaker $spk\n";
|
||||
open(F, "<$dotfile") || die "Error opening dot file $dotfile\n";
|
||||
while(<F>) {
|
||||
$_ =~ m:(.+)\((\w{8})\)\s*$: || die "Bad line $_ in dot file $dotfile (line $.)\n";
|
||||
$trans = $1;
|
||||
$utt = $2;
|
||||
$utt2trans{$utt} = $trans;
|
||||
}
|
||||
}
|
||||
if(!defined $utt2trans{$uttid}) {
|
||||
print STDERR "No transcript for utterance $uttid (current dot file is $dotfile)\n";
|
||||
} else {
|
||||
print "$uttid $utt2trans{$uttid}\n";
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,31 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# takes in a file list with lines like
|
||||
# /mnt/matylda2/data/WSJ1/13-16.1/wsj1/si_dt_20/4k0/4k0c030a.wv1
|
||||
# and outputs an scp in kaldi format with lines like
|
||||
# 4k0c030a /mnt/matylda2/data/WSJ1/13-16.1/wsj1/si_dt_20/4k0/4k0c030a.wv1
|
||||
# (the first field is the utterance-id, which is the same as the basename of the file).
|
||||
|
||||
|
||||
while(<>){
|
||||
m:^\S+/(\w+)\.[wW][vV]1$: || die "Bad line $_";
|
||||
$id = $1;
|
||||
$id =~ tr/A-Z/a-z/; # Necessary because of weirdness on disk 13-16.1 (uppercase filenames)
|
||||
print "$id $_";
|
||||
}
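The mapping performed by the Perl loop above (path in, "utterance-id path" out, with the id lowercased) can also be sketched in shell. The function name flist_line_to_scp is hypothetical, and unlike the Perl script this sketch does not validate that the extension is .wv1.

```bash
# Hypothetical helper: turn one file-list line into one scp line.
flist_line_to_scp() {
  local path=$1
  local base=${path##*/}   # strip the directory part
  local id=${base%.*}      # strip the extension (.wv1 or .WV1)
  # Lowercase the id, mirroring the fix for uppercase filenames on disk 13-16.1.
  id=$(printf '%s' "$id" | tr 'A-Z' 'a-z')
  printf '%s %s\n' "$id" "$path"
}

flist_line_to_scp /mnt/matylda2/data/WSJ1/13-16.1/wsj1/si_dt_20/4k0/4k0c030a.wv1
```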
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# This program takes as its standard input an .ndx file from the WSJ corpus that looks
|
||||
# like this:
|
||||
#;; File: tr_s_wv1.ndx, updated 04/26/94
|
||||
#;;
|
||||
#;; Index for WSJ0 SI-short Sennheiser training data
|
||||
#;; Data is read WSJ sentences, Sennheiser mic.
|
||||
#;; Contains 84 speakers X (~100 utts per speaker MIT/SRI and ~50 utts
|
||||
#;; per speaker TI) = 7236 utts
|
||||
#;;
|
||||
#11_1_1:wsj0/si_tr_s/01i/01ic0201.wv1
|
||||
#11_1_1:wsj0/si_tr_s/01i/01ic0202.wv1
|
||||
#11_1_1:wsj0/si_tr_s/01i/01ic0203.wv1
|
||||
|
||||
#and as command-line arguments it takes the names of the WSJ disk locations, e.g.:
|
||||
#/mnt/matylda2/data/WSJ0/11-1.1 /mnt/matylda2/data/WSJ0/11-10.1 ... etc.
|
||||
# It outputs a list of absolute pathnames (it does this by replacing e.g. 11_1_1 with
# /mnt/matylda2/data/WSJ0/11-1.1).
|
||||
# It also does a slight fix because one of the WSJ disks (WSJ1/13-16.1) was distributed with
|
||||
# uppercase rather than lower case filenames.
|
||||
|
||||
foreach $fn (@ARGV) {
|
||||
$fn =~ m:.+/([0-9\.\-]+)/?$: || die "Bad command-line argument $fn\n";
|
||||
$disk_id=$1;
|
||||
$disk_id =~ tr/-\./__/; # replace - and . with _ so 11-10.1 becomes 11_10_1
|
||||
$fn =~ s:/$::; # Remove final slash, just in case it is present.
|
||||
$disk2fn{$disk_id} = $fn;
|
||||
}
|
||||
|
||||
while(<STDIN>){
|
||||
if(m/^;/){ next; } # Comment. Ignore it.
|
||||
else {
|
||||
m/^([0-9_]+):\s*(\S+)$/ || die "Could not parse line $_";
|
||||
$disk=$1;
|
||||
if(!defined $disk2fn{$disk}) {
|
||||
die "Disk id $disk not found";
|
||||
}
|
||||
$filename = $2; # as a subdirectory of the distributed disk.
|
||||
if($disk eq "13_16_1" && `hostname` =~ m/fit.vutbr.cz/) {
|
||||
# The disk 13-16.1 has been uppercased for some reason, on the
|
||||
# BUT system. This is a fix specifically for that case.
|
||||
$filename =~ tr/a-z/A-Z/; # This disk contains all uppercase filenames. Why?
|
||||
}
|
||||
print "$disk2fn{$disk}/$filename\n";
|
||||
}
|
||||
}
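The disk-id rewriting done by the script above can be sketched in shell as follows. Both function names are hypothetical. The sketch handles a single disk directory rather than a table of them, and it does not tolerate whitespace after the colon as the Perl pattern does.

```bash
# Hypothetical helper: map a disk directory to its ndx-style id,
# e.g. /some/path/11-10.1/ -> 11_10_1
disk_dir_to_id() {
  local dir=${1%/}        # drop a trailing slash if present
  local name=${dir##*/}   # last path component, e.g. 11-10.1
  printf '%s\n' "${name//[-.]/_}"
}

# Hypothetical helper: resolve one ndx line against one disk directory.
# Prints the absolute path, or fails if the disk id does not match.
ndx_line_to_path() {
  local dir=${1%/} line=$2
  local disk=${line%%:*}  # text before the first colon
  local file=${line#*:}   # text after the first colon
  if [ "$(disk_dir_to_id "$dir")" = "$disk" ]; then
    printf '%s/%s\n' "$dir" "$file"
  else
    return 1
  fi
}

ndx_line_to_path /mnt/matylda2/data/WSJ0/11-1.1 11_1_1:wsj0/si_tr_s/01i/01ic0201.wv1
```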
|
|
@ -0,0 +1,57 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# This takes data from the standard input that's unnormalized transcripts in the format
|
||||
# 4k2c0308 Of course there isn\'t any guarantee the company will keep its hot hand [misc_noise]
|
||||
# 4k2c030a [loud_breath] And new hardware such as the set of personal computers I\. B\. M\. introduced last week can lead to unexpected changes in the software business [door_slam]
|
||||
# and outputs normalized transcripts.
|
||||
# c.f. /mnt/matylda2/data/WSJ0/11-10.1/wsj0/transcrp/doc/dot_spec.doc
|
||||
|
||||
@ARGV == 1 || die "usage: normalize_transcript.pl noise_word < transcript > transcript2";
|
||||
$noise_word = shift @ARGV;
|
||||
|
||||
while(<STDIN>) {
|
||||
$_ =~ m:^(\S+) (.+): || die "bad line $_";
|
||||
$utt = $1;
|
||||
$trans = $2;
|
||||
print "$utt";
|
||||
foreach $w (split (" ",$trans)) {
|
||||
$w =~ tr:a-z:A-Z:; # Upcase everything to match the CMU dictionary.
|
||||
$w =~ s:\\::g; # Remove backslashes. We don't need the quoting.
|
||||
if($w =~ m:^\[\<\w+\]$: || # E.g. [<door_slam], this means a door slammed in the preceding word. Delete.
|
||||
$w =~ m:^\[\w+\>\]$: || # E.g. [door_slam>], this means a door slammed in the next word. Delete.
|
||||
$w =~ m:\[\w+/\]$: || # E.g. [phone_ring/], which indicates the start of this phenomenon.
|
||||
$w =~ m:\[\/\w+]$: || # E.g. [/phone_ring], which indicates the end of this phenomenon.
|
||||
$w eq "~" || # This is used to indicate truncation of an utterance. Not a word.
|
||||
$w eq ".") { # "." is used to indicate a pause. Silence is optional anyway so not much
|
||||
# point including this in the transcript.
|
||||
next; # we won't print this word.
|
||||
} elsif($w =~ m:\[\w+\]:) { # Other noises, e.g. [loud_breath].
|
||||
print " $noise_word";
|
||||
} elsif($w =~ m:^\<([\w\']+)\>$:) {
|
||||
# E.g. replace <and> with and. The <> marks verbal deletion of a word, but the word is still pronounced.
|
||||
print " $1";
|
||||
} elsif($w eq "--DASH") {
|
||||
print " -DASH"; # This is a common issue; the CMU dictionary has it as -DASH.
|
||||
# } elsif($w =~ m:(.+)\-DASH$:) { # E.g. INCORPORATED-DASH... seems the DASH gets combined with previous word
|
||||
# print " $1 -DASH";
|
||||
} else {
|
||||
print " $w";
|
||||
}
|
||||
}
|
||||
print "\n";
|
||||
}
|
|
@ -0,0 +1,54 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# takes a transcript file with lines like
|
||||
# 40po031e THE RATE FELL TO SIX %PERCENT IN NOVEMBER NINETEEN EIGHTY SIX .PERIOD
|
||||
# on the standard input.
|
||||
# The first command-line argument is the filename of a dictionary file with lines like
# ZYUGANOV Z Y UW1 G AA0 N AA0 V
# The second command-line argument is the spoken-noise word.
# This script replaces all OOVs with the spoken-noise word and prints counts for each OOV on the standard error.
|
||||
|
||||
@ARGV == 2 || die "Usage: oov2unk.pl dict spoken-noise-word < transcript > transcript2";
|
||||
|
||||
$dict = shift @ARGV;
|
||||
open(F, "<$dict") || die "Died opening dictionary file $dict\n";
|
||||
while(<F>){
|
||||
@A = split(" ", $_);
|
||||
$word = shift @A;
|
||||
$seen{$word} = 1;
|
||||
}
|
||||
$spoken_noise_word = shift @ARGV;
|
||||
|
||||
while(<STDIN>) {
|
||||
@A = split(" ", $_);
|
||||
$utt = shift @A;
|
||||
print $utt;
|
||||
foreach $a (@A) {
|
||||
if(defined $seen{$a}) {
|
||||
print " $a";
|
||||
} else {
|
||||
$oov{$a}++;
|
||||
print " $spoken_noise_word";
|
||||
}
|
||||
}
|
||||
print "\n";
|
||||
}
|
||||
|
||||
|
||||
foreach $w (sort { $oov{$a} <=> $oov{$b} } keys %oov) {
|
||||
print STDERR "$w $oov{$w}\n";
|
||||
}
|
|
@ -0,0 +1,157 @@
|
|||
# This script should be run from its own directory (.)
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
|
||||
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
|
||||
# MERCHANTABLITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# It takes as arguments a list of directories that should end
|
||||
# with numbers like 13-4.1. These are the subdirectories in the WSJ disks.
|
||||
# on the BUT system we can get these by doing:
|
||||
# ./run.sh /mnt/matylda2/data/WSJ?/??-{?,??}.?
|
||||
|
||||
# Another example is:
|
||||
# ./run.sh /ais/gobi2/speech/WSJ/*/??-{?,??}.?
|
||||
|
||||
|
||||
if [ $# -lt 4 ]; then
|
||||
echo "Too few arguments to run.sh: need a list of WSJ directories ending e.g. 11-13.1"
|
||||
exit 1;
|
||||
fi
|
||||
|
||||
rm -r links/ 2>/dev/null
|
||||
mkdir links/
|
||||
ln -s $* links
|
||||
|
||||
# This version for SI-84
|
||||
|
||||
cat links/11-13.1/wsj0/doc/indices/train/tr_s_wv1.ndx | \
|
||||
./ndx2flist.pl $* | sort | \
|
||||
grep -v 11-2.1/wsj0/si_tr_s/401 > train_si84.flist
|
||||
|
||||
# This version for SI-284
|
||||
cat links/13-34.1/wsj1/doc/indices/si_tr_s.ndx \
|
||||
links/11-13.1/wsj0/doc/indices/train/tr_s_wv1.ndx | \
|
||||
./ndx2flist.pl $* | sort | \
|
||||
grep -v 11-2.1/wsj0/si_tr_s/401 > train_si284.flist
|
||||
|
||||
|
||||
# Now for the test sets.
|
||||
# links/13-34.1/wsj1/doc/indices/readme.doc
|
||||
# describes all the different test sets.
|
||||
# Note: each test-set seems to come in multiple versions depending
|
||||
# on different vocabulary sizes, verbalized vs. non-verbalized
|
||||
# pronunciations, etc. We use the largest vocab and non-verbalized
|
||||
# pronunciations.
|
||||
# The most normal one seems to be the "baseline 60k test set", which
|
||||
# is h1_p0.
|
||||
|
||||
# Nov'92 (333 utts)
|
||||
# These index files have a slightly different format;
|
||||
# have to add .wv1
|
||||
cat links/11-13.1/wsj0/doc/indices/test/nvp/si_et_20.ndx | \
|
||||
./ndx2flist.pl $* | awk '{printf("%s.wv1\n", $1)}' | \
|
||||
sort > eval_nov92.flist
|
||||
|
||||
# Nov'93: (213 utts)
|
||||
# Have to replace a wrong disk-id.
|
||||
cat links/13-32.1/wsj1/doc/indices/wsj1/eval/h1_p0.ndx | \
|
||||
sed s/13_32_1/13_33_1/ | \
|
||||
./ndx2flist.pl $* | sort > eval_nov93.flist
|
||||
|
||||
# Dev-set for Nov'93 (503 utts)
|
||||
cat links/13-34.1/wsj1/doc/indices/h1_p0.ndx | \
|
||||
./ndx2flist.pl $* | sort > dev_nov93.flist
|
||||
|
||||
|
||||
|
||||
# Finding the transcript files:
|
||||
for x in $*; do find $x -iname '*.dot'; done > dot_files.flist
|
||||
|
||||
# Convert the transcripts into our format (no normalization yet)
|
||||
for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
|
||||
./flist2scp.pl $x.flist | sort > ${x}_sph.scp
|
||||
cat ${x}_sph.scp | awk '{print $1}' | ./find_transcripts.pl dot_files.flist > $x.trans1
|
||||
done
|
||||
|
||||
# Do some initial normalization steps.
|
||||
noiseword="<NOISE>";
|
||||
for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
|
||||
cat $x.trans1 | ./normalize_transcript.pl $noiseword > $x.trans2 || exit 1
|
||||
done
|
||||
|
||||
if [ ! -f ../data/lexicon.txt ]; then
|
||||
echo "You need to get ../data/lexicon.txt first (see ../run.sh)"
|
||||
exit 1
|
||||
fi
|
||||
# Convert OOVs to <SPOKEN_NOISE>
|
||||
spoken_noise_word="<SPOKEN_NOISE>";
|
||||
for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
|
||||
cat $x.trans2 | ./oov2unk.pl ../data/lexicon.txt $spoken_noise_word | sort > $x.txt || exit 1 # the .txt is the final transcript.
|
||||
done
|
||||
|
||||
# Create scp's with wav's. (the wv1 in the distribution is not really wav, it is sph.)
|
||||
sph2pipe=`cd ../../../..; echo $PWD/tools/sph2pipe_v2.5/sph2pipe`
|
||||
if [ ! -f $sph2pipe ]; then
|
||||
echo "Could not find the sph2pipe program at $sph2pipe";
|
||||
exit 1;
|
||||
fi
|
||||
for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
|
||||
awk '{printf("%s '$sph2pipe' -f wav %s |\n", $1, $2);}' < ${x}_sph.scp > ${x}_wav.scp
|
||||
done
|
||||
|
||||
|
||||
# The 20K vocab, open-vocabulary language model (i.e. the one with UNK), without
|
||||
# verbalized pronunciations. This is the most common test setup, I understand.
|
||||
|
||||
cp links/13-32.1/wsj1/doc/lng_modl/base_lm/bcb20onp.z lm_bg.arpa.gz
|
||||
chmod u+w lm_bg.arpa.gz
|
||||
# trigram would be:
|
||||
|
||||
cat links/13-32.1/wsj1/doc/lng_modl/base_lm/tcb20onp.z | \
|
||||
perl -e 'while(<>){ if(m/^\\data\\/){ print; last; } } while(<>){ print; }' | \
|
||||
gzip -c -f > lm_tg.arpa.gz
|
||||
|
||||
export PATH=$PATH:../../../../tools/irstlm/bin
|
||||
prune-lm --threshold=1e-7 lm_tg.arpa.gz lm_tg_pruned.arpa
|
||||
gzip -f lm_tg_pruned.arpa
|
||||
|
||||
# Make the utt2spk and spk2utt files.
|
||||
for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
|
||||
cat ${x}_sph.scp | awk '{print $1}' | perl -ane 'chop; m:^...:; print "$_ $&\n";' > $x.utt2spk
|
||||
cat $x.utt2spk | ../scripts/utt2spk_to_spk2utt.pl > $x.spk2utt
|
||||
done
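The loop above takes the first three characters of each utterance ID as the speaker ID, then inverts the resulting map. The sketch below reproduces both steps on toy IDs. `make_utt2spk` and `utt2spk_to_spk2utt` are hypothetical awk stand-ins; in particular, the output layout of the real ../scripts/utt2spk_to_spk2utt.pl (one line per speaker followed by its utterances) is assumed here, not confirmed by this file.

```bash
# Hypothetical helper: "<utt>" -> "<utt> <spk>", speaker = first three characters.
make_utt2spk() {
  awk '{ print $1, substr($1, 1, 3) }'
}

# Hypothetical stand-in for utt2spk_to_spk2utt.pl: group utterances by speaker,
# keeping speakers in first-seen order.
utt2spk_to_spk2utt() {
  awk '{
         if (!($2 in utts)) { order[++n] = $2; utts[$2] = $1 }
         else               { utts[$2] = utts[$2] " " $1 }
       }
       END { for (i = 1; i <= n; i++) print order[i], utts[order[i]] }'
}

# Toy utterance IDs (made up).
printf '011c0201\n011c0202\n012c0201\n' | make_utt2spk | utt2spk_to_spk2utt
# prints:
# 011 011c0201 011c0202
# 012 012c0201
```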
|
||||
|
||||
|
||||
if [ ! -f wsj0-train-spkrinfo.txt ]; then
|
||||
wget http://www.ldc.upenn.edu/Catalog/docs/LDC93S6A/wsj0-train-spkrinfo.txt
|
||||
fi
|
||||
|
||||
if [ ! -f wsj0-train-spkrinfo.txt ]; then
|
||||
echo "Could not get the wsj0-train-spkrinfo.txt file from the LDC website (moved?)"
|
||||
echo "This is possibly omitted from the training disks; couldn't find it."
|
||||
echo "Everything else may have worked; we just may be missing gender info"
|
||||
echo "which is only needed for VTLN-related diagnostics anyway."
|
||||
exit 1
|
||||
fi
|
||||
# Note: wsj0-train-spkrinfo.txt doesn't seem to be on the disks but the
|
||||
# LDC put it on the web. Perhaps it was accidentally omitted from the
|
||||
# disks. I put it in the repository.
|
||||
|
||||
cat links/11-13.1/wsj0/doc/spkrinfo.txt \
|
||||
links/13-34.1/wsj1/doc/train/spkrinfo.txt \
|
||||
./wsj0-train-spkrinfo.txt | \
|
||||
perl -ane 'tr/A-Z/a-z/;print;' | grep -v ';' | \
|
||||
awk '{print $1, $2}' > spk2gender.map
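The pipeline above lowercases the speaker-info files, drops every line that contains a semicolon (the comment lines), and keeps the first two fields. A minimal sketch on made-up input follows. The function name `make_spk2gender` and the column layout of the toy lines are assumptions; `tr` is used in place of the Perl `tr///` call.

```bash
# Hypothetical helper mirroring the lowercase / drop-comments / two-column steps.
make_spk2gender() {
  tr 'A-Z' 'a-z' | grep -v ';' | awk '{ print $1, $2 }'
}

# Toy speaker-info lines (made up): one comment line, two speaker lines.
printf '%s\n' ';; ID  Sex  Notes' '001   M   some text' '00A   F' | make_spk2gender
# prints:
# 001 m
# 00a f
```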
|
||||
|
|
@ -0,0 +1,2 @@
|
|||
export PATH=$PATH:../../../src/bin:../../../tools/openfst/bin:../../../src/fstbin/:../../../src/gmmbin/:../../../src/featbin/:../../../src/lm/
|
||||
export LC_ALL=C
|
|
@ -0,0 +1,304 @@
|
|||
#!/bin/bash
|
||||
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
exit 1;
|
||||
# This is a shell script, but the "exit 1" above stops it from running end to end:
# it's recommended that you run the commands one by one by copying and pasting
# them into the shell.
# Caution: some of the graph creation steps use quite a bit of memory, so you
# might want to run this script on a machine that has plenty of memory.
|
||||
|
||||
# (1) To get the CMU dictionary, do:
|
||||
svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/
|
||||
# I got this at revision 10742 in my current test; add -r 10742 to the svn
# command for strict compatibility.
|
||||
|
||||
# (2) Dictionary preparation:
|
||||
|
||||
mkdir -p data
|
||||
|
||||
# Make phones symbol-table (adding in silence and verbal and non-verbal noises at this point).
|
||||
# We are adding suffixes _B, _E, _S for beginning, ending, and singleton phones.
|
||||
|
||||
cat cmudict/cmudict.0.7a.symbols | perl -ane 's:\r::; print;' | \
|
||||
awk 'BEGIN{print "<eps> 0"; print "SIL 1"; print "SPN 2"; print "NSN 3"; N=4; }
|
||||
{printf("%s %d\n", $1, N++); }
|
||||
{printf("%s_B %d\n", $1, N++); }
|
||||
{printf("%s_E %d\n", $1, N++); }
|
||||
{printf("%s_S %d\n", $1, N++); } ' >data/phones.txt
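To make the numbering concrete, here is the same awk program wrapped in a function and run on a toy two-phone inventory. The function name `make_phones_table` and the input phones are assumptions for illustration; the real input is the cmudict symbols file.

```bash
# Hypothetical wrapper around the awk program above.
make_phones_table() {
  # stdin: one base phone per line; stdout: "<symbol> <integer id>"
  awk 'BEGIN { print "<eps> 0"; print "SIL 1"; print "SPN 2"; print "NSN 3"; N = 4 }
       { printf("%s %d\n",   $1, N++)
         printf("%s_B %d\n", $1, N++)
         printf("%s_E %d\n", $1, N++)
         printf("%s_S %d\n", $1, N++) }'
}

# Toy inventory of two phones.
printf 'AA\nB\n' | make_phones_table
```

Each base phone takes four consecutive ids (plain, _B, _E, _S), so the toy run yields ids 0 through 11, ending with `B_S 11`.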
|
||||
|
||||
|
||||
# First make a version of the lexicon without the silences etc, but with the position-markers.
|
||||
# Remove the comments from the cmu lexicon and remove the (1), (2) from words with multiple
|
||||
# pronunciations.
|
||||
|
||||
grep -v ';;;' cmudict/cmudict.0.7a | perl -ane 'if(!m:^;;;:){ s:(\S+)\(\d+\) :$1 :; print; }' \
|
||||
| perl -ane '@A=split(" ",$_); $w = shift @A; @A>0||die;
|
||||
if(@A==1) { print "$w $A[0]_S\n"; } else { print "$w $A[0]_B ";
|
||||
for($n=1;$n<@A-1;$n++) { print "$A[$n] "; } print "$A[$n]_E\n"; } ' \
|
||||
> data/lexicon_nosil.txt
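The second Perl one-liner above marks a single-phone word with _S, and otherwise marks the first phone with _B and the last with _E while leaving interior phones unmarked. A minimal awk sketch of the same rule, run on made-up lexicon lines, is given here. The function name `add_position_markers` is hypothetical, and awk is substituted for Perl.

```bash
# Hypothetical helper: add word-position markers to "<word> <phone> ..." lines.
add_position_markers() {
  awk '{
         if (NF < 2) { print "no pronunciation for " $1 > "/dev/stderr"; exit 1 }
         if (NF == 2) { print $1, $2 "_S"; next }        # singleton phone
         line = $1 " " $2 "_B"                            # first phone
         for (i = 3; i < NF; i++) line = line " " $i      # interior phones, unmarked
         print line " " $NF "_E"                          # last phone
       }'
}

# Toy lexicon entries.
printf 'A AH0\nCAT K AE1 T\nAT AE1 T\n' | add_position_markers
# prints:
# A AH0_S
# CAT K_B AE1 T_E
# AT AE1_B T_E
```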
|
||||
|
||||
# Add to cmudict the silences, noises etc.
|
||||
|
||||
(echo '!SIL SIL'; echo '<s> '; echo '</s> '; echo '<SPOKEN_NOISE> SPN'; echo '<UNK> SPN'; echo '<NOISE> NSN'; ) | \
|
||||
cat - data/lexicon_nosil.txt > data/lexicon.txt
|
||||
|
||||
|
||||
silphones="SIL SPN NSN";
|
||||
# Generate colon-separated lists of silence and non-silence phones.
|
||||
scripts/silphones.pl data/phones.txt "$silphones" data/silphones.csl data/nonsilphones.csl
|
||||
|
||||
# This adds disambig symbols to the lexicon and produces data/lexicon_disambig.txt
|
||||
|
||||
ndisambig=`scripts/add_lex_disambig.pl data/lexicon.txt data/lexicon_disambig.txt`
|
||||
echo $ndisambig > data/lex_ndisambig
|
||||
# Next, create a phones.txt file that includes the disambig symbols.
|
||||
# the --include-zero includes the #0 symbol we pass through from the grammar.
|
||||
scripts/add_disambig.pl --include-zero data/phones.txt $ndisambig > data/phones_disambig.txt
|
||||
|
||||
# Make the words symbol-table; add the disambiguation symbol #0 (we use this in place of epsilon
|
||||
# in the grammar FST).
|
||||
cat data/lexicon.txt | awk '{print $1}' | sort | uniq | \
|
||||
awk 'BEGIN{print "<eps> 0";} {printf("%s %d\n", $1, NR);} END{printf("#0 %d\n", NR+1);} ' \
|
||||
> data/words.txt
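The words table is built by taking the unique first fields of the lexicon, numbering them from 1, and appending #0 with the next free id. A small sketch on a toy lexicon with a repeated word follows. The function name `make_words_table` is hypothetical, `sort -u` stands in for `sort | uniq`, and `LC_ALL=C` is set explicitly so the ordering does not depend on the caller's locale.

```bash
# Hypothetical wrapper around the words.txt pipeline above.
make_words_table() {
  awk '{ print $1 }' | LC_ALL=C sort -u |
    awk 'BEGIN { print "<eps> 0" }
         { printf("%s %d\n", $1, NR) }
         END { printf("#0 %d\n", NR + 1) }'
}

# Toy lexicon: "a" has two pronunciations, so it appears twice.
printf 'b B_S\na AH0_S\na EY1_S\n' | make_words_table
# prints:
# <eps> 0
# a 1
# b 2
# #0 3
```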
|
||||
|
||||
|
||||
# (3) Data preparation (this step requires the WSJ disks, from LDC).
# data_prep/run.sh takes as arguments a list of the directories whose names end in
# e.g. 11-13.1 (we don't assume a single root dir because
# there are different ways of unpacking them).
|
||||
|
||||
cd data_prep
|
||||
|
||||
#TODO: remove following system-specific comments.
|
||||
#On BUT system, do:
|
||||
./run.sh /mnt/matylda2/data/WSJ?/??-{?,??}.?
|
||||
|
||||
# On Geoff Hinton's system we can do:
|
||||
# ./run.sh /ais/gobi2/speech/WSJ/*/??-{?,??}.?
|
||||
|
||||
|
||||
cd ..
|
||||
|
||||
|
||||
|
||||
# Here is where we select what data to train on.
|
||||
# use all the si284 data.
|
||||
cp data_prep/train_si284_wav.scp data/train_wav.scp
|
||||
cp data_prep/train_si284.txt data/train.txt
|
||||
cp data_prep/train_si284.spk2utt data/train.spk2utt
|
||||
cp data_prep/train_si284.utt2spk data/train.utt2spk
|
||||
cp data_prep/spk2gender.map data/
|
||||
|
||||
for x in eval_nov92 dev_nov93 eval_nov93; do
|
||||
cp data_prep/$x.spk2utt data/$x.spk2utt
|
||||
cp data_prep/$x.utt2spk data/$x.utt2spk
|
||||
cp data_prep/$x.txt data/$x.txt
|
||||
done
|
||||
|
||||
for x in train eval_nov92 dev_nov93 eval_nov93; do
|
||||
cat data/$x.txt | scripts/sym2int.pl --ignore-first-field data/words.txt > data/$x.tra
|
||||
done
|
||||
|
||||
|
||||
# Get the right paths on our system by sourcing the following shell file
|
||||
# (edit it if it's not right for your setup).
|
||||
. path.sh
|
||||
|
||||
# Create the basic L.fst without disambiguation symbols, for use
|
||||
# in training.
|
||||
scripts/make_lexicon_fst.pl data/lexicon.txt 0.5 SIL | \
|
||||
fstcompile --isymbols=data/phones.txt --osymbols=data/words.txt \
|
||||
--keep_isymbols=false --keep_osymbols=false | \
|
||||
fstarcsort --sort_type=olabel > data/L.fst
|
||||
|
||||
# Create the lexicon FST with disambiguation symbols. There is an extra
# step where we add self-loops that "pass through" the #0 disambiguation
# symbol from G.fst.
|
||||
|
||||
phone_disambig_symbol=`grep \#0 data/phones_disambig.txt | awk '{print $2}'`
|
||||
word_disambig_symbol=`grep \#0 data/words.txt | awk '{print $2}'`
|
||||
|
||||
scripts/make_lexicon_fst.pl data/lexicon_disambig.txt 0.5 SIL | \
|
||||
fstcompile --isymbols=data/phones_disambig.txt --osymbols=data/words.txt \
|
||||
--keep_isymbols=false --keep_osymbols=false | \
|
||||
fstaddselfloops "echo $phone_disambig_symbol |" "echo $word_disambig_symbol |" | \
|
||||
fstarcsort --sort_type=olabel > data/L_disambig.fst
|
||||
|
||||
|
||||
# Making the grammar FSTs
|
||||
# This step is quite specific to this WSJ setup.
|
||||
# see data_prep/run.sh for more about where these LMs came from.
|
||||
|
||||
steps/make_lm_fsts.sh
|
||||
|
||||
## Sanity check; just making sure the next command does not crash.
|
||||
fstdeterminizestar data/G_bg.fst >/dev/null
|
||||
|
||||
## Sanity check; just making sure the next command does not crash.
|
||||
fsttablecompose data/L_disambig.fst data/G_bg.fst | fstdeterminizestar >/dev/null
|
||||
|
||||
|
||||
# At this point, make sure that "./exp/" is somewhere you can write
|
||||
# a reasonably large amount of data (i.e. on a fast and large
|
||||
# disk somewhere). It can be a soft link if necessary.
|
||||
|
||||
|
||||
# (4) feature generation
|
||||
|
||||
|
||||
# Make the training features.
|
||||
# note that this runs 3-4 times faster if you compile with DEBUGLEVEL=0
|
||||
# (this turns on optimization).
|
||||
|
||||
# Set "dir" to someplace you can write to.
|
||||
dir=/mnt/matylda6/jhu09/qpovey/kaldi_wsj2_mfcc_e
|
||||
steps/make_mfcc_train.sh $dir
|
||||
steps/make_mfcc_test.sh $dir
|
||||
|
||||
|
||||
# (5) Running the training and testing steps.
|
||||
|
||||
steps/train_mono.sh || exit 1;
|
||||
|
||||
(scripts/mkgraph.sh --mono data/G_tg_pruned.fst exp/mono/tree exp/mono/final.mdl exp/graph_mono_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_mono_tgpr_eval92 exp/graph_mono_tg_pruned/HCLG.fst steps/decode_mono.sh data/eval_nov92.scp ) &
|
||||
|
||||
steps/train_tri1.sh || exit 1;
|
||||
|
||||
# Add --no-queue --num-jobs 4 after "scripts/decode.sh" below if you don't have
# qsub on your system. The number of jobs to use depends on how many CPUs and
# how much memory the local machine has. If you do have qsub on your
# system, you will probably have to edit scripts/decode.sh anyway to change the
# queue options; if you have a different queueing system, you'd have to
# modify the script to use that.
|
||||
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri1/tree exp/tri1/final.mdl exp/graph_tri1_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri1_tgpr_eval92 exp/graph_tri1_tg_pruned/HCLG.fst steps/decode_tri1.sh data/eval_nov92.scp ) &
|
||||
|
||||
steps/train_tri2a.sh || exit 1;
|
||||
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2a/tree exp/tri2a/final.mdl exp/graph_tri2a_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2a_tgpr_eval92 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a.sh data/eval_nov92.scp
|
||||
scripts/decode.sh exp/decode_tri2a_tgpr_eval93 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a.sh data/eval_nov93.scp
|
||||
|
||||
scripts/decode.sh exp/decode_tri2a_tgpr_fmllr_utt_eval92 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a_fmllr.sh data/eval_nov92.scp
|
||||
scripts/decode.sh --per-spk exp/decode_tri2a_tgpr_fmllr_eval92 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a_fmllr.sh data/eval_nov92.scp
|
||||
|
||||
|
||||
) &
|
||||
|
||||
|
||||
steps/train_tri3a.sh || exit 1;
|
||||
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri3a/tree exp/tri3a/final.mdl exp/graph_tri3a_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri3a_tgpr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a.sh data/eval_nov92.scp
|
||||
# per-speaker fMLLR
|
||||
scripts/decode.sh --per-spk exp/decode_tri3a_tgpr_fmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_fmllr.sh data/eval_nov92.scp
|
||||
# per-utterance fMLLR
|
||||
scripts/decode.sh exp/decode_tri3a_tgpr_uttfmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_fmllr.sh data/eval_nov92.scp
|
||||
# per-speaker diagonal fMLLR
|
||||
scripts/decode.sh --per-spk exp/decode_tri3a_tgpr_dfmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_diag_fmllr.sh data/eval_nov92.scp
|
||||
# per-utterance diagonal fMLLR
|
||||
scripts/decode.sh exp/decode_tri3a_tgpr_uttdfmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_diag_fmllr.sh data/eval_nov92.scp
|
||||
)&
|
||||
|
||||
# will delete:
|
||||
## scripts/decode_queue_fmllr.sh exp/graph_tri3a_tg_pruned exp/tri3a/final.mdl exp/decode_tri3a_tg_pruned_fmllr &
|
||||
|
||||
#### Now alternative experiments... ###
|
||||
|
||||
# ET
|
||||
steps/train_tri2b.sh
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2b/tree exp/tri2b/final.mdl exp/graph_tri2b_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2b_tgpr_utt_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b.sh data/eval_nov92.scp
|
||||
scripts/decode.sh --per-spk exp/decode_tri2b_tgpr_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b.sh data/eval_nov92.scp
|
||||
scripts/decode.sh exp/decode_tri2b_tgpr_utt_fmllr_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b_fmllr.sh data/eval_nov92.scp
|
||||
scripts/decode.sh --per-spk exp/decode_tri2b_tgpr_fmllr_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b_fmllr.sh data/eval_nov92.scp
|
||||
) &
|
||||
|
||||
# MLLT/STC
|
||||
steps/train_tri2d.sh
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2d/tree exp/tri2d/final.mdl exp/graph_tri2d_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2d_tgpr_eval92 exp/graph_tri2d_tg_pruned/HCLG.fst steps/decode_tri2d.sh data/eval_nov92.scp )&
|
||||
|
||||
# Splice+LDA
|
||||
steps/train_tri2e.sh
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2e/tree exp/tri2e/final.mdl exp/graph_tri2e_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2e_tgpr_eval92 exp/graph_tri2e_tg_pruned/HCLG.fst steps/decode_tri2e.sh data/eval_nov92.scp )&
|
||||
|
||||
# Splice+LDA+MLLT
|
||||
steps/train_tri2f.sh
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2f/tree exp/tri2f/final.mdl exp/graph_tri2f_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2f_tgpr_eval92 exp/graph_tri2f_tg_pruned/HCLG.fst steps/decode_tri2f.sh data/eval_nov92.scp )&
|
||||
|
||||
# Linear VTLN (+ regular VTLN)
|
||||
steps/train_tri2g.sh
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2g/tree exp/tri2g/final.mdl exp/graph_tri2g_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2g_tgpr_utt_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g.sh data/eval_nov92.scp
|
||||
scripts/decode.sh exp/decode_tri2g_tgpr_utt_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_diag.sh data/eval_nov92.scp
|
||||
scripts/decode.sh --wav exp/decode_tri2g_tgpr_utt_vtln_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_vtln_diag.sh data/eval_nov92.scp
|
||||
|
||||
scripts/decode.sh --per-spk exp/decode_tri2g_tgpr_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g.sh data/eval_nov92.scp
|
||||
scripts/decode.sh --per-spk exp/decode_tri2g_tgpr_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_diag.sh data/eval_nov92.scp
|
||||
scripts/decode.sh --wav --per-spk exp/decode_tri2g_tgpr_vtln_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_vtln_diag.sh data/eval_nov92.scp
|
||||
|
||||
)&
|
||||
|
||||
# Splice+HLDA
|
||||
steps/train_tri2h.sh
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2h/tree exp/tri2h/final.mdl exp/graph_tri2h_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2h_tgpr_eval92 exp/graph_tri2h_tg_pruned/HCLG.fst steps/decode_tri2h.sh data/eval_nov92.scp )&
|
||||
|
||||
# Triple-deltas + HLDA
|
||||
steps/train_tri2i.sh
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2i/tree exp/tri2i/final.mdl exp/graph_tri2i_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2i_tgpr_eval92 exp/graph_tri2i_tg_pruned/HCLG.fst steps/decode_tri2i.sh data/eval_nov92.scp )&
|
||||
|
||||
# Splice + HLDA
|
||||
steps/train_tri2j.sh
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2j/tree exp/tri2j/final.mdl exp/graph_tri2j_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2j_tgpr_eval92 exp/graph_tri2j_tg_pruned/HCLG.fst steps/decode_tri2j.sh data/eval_nov92.scp )&
|
||||
|
||||
|
||||
# LDA+ET
|
||||
steps/train_tri2k.sh
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2k/tree exp/tri2k/final.mdl exp/graph_tri2k_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2k_tgpr_utt_eval92 exp/graph_tri2k_tg_pruned/HCLG.fst steps/decode_tri2k.sh data/eval_nov92.scp
|
||||
scripts/decode.sh --per-spk exp/decode_tri2k_tgpr_eval92 exp/graph_tri2k_tg_pruned/HCLG.fst steps/decode_tri2k.sh data/eval_nov92.scp
|
||||
)&
|
||||
|
||||
# LDA+MLLT+SAT
|
||||
steps/train_tri2l.sh
|
||||
(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2l/tree exp/tri2l/final.mdl exp/graph_tri2l_tg_pruned || exit 1;
|
||||
scripts/decode.sh exp/decode_tri2l_tgpr_utt_eval92 exp/graph_tri2l_tg_pruned/HCLG.fst steps/decode_tri2l.sh data/eval_nov92.scp
|
||||
scripts/decode.sh --per-spk exp/decode_tri2l_tgpr_eval92 exp/graph_tri2l_tg_pruned/HCLG.fst steps/decode_tri2l.sh data/eval_nov92.scp
|
||||
)&
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# Note on WERs at different stages of decoding:
|
||||
#exp/decode_mono_tg_pruned/wer:%WER 31.82 [ 1795 / 5641, 109 ins, 412 del, 1274 sub ]
|
||||
#exp/decode_tri1_tg_pruned/wer:%WER 13.61 [ 768 / 5641, 134 ins, 76 del, 558 sub ]
|
||||
#exp/decode_tri2a_tg_pruned/wer:%WER 12.94 [ 730 / 5641, 131 ins, 62 del, 537 sub ]
|
||||
#exp/decode_tri3a_tg_pruned/wer:%WER 10.88 [ 614 / 5641, 126 ins, 47 del, 441 sub ]
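Each %WER figure above is the total number of errors (insertions plus deletions plus substitutions) divided by the number of reference words. A minimal sketch that recomputes the percentages from the listed counts is shown here; the function name `wer_percent` is hypothetical.

```bash
# Hypothetical helper: WER in percent from error counts.
wer_percent() {
  # usage: wer_percent <ins> <del> <sub> <num_ref_words>
  awk -v i="$1" -v d="$2" -v s="$3" -v n="$4" \
      'BEGIN { printf("%.2f\n", 100 * (i + d + s) / n) }'
}

wer_percent 109 412 1274 5641   # mono:  prints 31.82
wer_percent 126 47 441 5641     # tri3a: prints 10.88
```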
|
||||
|
||||
|
||||
# To score with sclite instead, run e.g.:
# scripts/score_sclite.sh exp/decode_tri2a_tg_pruned
|
|
@ -0,0 +1,58 @@
|
|||
#!/usr/bin/perl
|
||||
# Copyright 2010-2011 Microsoft Corporation
|
||||
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
|
||||
# See the Apache 2 License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
# Adds a specified number of disambiguation symbols to a symbol table.
# They are named #1, #2, etc., and their integer ids continue from the last id
# in the input table.
# If the --include-zero option is given, an extra symbol #0 is added
# before them.
|
||||
|
||||
$include_zero = 0;
|
||||
if($ARGV[0] eq "--include-zero") {
|
||||
$include_zero = 1;
|
||||
shift @ARGV;
|
||||
}
|
||||
|
||||
if(@ARGV != 2) {
|
||||
die "Usage: add_disambig.pl [--include-zero] symtab.txt num_extra > symtab_out.txt\n";
|
||||
}
|
||||
|
||||
|
||||
$input = $ARGV[0];
|
||||
$nsyms = $ARGV[1];
|
||||
|
||||
open(F, "<$input") || die "Error opening file $input: $!";
|
||||
|
||||
while(<F>) {
|
||||
@A = split(" ", $_);
|
||||
@A == 2 || die "Bad line $_";
|
||||
$lastsym = $A[1];
|
||||
print;
|
||||
}
|
||||
|
||||
if(!defined($lastsym)){
|
||||
die "Empty symbol file?";
|
||||
}
|
||||
|
||||
if($include_zero) {
|
||||
$lastsym++;
|
||||
print "#0 $lastsym\n";
|
||||
}
|
||||
|
||||
for($n = 1; $n <= $nsyms; $n++) {
|
||||
$y = $n + $lastsym;
|
||||
print "#$n $y\n";
|
||||
}
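For comparison with the shell pipelines elsewhere in this recipe, the numbering logic of the Perl script above can be sketched in awk and tried on a toy symbol table. This is an illustration only, not a replacement for add_disambig.pl: the function name `add_disambig` and its positional arguments (a 0/1 flag instead of --include-zero) are hypothetical.

```bash
# Hypothetical awk sketch of the same numbering logic.
add_disambig() {
  # usage: add_disambig <include_zero: 0|1> <num_extra> < symtab.txt
  awk -v zero="$1" -v nsyms="$2" '
    NF != 2 { print "Bad line " $0 > "/dev/stderr"; bad = 1; exit 1 }
    { last = $2; seen = 1; print }          # copy the table, remember the last id
    END {
      if (bad) exit 1
      if (!seen) { print "Empty symbol file?" > "/dev/stderr"; exit 1 }
      if (zero) { last++; print "#0 " last }
      for (n = 1; n <= nsyms; n++) print "#" n " " (n + last)
    }'
}

# Toy symbol table with two entries, plus #0 and two more disambiguation symbols.
printf '<eps> 0\nSIL 1\n' | add_disambig 1 2
# prints:
# <eps> 0
# SIL 1
# #0 2
# #1 3
# #2 4
```

Like the Perl script, the sketch assumes the last line of the input table carries the highest id.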
|