Committing initial version of Kaldi

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@2 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2011-05-14 21:48:08 +00:00 · 2011-05-14 21:48:08 +00:00 · 10e9002c88
--- a/270
+++ b/270
@ -0,0 +1,270 @@
+
+                          Legal Notices
+
+Each of the files comprising Kaldi v1.0 has been separately licensed by
+its respective author(s) under the terms of the Apache License v 2.0 (set
+forth below).  The source code header for each file specifies the individual
+authors and source material for that file as well as the corresponding copyright
+notice.  For reference purposes only: A cumulative list of all individual
+contributors and original source material as well as the full text of the Apache
+License v 2.0 are set forth below.
+
+Individual Contributors (in alphabetical order)
+      
+      Mohit Agarwal      
+      Gilles Boulianne
+      Lukas Burget
+      Ondrej Glembek
+      Arnab Ghoshal
+      Go Vivace Inc.
+      Mirko Hannemann
+      Microsoft Corporation
+      Petr Motlicek
+      Ariya Rastrow
+      Petr Schwarz      
+      Georg Stemmer
+      Jan Silovsky
+      Phonexia s.r.o.
+      Yanmin Qian
+      Karel Vesely
+      Haihua Xu
+      
+Other Source Material
+
+    This project includes a port and modification of materials from JAMA: A Java
+  Matrix Package under the following notice: "This software is a cooperative
+  product of The MathWorks and the National Institute of Standards and Technology
+  (NIST) which has been released to the public domain." This notice and the
+  original code are available at http://math.nist.gov/javanumerics/jama/
+
+   This project includes a modified version of code published in Malvar, H.,
+  "Signal processing with lapped transforms," Artech House, Inc., 1992.  The
+  current copyright holder, Henrique S. Malvar, has given his permission for the
+  release of this modified version under the Apache License 2.0.
+  
+  This file includes material from the OpenFST Library v1.2.7 available at 
+  http://www.openfst.org/twiki/bin/view/FST/WebHome and released under the 
+  Apache License v. 2.0.   
+
+  [OpenFst COPYING file begins here]
+
+    Licensed under the Apache License, Version 2.0 (the "License");
+    you may not use these files except in compliance with the License.
+    You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+ 
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+
+    Copyright 2005-2010 Google, Inc.
+
+  [OpenFst COPYING file ends here]
+
+
+ -------------------------------------------------------------------------
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/9
+++ b/9
@ -0,0 +1,9 @@
+
+[for native Windows install, see windows/INSTALL]
+
+(1)
+go to tools/  and follow INSTALL instructions there.
+
+(2) 
+go to src/ and follow INSTALL instructions there.
+
--- a/README.txt
+++ b/README.txt
@ -0,0 +1,31 @@
+
+This README has been created for those with whom we share the
+"pre-release" version of Kaldi.  Although the toolkit has not
+been "officially" released, I have been given the OK to share
+it privately for "non-commercial purposes" (whatever that means).
+The official release is scheduled for mid-March.
+
+The current version is not as polished as we would like, and contains
+some files that should eventually be deleted.  
+
+See http://merlin.fit.vutbr.cz/kaldi/ for documentation 
+(may not always be fully up to date).  This documentation
+is generated by running "doxygen" from the src/ directory,
+and appears in src/html/
+
+I assume that the reader would like to (1) build the toolkit
+and (2) run the example system builds.
+
+To build the toolkit: see ./INSTALL.  These instructions are valid for UNIX
+systems including various flavors of Linux; Darwin; and Cygwin (has not been
+tested on more "exotic" varieties of UNIX).  For Windows installation
+instructions (excluding Cygwin), see windows/INSTALL.
+
+To run the example system builds, see egs/README.txt
+
+If you encounter problems (and you probably will), your first point of contact
+should be Dan Povey (dpovey@microsoft.com).  In addition to specific questions,
+please let me know if there are specific aspects of the project that you feel
+could be improved, that you find confusing, etc., and which missing features you
+most wish it had.
+
--- a/egs/README.txt
+++ b/egs/README.txt
@ -0,0 +1,21 @@
+
+This directory contains example scripts that demonstrate how to 
+use Kaldi.  Each subdirectory corresponds to a corpus that we have
+example scripts for.  Currently these are both corpora available from
+the Linguistic Data Consortium (LDC).
+
+Explanations of the corpora are below:
+
+ wsj: The Wall Street Journal corpus.  This is a corpus of read
+    sentences from the Wall Street Journal, recorded under clean conditions.
+    The vocabulary is quite large. 
+    Available from the LDC as either: [ catalog numbers LDC93S6A (WSJ0) and LDC94S13A (WSJ1) ]
+    or: [ catalog numbers LDC93S6B (WSJ0) and LDC94S13B (WSJ1) ]
+    The latter option is cheaper and includes only the Sennheiser
+    microphone data (which is all we use in the example scripts).
+
+ rm: Resource Management.  Clean speech in a medium-vocabulary task consisting
+    of commands to a (presumably imaginary) computer system.
+    Available from the LDC as catalog number LDC93S3A (it may be possible to
+    get the same data using combinations of other catalog numbers, but this
+    is the one we used).
--- a/egs/rm/README.txt
+++ b/egs/rm/README.txt
@ -0,0 +1,8 @@
+
+Each subdirectory of this directory contains the
+scripts for a sequence of experiments.
+
+  s1: This setup contains experiments with GMM-based systems using
+      various Maximum Likelihood techniques,
+      including global and speaker-specific transforms.
+      See the parallel setup in ../wsj/s1
--- a/egs/rm/s1/NOTES
+++ b/egs/rm/s1/NOTES
@ -0,0 +1,11 @@
+
+Note RE decoding beams:
+
+WER
+      Beam     20      25      30
+ monophone  18.28            28.24
+  triphone  6.767   6.724   6.724   [tri1]
+Time [on svatava, xRT]
+  triphone   0.13    0.27    0.43   [tri1]
+
+
--- a/egs/rm/s1/conf/mfcc.conf
+++ b/egs/rm/s1/conf/mfcc.conf
@ -0,0 +1 @@
+--use-energy=false   # only non-default option.
--- a/egs/rm/s1/conf/topo.proto
+++ b/egs/rm/s1/conf/topo.proto
@ -0,0 +1,22 @@
+<Topology> 
+<TopologyEntry> 
+<ForPhones>
+NONSILENCEPHONES
+</ForPhones> 
+<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State> 
+<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State> 
+<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State> 
+<State> 3 </State>
+</TopologyEntry> 
+<TopologyEntry> 
+<ForPhones>
+SILENCEPHONES
+</ForPhones> 
+<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State> 
+<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State> 
+<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State> 
+<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State> 
+<State> 4 <PdfClass> 4 <Transition> 4 0.25 <Transition> 5 0.75 </State> 
+<State> 5 </State>
+</TopologyEntry> 
+</Topology> 
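Every emitting state in the topology above lists its outgoing transitions as `<Transition> destination probability` pairs, and in each state those probabilities add up to 1. The sketch below checks that property with awk. It assumes the layout shown above, with one `<State>` entry per line; `check_topo` is a hypothetical helper name and is not part of Kaldi.

```shell
# Hypothetical helper (not part of Kaldi): for each <State> line, add up the
# probabilities that follow every <Transition> tag and report any state whose
# outgoing probabilities do not sum to 1. Final states have no transitions
# and are skipped.
check_topo() {
  awk '
    BEGIN { bad = 0 }
    $1 == "<State>" {
      sum = 0; n = 0
      for (i = 1; i <= NF; i++)
        if ($i == "<Transition>") { sum += $(i + 2); n++ }
      if (n > 0 && (sum < 0.9999 || sum > 1.0001)) {
        printf("state %s: probabilities sum to %s\n", $2, sum)
        bad = 1
      }
    }
    END { exit bad }
  ' "$1"
}
```

A possible invocation is `check_topo conf/topo.proto && echo "topology OK"`; the function returns a non-zero status if any state fails the check.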
--- a/egs/rm/s1/data_prep/make_trans.pl
+++ b/egs/rm/s1/data_prep/make_trans.pl
@ -0,0 +1,69 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# usage:  make_trans.pl prefix in.flist input.snr out.txt out.scp
+
+# prefix is first letters of the database "key" (rest are numeric)
+
+# in.flist is just a list of filenames, probably of .sph files.
+# input.snr is an snr format file from the RM dataset.  
+# out.txt is the output transcriptions in format "key word1 word2 ...\n"
+# out.scp is the output scp file, which is as in.flist but has the
+# database-key first on each line.
+
+# Reads the transcriptions from input.snr, e.g. $rootdir/rm1_audio1/rm1/doc/al_sents.snr,
+# and the file list from in.flist, e.g. train_sph.flist.
+# Writes the transcriptions to out.txt and the keyed file list to out.scp.
+
+if(@ARGV != 5) {
+    die "usage:  make_trans.pl prefix in.flist input.snr out.txt out.scp\n";
+}
+($prefix, $in_flist, $input_snr, $out_txt, $out_scp) = @ARGV;
+
+open(F, "<$input_snr") || die "Opening SNOR file $input_snr: $!";
+
+while(<F>) {
+    if(m/^;/) { next; }
+    m/(.+) \((.+)\)/ || die "bad line $_";
+    $T{$2} = $1;
+}
+
+close(F);
+open(G, "<$in_flist") || die "Opening file list $in_flist: $!";
+
+open(O, ">$out_txt") || die "Opening output transcription file $out_txt: $!";
+
+open(P, ">$out_scp") || die "Opening output scp file $out_scp: $!";
+
+while(<G>) {
+    $_ =~ m:/(\w+)/(\w+)\.sph\s+$:i || die "bad scp line $_";
+    $spkname = $1;
+    $uttname = $2;
+    $uttname  =~ tr/a-z/A-Z/;
+    defined $T{$uttname} || die "no trans for sent $uttname";
+    $spkname =~ s/_//g; # remove underscore from spk name to make key nicer.
+    $key = $prefix . "_" . $spkname . "_" . $uttname;
+    $key =~ tr/A-Z/a-z/; # Make it all lower case.
+     # to make the numerical and string-sorted orders the same.
+    print O "$key $T{$uttname}\n";
+    print P "$key $_";
+    $n++;
+} 
+close(O) || die "Closing output.";
+close(P) || die "Closing output.";
+
+
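The key construction in make_trans.pl can be illustrated in shell. This is only a sketch of that one step, not a replacement for the script: `make_key` is a hypothetical helper, the sample path is invented, and it assumes the speaker directory and the utterance file are the last two path components, as the regular expression in the script requires.

```shell
# Hypothetical helper mirroring the key logic of make_trans.pl:
#   key = lowercase( prefix _ speaker-directory-without-underscores _ utterance )
make_key() {
  key_prefix=$1
  sph_path=$2
  utt=$(basename "$sph_path")
  utt=${utt%.*}                                  # strip the .sph extension
  spk=$(basename "$(dirname "$sph_path")")       # speaker directory name
  spk=$(printf '%s' "$spk" | tr -d '_')          # drop underscores, as the script does
  printf '%s_%s_%s\n' "$key_prefix" "$spk" "$utt" | tr 'A-Z' 'a-z'
}

# Invented example path; prints trn_adg04_sr009
make_key trn /data/RM/rm1/ind_trn/adg0_4/SR009.sph
```

The script itself additionally upper-cases the utterance name to look up its transcription in the SNOR file; that lookup is left out here.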
--- a/egs/rm/s1/data_prep/run.sh
+++ b/egs/rm/s1/data_prep/run.sh
@ -0,0 +1,92 @@
+# This script should be run from the directory where it is located (i.e. data_prep)
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# The input is the 3 CDs from the LDC distribution of Resource Management.
+# The script's argument is a directory which has three subdirectories:
+# rm1_audio1  rm1_audio2  rm2_audio
+
+if [ $# != 1 ]; then
+   echo "Usage: ./run.sh /path/to/RM"
+   exit 1; 
+fi 
+
+RMROOT=$1
+if [ ! -d "$RMROOT/rm1_audio1" ] || [ ! -d "$RMROOT/rm1_audio2" ]; then
+  echo "Error: run.sh requires a directory argument that contains rm1_audio1 and rm1_audio2"
+  exit 1; 
+fi  
+
+if [ ! -d "$RMROOT/rm2_audio" ]; then
+  echo "**Warning: $RMROOT/rm2_audio does not exist; won't create spk2gender.map file correctly**"
+  sleep 1
+fi  
+
+# Collect the training .sph files, skipping files named sa<digit>.sph and sb<digit><digit>.sph.
+(
+  find $RMROOT/rm1_audio1/rm1/ind_trn -iname '*.sph';
+  find $RMROOT/rm1_audio2/2_4_2/rm1/ind/dev_aug -iname '*.sph';
+) | perl -ane ' m:/sa\d\.sph:i || m:/sb\d\d\.sph:i || print; '  > train_sph.flist
+
+
+# make_trans.pl also creates the utterance id's and the kaldi-format scp file.
+./make_trans.pl trn train_sph.flist $RMROOT/rm1_audio1/rm1/doc/al_sents.snr train_trans.txt train_sph.scp
+mv train_trans.txt tmp; sort -k 1 tmp > train_trans.txt
+mv train_sph.scp tmp; sort -k 1 tmp > train_sph.scp
+
+sph2pipe=`cd ../../../..; echo $PWD/tools/sph2pipe_v2.5/sph2pipe`
+if [ ! -f $sph2pipe ]; then
+   echo "Could not find the sph2pipe program at $sph2pipe";
+   exit 1;
+fi
+awk '{printf("%s '$sph2pipe' -f wav %s |\n", $1, $2);}' < train_sph.scp > train_wav.scp
+
+cat train_wav.scp | perl -ane 'm/^(\w+_(\w+)\w_\w+) / || die; print "$1 $2\n"' > train.utt2spk
+cat train.utt2spk | sort -k 2 | ../scripts/utt2spk_to_spk2utt.pl > train.spk2utt
+
+
+for ntest in 1_mar87 2_oct87 4_feb89 5_oct89 6_feb91 7_sep92; do
+   n=`echo $ntest | cut -d_ -f 1`
+   test=`echo $ntest | cut -d_ -f 2`
+   root=$RMROOT/rm1_audio2/2_4_2
+   for x in `grep -v ';' $root/rm1/doc/tests/$ntest/${n}_indtst.ndx`; do
+      echo "$root/$x ";
+   done > test_${test}_sph.flist
+done
+
+# make_trans.pl also creates the utterance id's and the kaldi-format scp file.
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+  ./make_trans.pl ${test} test_${test}_sph.flist $RMROOT/rm1_audio1/rm1/doc/al_sents.snr test_${test}_trans.txt test_${test}_sph.scp
+  mv test_${test}_trans.txt tmp; sort -k 1 tmp > test_${test}_trans.txt
+  mv test_${test}_sph.scp tmp; sort -k 1 tmp > test_${test}_sph.scp
+
+  awk '{printf("%s '$sph2pipe' -f wav %s |\n", $1, $2);}' < test_${test}_sph.scp  > test_${test}_wav.scp
+
+  cat test_${test}_wav.scp | perl -ane 'm/^(\w+_(\w+)\w_\w+) / || die; print "$1 $2\n"' > test_${test}.utt2spk
+  cat test_${test}.utt2spk | sort -k 2 | ../scripts/utt2spk_to_spk2utt.pl > test_${test}.spk2utt
+done
+
+cat $RMROOT/rm1_audio2/2_5_1/rm1/doc/al_spkrs.txt \
+ $RMROOT/rm2_audio/3-1.2/rm2/doc/al_spkrs.txt | \
+ perl -ane 'tr/A-Z/a-z/;print;' | grep -v ';' | \
+     awk '{print $1, $2}' > spk2gender.map
+
+../scripts/make_rm_lm.pl $RMROOT/rm1_audio1/rm1/doc/wp_gram.txt  > G.txt 
+
+# Getting lexicon
+../scripts/make_rm_dict.pl  $RMROOT/rm1_audio2/2_4_2/score/src/rdev/pcdsril.txt > lexicon.txt
+
+echo Succeeded.
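run.sh pipes each utt2spk file through ../scripts/utt2spk_to_spk2utt.pl, which is not shown in this part of the diff. Assuming it turns "utterance speaker" lines into one "speaker utt1 utt2 ..." line per speaker, as the two file names suggest, the inversion could be sketched with awk as below. The function is an illustrative stand-in rather than the actual script, and the sample keys are made up.

```shell
# Illustrative stand-in for the inversion step (assumed behaviour, see above):
# reads "utterance speaker" lines on stdin and prints one line per speaker,
# listing that speaker's utterances in the order they were read.
utt2spk_to_spk2utt() {
  awk '
    {
      if (!($2 in utts)) { order[++n] = $2; utts[$2] = $1 }
      else               { utts[$2] = utts[$2] " " $1 }
    }
    END { for (i = 1; i <= n; i++) print order[i], utts[order[i]] }
  '
}

# Made-up keys for illustration.
printf '%s\n' 'trn_adg04_sr009 adg0' 'trn_adg04_sr049 adg0' 'trn_ahh05_sr001 ahh0' |
  utt2spk_to_spk2utt
# adg0 trn_adg04_sr009 trn_adg04_sr049
# ahh0 trn_ahh05_sr001
```

Speakers are emitted in order of first appearance, which is why run.sh sorts the input on the speaker field first.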
--- a/egs/rm/s1/data_prep/sph2wav.sh
+++ b/egs/rm/s1/data_prep/sph2wav.sh
@ -0,0 +1,39 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+fake=false
+if [ "$1" == "--fake" ]; then
+    fake=true
+    shift
+fi
+
+sphdir=$1 # e.g. /mnt/matylda2/data/RM
+wavdir=$2 # e.g. /mnt/matylda6/jhu09/qpovey/kaldi_rm_wav
+flistin=$3 # e.g. train_sph.flist, contains sph files in sphdir
+flistout=$4 # e.g. train_wav.flist, contains wav files in wavdir
+
+
+if [ $fake == false ]; then
+    for x in `cat $flistin`; do 
+        y=`echo $x | sed "s:$sphdir:$wavdir:" | sed 's:\.sph:.wav:'`;
+        mkdir -p `dirname $y`
+        ../../tools/sph2pipe_v2.5/sph2pipe -f wav $x $y || exit 1;
+    done 
+fi
+
+cat $flistin | sed "s:$sphdir:$wavdir:" | sed 's:\.sph:.wav:' > $flistout || exit 1;
+
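The last line of sph2wav.sh rewrites a list of .sph paths into the matching .wav paths, and with --fake this is the only thing the script does. A minimal sketch of that mapping follows. `map_sph_to_wav` is a hypothetical name and the directories are invented. Unlike the script, it anchors both substitutions, and it assumes the directory names contain no ':' or regular-expression metacharacters.

```shell
# Hypothetical helper: $1 = sphdir, $2 = wavdir.
# Reads .sph paths on stdin and writes the corresponding .wav paths.
map_sph_to_wav() {
  sed -e "s:^$1:$2:" -e 's:\.sph$:.wav:'
}

# Invented paths for illustration.
printf '%s\n' /data/RM/rm1/a/sr001.sph | map_sph_to_wav /data/RM /scratch/rm_wav
# /scratch/rm_wav/rm1/a/sr001.wav
```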
--- a/egs/rm/s1/path.sh
+++ b/egs/rm/s1/path.sh
@ -0,0 +1 @@
+export PATH=$PATH:../../../src/bin:../../../tools/openfst/bin:../../../src/fstbin/:../../../src/gmmbin/:../../../src/featbin/:../../../src/fgmmbin:../../../src/sgmmbin
--- a/egs/rm/s1/run.sh
+++ b/egs/rm/s1/run.sh
@ -0,0 +1,96 @@
+#!/bin/bash
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+exit 1 # Don't run this... it's to be run line by line from the shell.
+
+# This script file cannot be run as-is; some paths in it need to be changed
+# before you can run it.
+# Search for /path/to.
+# It is recommended that you do not invoke this file from the shell, but
+# run the commands one by one, by hand.
+
+# the step in data_prep/ will need to be modified for your system.
+
+# First step is to do data preparation:
+# This just creates some text files, it is fast.
+# If not on the BUT system, you would have to change run.sh to reflect
+# your own paths.
+#
+
+#Example arguments to run.sh: /mnt/matylda2/data/RM, /ais/gobi2/speech/RM, /cygdrive/e/data/RM
+# RM is a directory with subdirectories rm1_audio1, rm1_audio2, rm2_audio
+cd data_prep
+#*** You have to change the pathname below.***
+./run.sh /path/to/RM
+cd ..
+
+
+mkdir -p data
+( cd data; cp ../data_prep/{train,test*}.{spk2utt,utt2spk} . ; cp ../data_prep/spk2gender.map . )
+
+# This next step converts the lexicon, grammar, etc., into FST format.
+steps/prepare_graphs.sh
+
+
+# Next, make sure that "exp/" is someplace you can write a significant amount of
+# data to (e.g. make it a link to a file on some reasonably large file system).
+# If it doesn't exist, the scripts below will make the directory "exp".
+
+# mfccdir should be set to some place to put training mfcc's
+# where you have space.
+#e.g.: mfccdir=/mnt/matylda6/jhu09/qpovey/kaldi_rm_mfccb
+mfccdir=/path/to/mfccdir
+steps/make_mfcc_train.sh $mfccdir
+steps/make_mfcc_test.sh $mfccdir
+
+steps/train_mono.sh    
+steps/decode_mono.sh  &
+steps/train_tri1.sh
+steps/decode_tri1.sh  &
+
+steps/train_tri2a.sh
+steps/decode_tri2a.sh  &
+
+# Then do the same for 2b, 2c, and so on
+# 2a = basic triphone (all features double-deltas unless stated).
+# 2b = exponential transform
+# 2c = mean normalization (cmn)
+# 2d = MLLT
+# 2e = splice-9-frames + LDA
+# 2f = splice-9-frames + LDA + MLLT
+# 2g = linear VTLN (+ regular VTLN); various decode scripts available.
+# 2h = splice-9-frames + HLDA
+# 2i = triple-deltas + HLDA
+# 2j = triple-deltas + LDA + MLLT
+# 2k = LDA + ET (equiv to LDA+MLLT+ET)
+
+
+# To train and test SGMM systems:
+
+steps/train_ubma.sh
+
+# train and test unadapted system
+steps/train_sgmma.sh
+steps/decode_sgmma.sh
+
+# train and test system with speaker vectors.
+steps/train_sgmmb.sh
+steps/decode_sgmmb.sh
+
+
+
+
--- a/egs/rm/s1/scripts/add_disambig.pl
+++ b/egs/rm/s1/scripts/add_disambig.pl
@ -0,0 +1,58 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# Adds some specified number of disambig symbols to a symbol table.
+# Adds these as #1, #2, etc.
+# If the --include-zero option is specified, includes an extra one
+# #0.
+if(!(@ARGV == 2 || (@ARGV ==3 && $ARGV[0] eq "--include-zero"))) {
+    die "Usage: add_disambig.pl [--include-zero] symtab.txt num_extra > symtab_out.txt ";
+}
+
+if(@ARGV  == 3) {
+    $include_zero = 1;
+    $ARGV[0] eq "--include-zero" || die "Bad option/first argument $ARGV[0]";
+    shift @ARGV;
+} else {
+    $include_zero = 0;
+}
+
+$input = $ARGV[0];
+$nsyms = $ARGV[1];
+
+open(F, "<$input") || die "Opening file $input: $!";
+
+while(<F>) {
+    @A = split(" ", $_);
+    @A == 2 || die "Bad line $_";
+    $lastsym = $A[1];
+    print;
+}
+
+if(!defined($lastsym)){
+ die "Empty symbol file?";
+}
+
+if($include_zero) {
+    $lastsym++;
+    print "#0  $lastsym\n";
+}
+
+for($n = 1; $n <= $nsyms; $n++) {
+    $y = $n + $lastsym;
+    print "#$n  $y\n";
+}
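The numbering rule in add_disambig.pl is that the new symbols continue from the last id in the table, and #0, when requested, takes the first free id. It can be reproduced with a short awk sketch. This is an illustration under stated assumptions, not the script itself: `add_disambig` is a hypothetical shell function, the literal third argument `zero` stands in for --include-zero, and the per-line validation done by the Perl script is omitted.

```shell
# Hypothetical helper: $1 = symbol table file, $2 = number of extra symbols,
# $3 = "zero" to also emit #0 (stand-in for --include-zero).
add_disambig() {
  awk -v nsyms="$2" -v zero="${3:-}" '
    { print; last = $2; seen = 1 }          # copy the table, remember the last id
    END {
      if (!seen) exit 1                     # empty symbol table
      if (zero == "zero") { last++; print "#0  " last }
      for (n = 1; n <= nsyms; n++) print "#" n "  " (n + last)
    }
  ' "$1"
}
```

For a table whose last entry has id 2, `add_disambig symtab.txt 2` appends `#1  3` and `#2  4`; with the third argument `zero` it appends `#0  3`, `#1  4` and `#2  5`.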
--- a/egs/rm/s1/scripts/add_lex_disambig.pl
+++ b/egs/rm/s1/scripts/add_lex_disambig.pl
@ -0,0 +1,101 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# Adds disambiguation symbols to a lexicon.
+# The output is still in the normal lexicon format.
+# Disambig syms are numbered #1, #2, #3, etc. (#0 
+# reserved for symbol in grammar).
+# Outputs the number of disambig syms to the standard output.
+
+if(@ARGV != 2) {
+    die "Usage: add_lex_disambig.pl  lexicon.txt lexicon_disambig.txt "
+}
+
+
+$lexfn = shift @ARGV;
+$lexoutfn = shift @ARGV;
+
+open(L, "<$lexfn") || die "Error opening lexicon $lexfn";
+
+# (1)  Read in the lexicon.
+@L = ( );
+while(<L>) {
+    @A = split(" ", $_);
+    push @L, join(" ", @A);
+}
+
+# (2) Work out the count of each phone-sequence in the
+# lexicon.
+
+foreach $l (@L) {
+    @A = split(" ", $l);
+    shift @A; # Remove word.
+    $count{join(" ",@A)}++;
+}
+
+# (3) For each left sub-sequence of each phone-sequence, note down
+# that it exists (for identifying prefixes of longer strings).
+
+foreach $l (@L) {
+    @A = split(" ", $l);
+    shift @A; # Remove word.
+    while(@A > 0) {
+        pop @A;  # Remove last phone
+        $issubseq{join(" ",@A)} = 1;
+    }
+}
+
+# (4) For each entry in the lexicon:
+#  if the phone sequence is unique and is not a
+#  prefix of another word, no disambig symbol.
+#  Else output #1, or #2, #3, ... if the same phone-seq
+#  has already been assigned a disambig symbol.
+
+
+open(O, ">$lexoutfn") || die "Opening lexicon file $lexoutfn for writing.\n";
+
+$max_disambig = 0;
+foreach $l (@L) {
+    @A = split(" ", $l);
+    $word = shift @A;
+    $phnseq = join(" ",@A);
+    if(!defined $issubseq{$phnseq}
+       && $count{$phnseq}==1) {
+        ; # Do nothing.
+    } else {
+        if($phnseq eq "") { # need disambig symbols for the empty string
+            # that are not used anywhere else.
+            $max_disambig++;
+            $reserved{$max_disambig} = 1;
+            $phnseq = "#$max_disambig";
+        } else {
+            $curnumber = $disambig_of{$phnseq};
+            if(!defined $curnumber) { $curnumber = 0; }
+            $curnumber++; # now 1 or 2, ... 
+            while(defined $reserved{$curnumber} ) { $curnumber++; } # skip over reserved symbols
+            if($curnumber > $max_disambig) {
+                $max_disambig = $curnumber;
+            }
+            $disambig_of{$phnseq} = $curnumber;
+            $phnseq = $phnseq . " #" . $curnumber;
+         }
+    }
+    print O "$word\t$phnseq\n";
+}
+
+print $max_disambig . "\n";
+
--- a/egs/rm/s1/scripts/filter_scp.pl
+++ b/egs/rm/s1/scripts/filter_scp.pl
@ -0,0 +1,40 @@
+#!/usr/bin/perl -w
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# This script takes a list of utterance-ids and filters an scp
+# file (or any file whose first field is an utterance id), printing
+# out only those lines whose first field is in id_list.
+
+if(@ARGV < 1 || @ARGV > 2) {
+    die "Usage: filter_scp.pl id_list [in.scp] > out.scp ";
+}
+
+$idlist = shift @ARGV;
+open(F, "<$idlist") || die "Could not open id-list file $idlist";
+while(<F>) {
+    @A = split;
+    @A>=1 || die "Invalid id-list file line $_";
+    $seen{$A[0]} = 1;
+}
+
+while(<>) {
+    @A = split;
+    @A > 0 || die "Invalid scp file line $_";
+    if($seen{$A[0]}) {
+        print $_;
+    }
+}
--- a/egs/rm/s1/scripts/int2sym.pl
+++ b/egs/rm/s1/scripts/int2sym.pl
@ -0,0 +1,69 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+$ignore_noninteger = 0;
+$ignore_first_field = 0;
+for($x = 0; $x < 2; $x++) {
+    if($ARGV[0] eq "--ignore-noninteger") { $ignore_noninteger = 1; shift @ARGV; }
+    if($ARGV[0] eq "--ignore-first-field") { $ignore_first_field = 1; shift @ARGV; }
+}
+
+$symtab = shift @ARGV;
+if(!defined $symtab) {
+    die "Usage: int2sym.pl [--ignore-noninteger] [--ignore-first-field] symtab [input transcriptions] > output transcriptions\n";
+}
+open(F, "<$symtab") || die "Error opening symbol table file $symtab";
+while(<F>) {
+    @A = split(" ", $_);
+    @A == 2 || die "bad line in symbol table file: $_";
+    $int2sym{$A[1]} = $A[0];
+}
+
+$error = 0;
+while(<>) {
+    @A = split(" ", $_);
+    if(@A == 0) {
+        die "Empty line in transcriptions input.";
+    }
+    if($ignore_first_field) {
+        $key = shift @A;
+        print $key . " ";
+    }
+    foreach $a (@A) {
+        if($a !~  m:^\d+$:) { # not all digits..
+            if($ignore_noninteger) {
+                print $a . " ";
+                next;
+            } else {
+                if($a eq $A[0]) {
+                    die "int2sym.pl: found noninteger token $a (try --ignore-first-field)\n";
+                } else {
+                    die "int2sym.pl: found noninteger token $a (try --ignore-noninteger if valid input)\n";
+                }
+            }
+        }
+        $s = $int2sym{$a};
+        if(!defined ($s)) {
+            die "int2sym.pl: integer $a not in symbol table $symtab.";
+        }
+        print $s . " ";
+    }
+    print "\n";
+}
+
+
+
--- a/egs/rm/s1/scripts/is_sorted.sh
+++ b/egs/rm/s1/scripts/is_sorted.sh
@ -0,0 +1,45 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# Usage: is_sorted.sh [script-file]
+# This script returns 0 (success) if the script file argument [or standard input]
+# is sorted and 1 otherwise.
+
+export LC_ALL=C
+
+if [ $# == 0 ]; then
+  scp=-
+fi
+if [ $# == 1 ]; then
+  scp=$1
+fi
+if [ $# -gt 1 -o "$1" == "--help" -o "$1" == "-h" ]; then
+  echo "Usage: is_sorted.sh [script-file]"
+  exit 1
+fi
+
+cat "$scp" > /tmp/tmp1.$$
+sort /tmp/tmp1.$$ > /tmp/tmp2.$$
+cmp /tmp/tmp1.$$ /tmp/tmp2.$$ >/dev/null
+ret=$?
+rm /tmp/tmp1.$$  /tmp/tmp2.$$
+if [ $ret == 0 ]; then
+   exit 0;
+else
+  echo "is_sorted.sh: script file $scp is not sorted";
+  exit 1;
+fi
--- a/egs/rm/s1/scripts/make_lexicon_fst.pl
+++ b/egs/rm/s1/scripts/make_lexicon_fst.pl
@ -0,0 +1,112 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# makes lexicon FST (no pron-probs involved).
+
+if(@ARGV != 1 && @ARGV != 3) {
+    die "Usage: make_lexicon_fst.pl lexicon.txt [silprob silphone] > lexiconfst.txt"
+}
+
+$lexfn = shift @ARGV;
+if(@ARGV == 0) {
+    $silprob = 0.0;
+} else { 
+    ($silprob,$silphone) = @ARGV;
+}
+if($silprob != 0.0) {
+    $silprob < 1.0 || die "Sil prob cannot be >= 1.0";
+    $silcost = -log($silprob);
+    $nosilcost = -log(1.0 - $silprob);
+}
+
+
+open(L, "<$lexfn") || die "Error opening lexicon $lexfn";
+
+
+
+if( $silprob == 0.0 ) { # No optional silences: just have one (loop+final) state which is numbered zero.
+    $loopstate = 0;
+    $nextstate = 1; # next unallocated state.
+    while(<L>) {
+        @A = split(" ", $_);
+        $w = shift @A;
+        if(@A == 0) { # For empty words (<s> and </s>) insert no optional
+                      # silence (not needed as adjacent words supply it)....
+                      # actually we only hit this case for the lexicon without disambig
+                      # symbols but doesn't ever matter as training transcripts don't have <s> or </s>.
+            print "$loopstate\t$loopstate\t<eps>\t$w\n";
+        } else {
+            $s = $loopstate;
+            $word_or_eps = $w;
+            while (@A > 0) {
+                $p = shift @A;
+                if(@A > 0) {
+                    $ns = $nextstate++;
+                } else {
+                    $ns = $loopstate;
+                }
+                print "$s\t$ns\t$p\t$word_or_eps\n";
+                $word_or_eps = "<eps>";
+                $s = $ns;
+            }            
+        }
+    }
+    print "$loopstate\t0\n"; # final-cost.
+} else { # have silence probs.
+    $startstate = 0;
+    $loopstate = 1;
+    $silstate = 2; # state from where we go to loopstate after emitting silence.
+    $nextstate = 3;
+    print "$startstate\t$loopstate\t<eps>\t<eps>\t$nosilcost\n"; # no silence.
+    print "$startstate\t$loopstate\t$silphone\t<eps>\t$silcost\n"; # silence.
+    print "$silstate\t$loopstate\t$silphone\t<eps>\n"; # no cost.
+    while(<L>) {
+        @A = split(" ", $_);
+        $w = shift @A;
+        if(@A == 0) { # For empty words (<s> and </s>) insert no optional
+                      # silence (not needed as adjacent words supply it)....
+                      # actually we only hit this case for the lexicon without disambig
+                      # symbols but doesn't ever matter as training transcripts don't have <s> or </s>.
+            print "$loopstate\t$loopstate\t<eps>\t$w\n";
+        } else { 
+            $is_silence_word = (@A == 1 && $A[0] eq $silphone); # boolean.
+            $s = $loopstate;
+            $word_or_eps = $w;
+            while (@A > 0) {
+                $p = shift @A;
+                if(@A > 0) {
+                    $ns = $nextstate++;
+                    print "$s\t$ns\t$p\t$word_or_eps\n";
+                    $word_or_eps = "<eps>";
+                    $s = $ns;
+                } else {
+                    if(! $is_silence_word) {  
+                        # This is non-deterministic but relatively compact,
+                        # and avoids epsilons.
+                        print "$s\t$loopstate\t$p\t$word_or_eps\t$nosilcost\n";
+                        print "$s\t$silstate\t$p\t$word_or_eps\t$silcost\n";
+                    } else {
+                        # no point putting opt-sil after silence word.
+                        print "$s\t$loopstate\t$p\t$word_or_eps\n";
+                    }
+                    $word_or_eps = "<eps>";
+                }
+            }
+        }            
+    }
+    print "$loopstate\t0\n"; # final-cost.
+}
--- a/egs/rm/s1/scripts/make_phones_symtab.pl
+++ b/egs/rm/s1/scripts/make_phones_symtab.pl
@ -0,0 +1,37 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# make_phones_symtab.pl < lexicon.txt > phones.txt
+
+
+while(<>) {
+    @A = split(" ", $_);
+    for ($i=2; $i<@A; $i++) {
+        $P{$A[$i]} = 1; # seen it.
+    }
+}
+
+print "<eps>\t0\n";
+$n = 1;
+foreach $p (sort keys %P) {
+    if($p ne "<eps>") {
+        print "$p\t$n\n";
+        $n++;
+    }
+}
+
+print "sil\t$n\n";
+
--- a/egs/rm/s1/scripts/make_rm_dict.pl
+++ b/egs/rm/s1/scripts/make_rm_dict.pl
@ -0,0 +1,130 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Yanmin Qian  Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# This file takes as input the file pcdsril.txt that comes with the RM
+# distribution, and creates the dictionary used in RM training.
+
+# make_rm_dict.pl   pcdsril.txt > dct.txt
+
+if (@ARGV != 1) {
+    die "usage: make_rm_dict.pl   pcdsril.txt > dct.txt\n";
+}
+unless (open(IN_FILE, "@ARGV[0]")) {
+    die ("can't open @ARGV[0]");
+}
+
+while ($line = <IN_FILE>)
+{	
+	chop($line);
+	if (($line =~ /^[a-z]/)) 
+	{
+		$line =~ s/\+1//g;
+		@LineArray = split(/\s+/,$line);
+		@LineArray[0] = uc(@LineArray[0]);
+
+		printf "%-16s",  @LineArray[0];
+		for ($i = 1; $i < @LineArray; $i ++)
+		{
+			if (@LineArray[$i] eq 'q')
+			{}
+			elsif (@LineArray[$i] eq 'zh')
+			{
+				printf "sh ";
+			}
+			elsif (@LineArray[$i] eq 'eng')
+			{
+				printf "ng ";
+			}
+			elsif (@LineArray[$i] eq 'hv')
+			{
+				printf "hh ";
+			}
+			elsif (@LineArray[$i] eq 'em')
+			{
+				printf "m ";
+			}
+			elsif (@LineArray[$i] eq 'axr')
+			{
+				printf "er ";
+			}
+			elsif (@LineArray[$i] eq 'tcl')
+			{
+				if (@LineArray[$i+1] ne 't')
+				{
+					printf "td ";
+				}
+			}
+			elsif (@LineArray[$i] eq 'dcl')
+			{
+				if (@LineArray[$i+1] ne 'd')
+				{
+					printf "dd ";
+				}
+			}
+			elsif (@LineArray[$i] eq 'kcl')
+			{
+				if (@LineArray[$i+1] ne 'k')
+				{
+					printf "kd ";
+				}
+			}
+			elsif (@LineArray[$i] eq 'pcl')
+			{
+				if (@LineArray[$i+1] ne 'p')
+				{
+					printf "pd ";
+				}
+			}
+			elsif (@LineArray[$i] eq 'bcl')
+			{
+				if (@LineArray[$i+1] ne 'b')
+				{
+					printf "b ";
+				}
+			}
+			elsif (@LineArray[$i] eq 'gcl')
+			{
+				if (@LineArray[$i+1] ne 'g')
+				{
+					printf "g ";
+				}
+			}
+			elsif (@LineArray[$i] eq 't')
+			{
+				if (@LineArray[$i+1] ne 's')
+				{
+					printf "@LineArray[$i] ";
+				}
+				else
+				{
+					printf "ts ";
+					$i++;
+				}
+			}
+			else
+			{
+				printf "@LineArray[$i] ";
+			}
+		}
+		printf "\n";
+	}
+}
+
+printf "!SIL  sil\n";
+
+close(IN_FILE);
+
+
--- a/egs/rm/s1/scripts/make_rm_lm.pl
+++ b/egs/rm/s1/scripts/make_rm_lm.pl
@ -0,0 +1,119 @@
+#!/usr/bin/perl
+
+# Copyright 2010-2011 Yanmin Qian  Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# This file takes as input the file wp_gram.txt that comes with the RM
+# distribution, and creates the language model as an acceptor in FST form.
+
+# make_rm_lm.pl   wp_gram.txt > G.txt
+
+if (@ARGV != 1) {
+    print "usage: make_rm_lm.pl  wp_gram.txt > G.txt\n";
+    exit(0);
+}
+unless (open(IN_FILE, "@ARGV[0]")) {
+    die ("can't open @ARGV[0]");
+}
+
+
+$flag = 0;
+$count_wrd = 0;
+$cnt_ends = 0;
+$init = "";
+
+while ($line = <IN_FILE>)
+{	
+	chop($line);
+
+    $line =~ s/ //g;
+    
+	if(($line =~ /^>/)) 
+	{
+		if($flag == 0) 
+		{
+			$flag = 1;
+		}
+		$line =~ s/>//g;
+		$hashcnt{$init} = $i;
+		$init = $line;
+		$i = 0;
+		$count_wrd++;
+		@LineArray[$count_wrd - 1] = $init;
+ 		$hashwrd{$init} = 0;
+	}
+	elsif($flag != 0)
+	{
+		
+		$hash{$init}[$i] = $line;
+		$i++; 			
+		if($line =~ /SENTENCE-END/)
+		{
+			$cnt_ends++;
+		}
+ 	} 
+	else
+	{}
+}
+
+$hashcnt{$init} = $i;
+
+$num = 0;
+$weight = 0;
+$init_wrd = "SENTENCE-END";
+$hashwrd{$init_wrd} = @LineArray;
+for($i = 0; $i < $hashcnt{$init_wrd}; $i++)
+{
+	$weight = -log(1/$hashcnt{$init_wrd});
+	$hashwrd{$hash{$init_wrd}[$i]} = $i + 1;
+	print "0    $hashwrd{$hash{$init_wrd}[$i]}    $hash{$init_wrd}[$i]    $hash{$init_wrd}[$i]    $weight\n";
+}
+$num = $i;
+
+for($i = 0; $i < @LineArray; $i++)
+{
+	if(@LineArray[$i] eq 'SENTENCE-END')
+	{}
+	else
+	{
+		if($hashwrd{@LineArray[$i]} == 0)
+		{
+			$num++;
+			$hashwrd{@LineArray[$i]} = $num;
+		}
+		for($j = 0; $j < $hashcnt{@LineArray[$i]}; $j++)
+		{
+			$weight = -log(1/$hashcnt{@LineArray[$i]});
+			if($hashwrd{$hash{@LineArray[$i]}[$j]} == 0)
+			{
+				$num++;
+				$hashwrd{$hash{@LineArray[$i]}[$j]} = $num;
+			}
+			if($hash{@LineArray[$i]}[$j] eq 'SENTENCE-END')
+			{
+				print "$hashwrd{@LineArray[$i]}    $hashwrd{$hash{@LineArray[$i]}[$j]}    <eps>    <eps>    $weight\n"
+                }
+			else
+			{
+				print "$hashwrd{@LineArray[$i]}    $hashwrd{$hash{@LineArray[$i]}[$j]}    $hash{@LineArray[$i]}[$j]    $hash{@LineArray[$i]}[$j]    $weight\n";
+			}
+		}
+	}
+}
+
+print "$hashwrd{$init_wrd}    0\n";
+close(IN_FILE);
+
+
--- a/egs/rm/s1/scripts/make_roots.pl
+++ b/egs/rm/s1/scripts/make_roots.pl
@ -0,0 +1,102 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# Written by Dan Povey 9/21/2010.  Apache 2.0 License.
+
+# This version of make_roots.pl is specialized for RM.
+
+# This script creates the file roots.txt which is an input to train-tree.cc.  It
+# specifies how the trees are built.  The input file phone-sets.txt is a partial
+# version of roots.txt in which phones are represented by their spelled form, not
+# their symbol id's.  E.g. at input, phone-sets.txt might contain;
+#  shared not-split  sil
+# Any phones not specified in phone-sets.txt but present in phones.txt will
+# be given a default treatment.  If the --separate option is given, we create
+# a separate tree root for each of them, otherwise they are all lumped in one set.
+# The arguments shared|not-shared and split|not-split are needed if any
+# phones are not specified in phone-sets.txt.  What they mean is as follows:
+# if shared=="shared" then we share the tree-root between different HMM-positions
+# (0,1,2).  If split=="split" then we actually do decision tree splitting on
+# that root, otherwise we forbid decision-tree splitting.  (The main reason we might 
+# set this to "not-split" is for silence, when
+# we want to ensure that the HMM-positions will remain with a single PDF id.)
+
+
+$separate = 0;
+if($ARGV[0] eq "--separate") {
+    $separate = 1;
+    shift @ARGV;
+}
+
+if(@ARGV != 4) {
+    die "Usage: make_roots.pl [--separate] phones.txt silence-phone-list[integer,colon-separated] shared|not-shared split|not-split > roots.txt\n";
+}
+
+
+($phonesfile, $silphones, $shared, $split) = @ARGV;
+if($shared ne "shared" && $shared ne "not-shared") {
+    die "Third argument must be \"shared\" or \"not-shared\"\n";
+}
+if($split ne "split" && $split ne "not-split") {
+    die "Fourth argument must be \"split\" or \"not-split\"\n";
+}
+
+
+
+open(F, "<$phonesfile") || die "Opening file $phonesfile";
+
+while(<F>) {
+    @A = split(" ", $_);
+    if(@A != 2) {
+        die "Bad line in phones symbol file: ".$_;
+    }
+    if($A[1] != 0) {
+        $symbol2id{$A[0]} = $A[1];
+        $id2symbol{$A[1]} = $A[0];
+    }
+}
+
+if($silphones eq ""){ 
+    die "Empty silence phone list in make_roots.pl";
+}
+foreach $silphoneid (split(":", $silphones)) {
+    defined $id2symbol{$silphoneid} || die "No such silence phone id $silphoneid";
+    # Give each silence phone its own separate pdfs in each state, but
+    # no sharing (in this recipe; WSJ is different.. in this recipe there
+    #is only one silence phone anyway.)
+    $issil{$silphoneid} = 1;
+    print "not-shared not-split $silphoneid\n";
+}
+
+$idlist = "";
+$remaining_phones = "";
+
+if($separate){
+    foreach $a (keys %id2symbol) {
+        if(!defined $issil{$a}) {
+            print "$shared $split $a\n";
+        }
+    }
+} else {
+    print "$shared $split ";
+    foreach $a (keys %id2symbol) {
+        if(!defined $issil{$a}) {
+            print "$a ";
+        }
+    }
+    print "\n";
+}
--- a/egs/rm/s1/scripts/make_words_symtab.pl
+++ b/egs/rm/s1/scripts/make_words_symtab.pl
@ -0,0 +1,39 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# make_words_symtab.pl < G.txt > words.txt
+
+
+
+
+while(<>) {
+    @A = split(" ", $_);
+    if(@A >= 3) {
+        $W{$A[2]} = 1;
+    }
+}
+
+print "<eps>\t0\n";
+$n = 1;
+foreach $w (sort keys %W) {
+    if($w ne "<eps>") {
+        print "$w\t$n\n";
+        $n++;
+    }
+}
+
+print "!SIL\t$n\n";
+
--- a/egs/rm/s1/scripts/mkgraph.sh
+++ b/egs/rm/s1/scripts/mkgraph.sh
@ -0,0 +1,107 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+reorder=true # Dan-style, make false for Mirko+Lukas's decoder.
+
+for x in 1 2 3; do 
+  if [ "$1" == "--mono" ]; then
+    monophone_opts="--context-size=1 --central-position=0"
+    shift;
+  fi
+
+  if [ "$1" == "--noreorder" ]; then 
+    reorder=false # we set this for the Kaldi decoder.
+    shift;
+  fi
+done
+
+if [ $# != 3 ]; then
+   echo "Usage: scripts/mkgraph.sh  <tree> <model> <graphdir>"
+   exit 1;
+fi
+
+if [ -f path.sh ]; then . path.sh; fi
+
+tree=$1
+model=$2
+dir=$3
+
+mkdir -p $dir
+
+tscale=1.0
+loopscale=0.1
+
+fsttablecompose data/L_disambig.fst data/G.fst | fstdeterminizestar --use-log=true | \
+  fstminimizeencoded  > $dir/LG.fst
+
+fstisstochastic $dir/LG.fst || echo "warning: LG not stochastic."
+
+echo "Example string from LG.fst: "
+echo 
+fstrandgen --select=log_prob $dir/LG.fst | fstprint --isymbols=data/phones_disambig.txt --osymbols=data/words.txt -
+
+grep '#' data/phones_disambig.txt | awk '{print $2}' > $dir/disambig_phones.list
+
+fstcomposecontext $monophone_opts \
+  --read-disambig-syms=$dir/disambig_phones.list \
+  --write-disambig-syms=$dir/disambig_ilabels.list \
+   $dir/ilabels < $dir/LG.fst >$dir/CLG.fst
+
+ # for debugging:
+ fstmakecontextsyms data/phones.txt $dir/ilabels > $dir/context_syms.txt
+ echo "Example string from CLG.fst: "
+ echo 
+ fstrandgen --select=log_prob $dir/CLG.fst | fstprint --isymbols=$dir/context_syms.txt --osymbols=data/words.txt -
+
+fstisstochastic $dir/CLG.fst || echo "warning: CLG not stochastic."
+
+make-ilabel-transducer --write-disambig-syms=$dir/disambig_ilabels_remapped.list $dir/ilabels $tree $model $dir/ilabels.remapped > $dir/ilabel_map.fst
+
+# Reduce size of CLG by remapping symbols...
+fsttablecompose $dir/ilabel_map.fst $dir/CLG.fst  | fstdeterminizestar --use-log=true \
+  | fstminimizeencoded > $dir/CLG2.fst
+
+
+cat $dir/CLG2.fst |  fstisstochastic  || echo "warning: CLG2 is not stochastic."
+
+make-h-transducer --disambig-syms-out=$dir/disambig_tstate.list \
+  --transition-scale=$tscale $dir/ilabels.remapped $tree $model > $dir/Ha.fst
+
+
+fsttablecompose $dir/Ha.fst $dir/CLG2.fst | fstdeterminizestar --use-log=true \
+ | fstrmsymbols $dir/disambig_tstate.list | fstrmepslocal | fstminimizeencoded > $dir/HCLGa.fst
+
+fstisstochastic $dir/HCLGa.fst || echo "HCLGa is not stochastic"
+
+add-self-loops --self-loop-scale=$loopscale --reorder=$reorder $model < $dir/HCLGa.fst > $dir/HCLG.fst
+
+if [ $tscale == 1.0 -a $loopscale == 1.0 ]; then
+  # No point doing this test if transition-scale not 1, as it is bound to fail. 
+  fstisstochastic $dir/HCLG.fst || echo "Final HCLG is not stochastic."
+fi
+
+fstisstochastic $dir/HCLG.fst || echo "Final HCLG is not stochastic."
+
+
+#The next five lines are debug.
+# The last two lines of this block print out some alignment info.
+fstrandgen --select=log_prob $dir/HCLG.fst |  fstprint --osymbols=data/words.txt > $dir/rand.txt
+cat $dir/rand.txt | awk 'BEGIN{printf("0  ");} {if(NF>=3 && $3 != 0){ printf ("%d ",$3); }} END {print ""; }' > $dir/rand_align.txt
+
+show-alignments data/phones.txt $model ark:$dir/rand_align.txt
+cat $dir/rand.txt | awk ' {if(NF>=4 && $4 != "<eps>"){ printf ("%s ",$4); }} END {print ""; }'
+
--- a/egs/rm/s1/scripts/mkgraph_alt.sh
+++ b/egs/rm/s1/scripts/mkgraph_alt.sh
@ -0,0 +1,115 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# This version of mkgraph.sh creates the C fst explicitly.
+
+reorder=true # Dan-style, make false for Mirko+Lukas's decoder.
+
+for x in 1 2 3; do 
+  if [ "$1" == "--mono" ]; then
+    monophone_opts="--context-size=1 --central-position=0"
+    shift;
+  fi
+
+  if [ "$1" == "--noreorder" ]; then 
+    reorder=false # we set this for the Kaldi decoder.
+    shift;
+  fi
+done
+
+if [ $# != 3 ]; then
+   echo "Usage: scripts/mkgraph_alt.sh  <tree> <model> <graphdir>"
+   exit 1;
+fi
+
+if [ -f path.sh ]; then . path.sh; fi
+
+
+tree=$1
+model=$2
+dir=$3
+
+mkdir -p $dir
+
+tscale=1.0
+loopscale=0.1
+
+fsttablecompose data/L_disambig.fst data/G.fst | fstdeterminizestar --use-log=true | \
+  fstminimizeencoded  > $dir/LG.fst
+
+fstisstochastic $dir/LG.fst || echo "warning: LG not stochastic."
+
+echo "Example string from LG.fst: "
+echo 
+fstrandgen --select=log_prob $dir/LG.fst | fstprint --isymbols=data/phones_disambig.txt --osymbols=data/words.txt -
+
+grep '#' data/phones_disambig.txt | awk '{print $2}' > $dir/disambig_phones.list
+subseq_sym=`tail -1 data/phones_disambig.txt | awk '{print $2+1;}'`
+cp data/phones_disambig.txt $dir/phones_disambig_subseq.txt
+echo '$' $subseq_sym >> $dir/phones_disambig_subseq.txt
+
+fstmakecontextfst --read-disambig-syms=$dir/disambig_phones.list \
+ --write-disambig-syms=$dir/disambig_ilabels.list data/phones.txt $subseq_sym \
+   $dir/ilabels | fstarcsort --sort_type=olabel > $dir/C.fst
+
+fstaddsubsequentialloop $subseq_sym $dir/LG.fst | \
+ fsttablecompose $dir/C.fst - > $dir/CLG.fst
+
+
+ # for debugging:
+ fstmakecontextsyms data/phones.txt $dir/ilabels > $dir/context_syms.txt
+ echo "Example string from CLG.fst: "
+ echo 
+ fstrandgen --select=log_prob $dir/CLG.fst | fstprint --isymbols=$dir/context_syms.txt --osymbols=data/words.txt -
+
+fstisstochastic $dir/CLG.fst || echo "warning: CLG not stochastic."
+
+make-ilabel-transducer --write-disambig-syms=$dir/disambig_ilabels_remapped.list $dir/ilabels $tree $model $dir/ilabels.remapped > $dir/ilabel_map.fst
+
+# Reduce size of CLG by remapping symbols...
+fstcompose $dir/ilabel_map.fst $dir/CLG.fst  | fstdeterminizestar --use-log=true \
+  | fstminimizeencoded > $dir/CLG2.fst
+
+
+cat $dir/CLG2.fst |  fstisstochastic  || echo "warning: CLG2 is not stochastic."
+
+make-h-transducer --disambig-syms-out=$dir/disambig_tstate.list \
+  --transition-scale=$tscale $dir/ilabels.remapped $tree $model > $dir/Ha.fst
+
+
+fsttablecompose $dir/Ha.fst $dir/CLG2.fst | fstdeterminizestar --use-log=true \
+ | fstrmsymbols $dir/disambig_tstate.list | fstrmepslocal | fstminimizeencoded > $dir/HCLGa.fst
+
+fstisstochastic $dir/HCLGa.fst || echo "HCLGa is not stochastic"
+
+add-self-loops --self-loop-scale=$loopscale --reorder=$reorder $model < $dir/HCLGa.fst > $dir/HCLG.fst
+
+if [ $tscale == 1.0 -a $loopscale == 1.0 ]; then
+  # No point doing this test if transition-scale not 1, as it is bound to fail. 
+  fstisstochastic $dir/HCLG.fst || echo "Final HCLG is not stochastic."
+fi
+
+fstisstochastic $dir/HCLG.fst || echo "Final HCLG is not stochastic."
+
+
+# The next four lines are for debugging.
+# The last two lines of this block print out some alignment info.
+fstrandgen --select=log_prob $dir/HCLG.fst |  fstprint --osymbols=data/words.txt > $dir/rand.txt
+cat $dir/rand.txt | awk 'BEGIN{printf("0  ");} {if(NF>=3 && $3 != 0){ printf ("%d ",$3); }} END {print ""; }' > $dir/rand_align.txt
+show-alignments data/phones.txt $model ark:$dir/rand_align.txt
+cat $dir/rand.txt | awk ' {if(NF>=4 && $4 != "<eps>"){ printf ("%s ",$4); }} END {print ""; }'
+
--- a/egs/rm/s1/scripts/process_warps.pl
+++ b/egs/rm/s1/scripts/process_warps.pl
@ -0,0 +1,47 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# This script is part of a diagnostic step when using exponential transforms.
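+# Usage sketch (file names are hypothetical):
+#   process_warps.pl spk2class.map < warps.txt
+# The map file has lines of the form "speaker class" and the standard input
+# has lines of the form "speaker warp".  Exactly two classes are expected.
+# The statistic printed at the end is
+#   |mean_0 - mean_1| / sqrt(0.5 * (var_0 + var_1)),
+# where mean_k and var_k are the mean and the (biased) variance of the
+# warp factors in class k.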
+
+$map=$ARGV[0]; open(M,"<$map")||die "opening map file $map";
+while(<M>){ @A=split(" ",$_); $map{$A[0]} = $A[1]; }
+while(<STDIN>){  
+    ($spk,$warp)=split(" ",$_); 
+    $class = int($class/2);
+    defined $map{$spk} || die "No gender info for speaker $spk";
+    $warps{$map{$spk}} = $warps{$map{$spk}} . "$warp ";
+}
+@K = sort keys %warps;
+@K==2||die "wrong number of keys [empty warps file?]";
+foreach $k ( @K ) {
+    $s =  join(" ", sort { $a <=> $b } ( split(" ", $warps{$k}) )) ;
+    print "$k = [ $s ];\n";
+} 
+# f,m may be reversed below; doesn't matter.
+foreach $w ( split(" ", $warps{$K[0]}) ) {
+    $nf += 1; $sumf += $w; $sumf2 += $w*$w;
+}
+foreach $w ( split(" ", $warps{$K[1]}) ) {
+    $nm += 1; $summ += $w; $summ2 += $w*$w;
+}
+$sumf /= $nf; $sumf2 /= $nf;
+$summ /= $nm; $summ2 /= $nm;
+$sumf2 -= $sumf*$sumf;
+$summ2 -= $summ*$summ;
+$avgwithin = 0.5*($sumf2+$summ2 );
+$diff = abs($sumf - $summ) / sqrt($avgwithin);
+print "% class separation is $diff\n"; 
--- a/egs/rm/s1/scripts/silphones.pl
+++ b/egs/rm/s1/scripts/silphones.pl
@ -0,0 +1,57 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# creates integer lists of silence and non-silence phones in files,
+# e.g. silphones.csl="1:2:3 \n"
+# and nonsilphones.csl="4:5:6:7:...:24\n";
+
+if(@ARGV != 4) {
+    die "Usage: silphones.pl phones.txt \"sil1 sil2 sil3\" silphones.csl nonsilphones.csl";
+}
+
+($symtab, $sillist, $silphones, $nonsilphones) = @ARGV;
+open(S,"<$symtab") || die "Opening symbol table $symtab";
+
+
+foreach $s (split(" ", $sillist)) {
+    $issil{$s} = 1;
+}
+
+@sil = ();
+@nonsil = ();
+while(<S>){
+    @A = split(" ", $_);
+    @A == 2 || die "Bad line $_ in phone-symbol-table file $symtab";
+    ($sym, $int) = @A;
+    if($int != 0) {
+        if($issil{$sym}) { push @sil, $int; $seensil{$sym}=1; }
+        else { push @nonsil, $int; }
+    }
+}
+
+foreach $k(keys %issil) {
+    if(!$seensil{$k}) { die "No such silence phone $k"; }
+}
+open(F, ">$silphones") || die "opening silphones file $silphones";
+open(G, ">$nonsilphones") || die "opening nonsilphones file $nonsilphones";
+print F join(":", @sil) . "\n";
+print G join(":", @nonsil) . "\n";
+close(F);
+close(G);
+if(@sil == 0) { print STDERR "Warning: silphones.pl no silence phones.\n" }
+if(@nonsil == 0) { print STDERR "Warning: silphones.pl no non-silence phones.\n" }
+
--- a/egs/rm/s1/scripts/spk2utt_to_utt2spk.pl
+++ b/egs/rm/s1/scripts/spk2utt_to_utt2spk.pl
@ -0,0 +1,27 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
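+# Converts a spk2utt file (lines of the form "speaker utt1 utt2 ...")
+# into utt2spk format (lines of the form "utterance speaker").
+# For example (the ids are made up), the input line
+#   spk1 utt1 utt2
+# produces the output lines
+#   utt1 spk1
+#   utt2 spk1
+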
+while(<>){ 
+    @A = split(" ", $_);
+    @A > 1 || die "Invalid line in spk2utt file: $_";
+    $s = shift @A;
+    foreach $u ( @A ) {
+        print "$u $s\n";
+    }
+}
+
+
--- a/egs/rm/s1/scripts/split_scp.pl
+++ b/egs/rm/s1/scripts/split_scp.pl
@ -0,0 +1,181 @@
+#!/usr/bin/perl -w
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+# This program splits up any kind of .scp or archive-type file.
+# If there is no utt2spk option it will work on any text file and
+# will split it up with an approximately equal number of lines in
+# each output file.
+# With the --utt2spk option it will work on anything that has the 
+# utterance-id as the first entry on each line; the utt2spk file is
+# of the form "utterance speaker" (on each line).
+# It splits it into equal size chunks as far as it can.  If you use
+# the utt2spk option it will make sure these chunks coincide with
+# speaker boundaries.  In this case, if there are more chunks
+# than speakers (and in some other circumstances), some of the 
+# resulting chunks will be empty and it
+# will print a warning.
+# You will normally call this like:
+# split_scp.pl scp scp.1 scp.2 scp.3 ...
+# or
+# split_scp.pl --utt2spk=utt2spk scp scp.1 scp.2 scp.3 ...
+# Note that you can use this script to split the utt2spk file itself,
+# e.g. split_scp.pl --utt2spk=utt2spk utt2spk utt2spk.1 utt2spk.2 ...
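+# Illustration of the default split (no --utt2spk), derived from the code
+# below: each output file gets up to ceil(numlines/numoutputs) consecutive
+# lines of the input.
+#   10 input lines, 3 outputs -> 4, 4 and 2 lines.
+#    4 input lines, 3 outputs -> 2, 2 and 0 lines (the last file is empty).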
+
+if(@ARGV < 2 ) {
+    die "Usage: split_scp.pl [--utt2spk=<utt2spk_file>] in.scp out1.scp out2.scp ... ";
+}
+
+if($ARGV[0] =~ m:^-:) {  
+    # Everything inside this block
+    # corresponds to what we do when the --utt2spk option is used.
+    $opt = shift @ARGV;
+    @A = split("=", $opt);
+    if(@A != 2 || $A[0] ne "--utt2spk") {
+        die "split_scp.pl: invalid option $opt";
+    }
+    $utt2spk_file = $A[1];
+    open(U, "<$utt2spk_file") || die "Failed to open utt2spk file $utt2spk_file";
+    while(<U>) {
+        @A = split;
+        @A == 2 || die "Bad line $_ in utt2spk file $utt2spk_file";
+        ($u,$s) = @A;
+        $utt2spk{$u} = $s;
+    }
+    $inscp = shift @ARGV;
+    open(I, "<$inscp") || die "Opening input scp file $inscp";
+    @spkrs = ();
+    while(<I>) {
+        @A = split;
+        if(@A == 0) { die "Empty or space-only line in scp file $inscp"; }
+        $u = $A[0];
+        $s = $utt2spk{$u};
+        if(!defined $s) { die "No such utterance $u in utt2spk file $utt2spk_file"; }
+        if(!defined $spk_count{$s}) { 
+            push @spkrs, $s; 
+            $spk_count{$s} = 0;
+            $spk_data{$s} = "";
+        }
+        $spk_count{$s}++;
+        $spk_data{$s} = $spk_data{$s} . $_;
+    }
+    # Now split as equally as possible ..
+    # First allocate spks to files by giving each file an approximately
+    # equal #spks.
+    $numspks = @spkrs;  # number of speakers.
+    $numscps = @ARGV; # number of output files.
+    $spksperscp = int( ($numspks+($numscps-1)) / $numscps); # the +($numscps-1) forces rounding up.
+    for($scpidx = 0; $scpidx < $numscps; $scpidx++) {
+        $scparray[$scpidx] = []; # [] is array reference.
+        for($n = $spksperscp * $scpidx; 
+            $n < $numspks && $n < $spksperscp*($scpidx+1); 
+            $n++) {
+            $spk = $spkrs[$n];
+            push @{$scparray[$scpidx]}, $spk;
+            $scpcount[$scpidx] += $spk_count{$spk};
+        }
+    }
+    # Now will try to reassign beginning + ending speakers
+    # to different scp's and see if it gets more balanced.
+    # Suppose objf we're minimizing is sum_i (num utts in scp[i] - average)^2.
+    # We can show that if considering changing just 2 scp's, we minimize
+    # this by minimizing the squared difference in sizes.  This is
+    # equivalent to minimizing the absolute difference in sizes.  This
+    # shows this method is bound to converge.
+
+    $changed = 1;
+    while($changed) {
+        $changed = 0;
+        for($scpidx = 0; $scpidx < $numscps; $scpidx++) {
+            # First try to reassign ending spk of this scp.
+            if($scpidx < $numscps-1) {
+                $sz = @{$scparray[$scpidx]};
+                if($sz > 0) {
+                    $spk = $scparray[$scpidx]->[$sz-1];
+                    $count = $spk_count{$spk};
+                    $nutt1 = $scpcount[$scpidx];
+                    $nutt2 = $scpcount[$scpidx+1];
+                    if( abs( ($nutt2+$count) - ($nutt1-$count))
+                        < abs($nutt2 - $nutt1))  { # Would decrease
+                        # size-diff by reassigning spk...
+                        $scpcount[$scpidx+1] += $count;
+                        $scpcount[$scpidx] -= $count;
+                        pop @{$scparray[$scpidx]};
+                        unshift @{$scparray[$scpidx+1]}, $spk;
+                        $changed = 1;
+                    }
+                }
+            }
+            if($scpidx > 0 && @{$scparray[$scpidx]} > 0) {
+                $spk = $scparray[$scpidx]->[0];
+                $count = $spk_count{$spk};
+                $nutt1 = $scpcount[$scpidx-1];
+                $nutt2 = $scpcount[$scpidx];
+                if( abs( ($nutt2-$count) - ($nutt1+$count))
+                    < abs($nutt2 - $nutt1))  { # Would decrease
+                    # size-diff by reassigning spk...
+                    $scpcount[$scpidx-1] += $count;
+                    $scpcount[$scpidx] -= $count;
+                    shift @{$scparray[$scpidx]};
+                    push @{$scparray[$scpidx-1]}, $spk;
+                    $changed = 1;
+                }
+            }
+        }
+    }
+    # Now print out the files...
+    for($scpidx = 0; $scpidx < $numscps; $scpidx++) {
+        $scpfn = $ARGV[$scpidx];
+        open(F, ">$scpfn") || die "Could not open scp file $scpfn for writing.";
+        $count = 0;
+        if(@{$scparray[$scpidx]} == 0) {
+            print STDERR "Warning: split_scp.pl producing empty .scp file $scpfn (too many splits and too few speakers?)\n";
+        }
+        foreach $spk ( @{$scparray[$scpidx]} ) {
+            print F $spk_data{$spk};
+            $count += $spk_count{$spk};
+        }
+        if($count != $scpcount[$scpidx]) { die "Count mismatch [code error]"; }
+        close(F);
+    }
+} else { 
+    # This block is the "normal" case where there is no --utt2spk
+    # option and we just break into equal size chunks.
+
+    $inscp = shift @ARGV;
+    open(I, "<$inscp") || die "Opening input scp file $inscp";
+
+    $numscps = @ARGV;  # size of array.
+    @F = ();
+    while(<I>) {
+        push @F, $_;
+    }
+    $numlines = @F;
+    if($numlines == 0) {
+        print STDERR "split_scp.pl: warning: empty input scp file $inscp\n";
+    }
+    # The +($numscps-1) forces rounding up [just doing int() rounds down].
+    $linesperscp = int( ($numlines+($numscps-1)) / $numscps);
+    for($scpidx = 0; $scpidx < @ARGV; $scpidx++) {
+        $scpfile = $ARGV[$scpidx];
+        open(O, ">$scpfile") || die "Opening output scp file $scpfile";
+        for($n = $linesperscp * $scpidx; $n < $numlines && $n < $linesperscp*($scpidx+1); $n++) {
+            print O $F[$n];
+        }
+        close(O) || die "Closing scp file $scpfile";
+    }
+}
--- a/egs/rm/s1/scripts/subset_scp.pl
+++ b/egs/rm/s1/scripts/subset_scp.pl
@ -0,0 +1,59 @@
+#!/usr/bin/perl -w
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# This program selects a subset of N elements in the scp.
+# It selects them evenly from throughout the scp, in order to
+# avoid selecting too many from the same speaker.
+# It prints them on the standard output.
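+# Illustration (derived from select_n below): the range of lines is halved
+# recursively, with floor(n/2) of the n requested lines taken from the
+# first half and the remainder from the second half.  For example, with
+# N=2 and a 4-line scp, lines 2 and 4 are printed.
+# Example invocation (file names are hypothetical):
+#   subset_scp.pl 100 data/train.scp > data/train.100.scp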
+
+if(@ARGV < 2 ) {
+    die "Usage: subset_scp.pl N in.scp ";
+}
+
+$N = shift @ARGV;
+if($N == 0) {
+    die "First command-line parameter to subset_scp.pl must be an integer, got \"$N\"";
+}
+$inscp = shift @ARGV;
+open(I, "<$inscp") || die "Opening input scp file $inscp";
+
+@F = ();
+while(<I>) {
+    push @F, $_;
+}
+$numlines = @F;
+if($N > $numlines) {
+    die "You requested from subset_scp.pl more elements than available: $N > $numlines";
+}
+
+sub select_n {
+    my ($start,$end,$num_needed) = @_;
+    my $diff = $end - $start;
+    if($num_needed > $diff) { die "select_n: code error"; }
+    if($diff == 1 ) {
+        if($num_needed  > 0) {
+            print $F[$start];
+        }
+    } else {
+        my $halfdiff = int($diff/2);
+        my $halfneeded = int($num_needed/2);
+        select_n($start, $start+$halfdiff, $halfneeded);
+        select_n($start+$halfdiff, $end, $num_needed - $halfneeded);
+    }
+}
+select_n(0, $numlines, $N);
+
--- a/egs/rm/s1/scripts/sym2int.pl
+++ b/egs/rm/s1/scripts/sym2int.pl
@ -0,0 +1,59 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
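+# Maps symbols to integers using a symbol table with lines of the form
+# "symbol integer".  Sketch of the behaviour (ids and symbols are made up):
+# with a symbol table containing the lines "a 1" and "b 2", running with
+# --ignore-first-field on the input line
+#   utt1 a b
+# prints
+#   utt1 1 2
+# (followed by a trailing space).  With --ignore-oov, symbols that are
+# missing from the table are passed through instead of causing an error.
+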
+$ignore_oov = 0;
+$ignore_first_field = 0;
+for($x = 0; $x < 2; $x++) {
+    if($ARGV[0] eq "--ignore-oov") { $ignore_oov = 1; shift @ARGV; }
+    if($ARGV[0] eq "--ignore-first-field") { $ignore_first_field = 1; shift @ARGV; }
+}
+
+$symtab = shift @ARGV;
+if(!defined $symtab) {
+    die "Usage: sym2int.pl symtab [input transcriptions] > output transcriptions\n";
+}
+open(F, "<$symtab") || die "Error opening symbol table file $symtab";
+while(<F>) {
+    @A = split(" ", $_);
+    @A == 2 || die "bad line in symbol table file: $_";
+    $sym2int{$A[0]} = $A[1] + 0;
+}
+
+while(<>) {
+    @A = split(" ", $_);
+    if(@A == 0) {
+        die "Empty line in transcriptions input.";
+    }
+    if($ignore_first_field) {
+        $key = shift @A;
+        print $key . " ";
+    }
+    foreach $a (@A) {
+        $i = $sym2int{$a};
+        if(!defined ($i)) {
+            if($ignore_oov) {
+                $i = $a;  # pass the out-of-vocabulary symbol through unchanged.
+            } else {
+                die "sym2int.pl: undefined symbol $a\n";
+            }
+        }
+        print $i . " ";
+    }
+    print "\n";
+}
+
+
--- a/egs/rm/s1/scripts/utt2spk_to_spk2utt.pl
+++ b/egs/rm/s1/scripts/utt2spk_to_spk2utt.pl
@ -0,0 +1,33 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+
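+# Converts a utt2spk file (lines of the form "utterance speaker") into
+# spk2utt format (lines of the form "speaker utt1 utt2 ...").  Speakers
+# are printed in order of first appearance.  For example (the ids are
+# made up), the input lines
+#   utt1 spk1
+#   utt2 spk1
+#   utt3 spk2
+# produce the output lines
+#   spk1 utt1 utt2
+#   spk2 utt3
+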
+while(<>){ 
+    @A = split(" ", $_);
+    @A == 2 || die "Invalid line in utt2spk file: $_";
+    ($u,$s) = @A;
+    if(!$seen_spk{$s}) {
+        $seen_spk{$s} = 1;
+        push @spklist, $s;
+    }
+    $uttlist{$s} = $uttlist{$s} . "$u ";
+}
+foreach $s (@spklist) {
+    $l = $uttlist{$s};
+    $l =~ s: $::; # remove trailing space.
+    print "$s $l\n";
+}
--- a/egs/rm/s1/steps/decode_mono.sh
+++ b/egs/rm/s1/steps/decode_mono.sh
@ -0,0 +1,45 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# Monophone decoding script.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_mono
+tree=exp/mono/tree
+mkdir -p $dir
+model=exp/mono/final.mdl
+graphdir=exp/graph_mono
+
+scripts/mkgraph.sh --mono $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # the ,p option lets it score partial output without dying..
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
+
--- a/egs/rm/s1/steps/decode_sgmm.sh
+++ b/egs/rm/s1/steps/decode_sgmm.sh
@ -0,0 +1,45 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_sgmm
+tree=exp/sgmm/tree
+model=exp/sgmm/final.mdl
+graphdir=exp/graph_sgmm
+
+mkdir -p $dir
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+
+  sgmm-decode-faster --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # the ,p option lets it score partial output without dying..
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_sgmm2.sh
+++ b/egs/rm/s1/steps/decode_sgmm2.sh
@ -0,0 +1,54 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_sgmm2
+tree=exp/sgmm/tree
+model=exp/sgmm2/final.mdl
+graphdir=exp/graph_sgmm2
+
+mkdir -p $dir
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+
+  sgmm-gselect $model "$feats" ark,t:- 2>$dir/gselect_${test}.log | gzip -c > $dir/gselect_${test}.gz || exit 1;
+  gselect_opt="--gselect-read=ark:gunzip -c $dir/gselect_${test}.gz|"
+  sgmm-decode-faster-spkvecs --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt "$gselect_opt" $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log || exit 1;
+  ali-to-post $dir/test_${test}.ali $dir/test_${test}.post 2> $dir/post_${test}.log || exit 1;
+
+  gselect_opt="--gselect=ark:gunzip -c $dir/gselect_${test}.gz|"
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  sgmm-est-spkvecs "$gselect_opt" $spk2utt_opt $model "$feats" $dir/test_${test}.post $dir/vecs_${test} 2> $dir/est_spkvecs_${test}.log || exit 1;
+  sgmm-decode-faster-spkvecs --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt "$gselect_opt" --spkvecs-read=$dir/vecs_${test} $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_vecs_${test}.log
+
+ # the ,p option lets it score partial output without dying..
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_sgmma.sh
+++ b/egs/rm/s1/steps/decode_sgmma.sh
@ -0,0 +1,45 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_sgmma
+tree=exp/sgmma/tree
+model=exp/sgmma/final.mdl
+graphdir=exp/graph_sgmma
+
+mkdir -p $dir
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+
+  sgmm-decode-faster --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # the ,p option lets it score partial output without dying..
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_sgmmb.sh
+++ b/egs/rm/s1/steps/decode_sgmmb.sh
@ -0,0 +1,69 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# SGMM decoding with adaptation.
+# 
+# SGMM decoding; use a different acoustic scale from normal (0.1 vs 0.08333)
+# (1) decode with "alignment model"
+# (2) get GMM posteriors with "alignment model" and estimate speaker
+#     vectors with final model
+# (3) decode with final model.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_sgmmb
+tree=exp/sgmmb/tree
+model=exp/sgmmb/final.mdl
+alimodel=exp/sgmmb/final.alimdl
+graphdir=exp/graph_sgmmb
+silphonelist=`cat data/silphones.csl`
+
+mkdir -p $dir
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+  spk2utt_opt="--spk2utt=ark:data/test_${test}.spk2utt"
+  utt2spk_opt="--utt2spk=ark:data/test_${test}.utt2spk"
+
+  sgmm-gselect $model "$feats" ark,t:- 2>$dir/gselect_${test}.log | \
+     gzip -c > $dir/${test}_gselect.gz || exit 1;
+  gselect_opt="--gselect=ark:gunzip -c $dir/${test}_gselect.gz|"
+
+  # Use smaller beam first time.
+  sgmm-decode-faster "$gselect_opt" --beam=15.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $alimodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.pre_tra ark,t:$dir/test_${test}.pre_ali  2> $dir/predecode_${test}.log
+
+  ( ali-to-post ark:$dir/test_${test}.pre_ali ark:- | \
+    weight-silence-post 0.01 $silphonelist $alimodel ark:- ark:- | \
+    sgmm-post-to-gpost "$gselect_opt" $alimodel "$feats" ark,s,cs:- ark:- | \
+    sgmm-est-spkvecs-gpost "$spk2utt_opt" $model "$feats" ark,s,cs:- \
+       ark:$dir/test_${test}.vecs ) 2>$dir/vecs_${test}.log
+  
+
+  sgmm-decode-faster $utt2spk_opt --spk-vecs=ark:$dir/test_${test}.vecs --beam=20.0 --acoustic-scale=0.1 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # the ,p option lets it score partial output without dying..
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri1.sh
+++ b/egs/rm/s1/steps/decode_tri1.sh
@ -0,0 +1,44 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri1
+tree=exp/tri1/tree
+model=exp/tri1/final.mdl
+graphdir=exp/graph_tri1
+
+mkdir -p $dir
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # the ,p option lets it score partial output without dying..
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri1_fmllr.sh
+++ b/egs/rm/s1/steps/decode_tri1_fmllr.sh
@ -0,0 +1,65 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# decode_tri1_fmllr.sh is like decode_tri1.sh, but estimates fMLLR at test
+# time, per speaker.  There is no SAT.
+# To be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+srcdir=exp/decode_tri1
+dir=exp/decode_tri1_fmllr
+mkdir -p $dir
+model=exp/tri1/final.mdl
+tree=exp/tri1/tree
+graphdir=exp/graph_tri1
+silphones=`cat data/silphones.csl`
+
+mincount=500 # mincount before we estimate a transform.
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  # Comment out the two lines below to make this per-utterance.
+  # This would only work if $srcdir was also per-utterance [otherwise
+  # you'd have to mess with the script a bit].
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+
+  sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+
+  ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
+    weight-silence-post 0.01 $silphones $model ark:- ark:- | \
+    gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
+     "$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
+
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+    compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra > $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
+
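Reviewer note on `--spk2utt` and `--utt2spk` in the script above: the two options name the same speaker grouping in opposite directions. One file maps each utterance to its speaker, the other lists each speaker followed by that speaker's utterances. The sketch below derives the second layout from the first with plain awk. The IDs are hypothetical, and this is only an illustration of the file shapes, not the tool the recipe uses to create them.

```shell
tmp=$(mktemp -d)
# Hypothetical IDs, one "utterance speaker" pair per line (utt2spk layout).
cat > "$tmp/utt2spk" <<'EOF'
spkA_utt1 spkA
spkA_utt2 spkA
spkB_utt1 spkB
EOF

# Group utterances by speaker, keeping speakers in first-seen order
# (spk2utt layout: "speaker utt1 utt2 ...").
spk2utt=$(awk '{
    if (!($2 in utts)) { order[++n] = $2; utts[$2] = $1 }
    else               { utts[$2] = utts[$2] " " $1 }
  }
  END { for (i = 1; i <= n; i++) print order[i], utts[order[i]] }' "$tmp/utt2spk")
echo "$spk2utt"
rm -rf "$tmp"
```

Estimation reads the speaker-to-utterances direction to pool statistics per speaker, while applying a transform needs the utterance-to-speaker direction to look up the right one.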
--- a/egs/rm/s1/steps/decode_tri1_regtree_fmllr.sh
+++ b/egs/rm/s1/steps/decode_tri1_regtree_fmllr.sh
@ -0,0 +1,69 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# decode_tri1_regtree_fmllr.sh is as decode_tri1.sh but estimating fMLLR in test,
+# per speaker.  There is no SAT.  Use a regression-tree with top-level speech/sil
+# split (no silence weighting).
+
+if [ -f path.sh ]; then . path.sh; fi
+srcdir=exp/decode_tri1
+dir=exp/decode_tri1_regtree_fmllr
+mkdir -p $dir
+model=exp/tri1/final.mdl
+occs=exp/tri1/final.occs
+tree=exp/tri1/tree
+graphdir=exp/graph_tri1
+silphones=`cat data/silphones.csl`
+
+regtree=$dir/regtree
+maxleaves=8 # max # of regression-tree leaves.
+mincount=5000 # mincount before we add new transform.
+gmm-make-regtree --sil-phones=$silphones --state-occs=$occs --max-leaves=$maxleaves $model $regtree 2>$dir/make_regtree.out
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  # Comment the two lines below to make this per-utterance.
+  # This would only work if $srcdir was also per-utterance [otherwise
+  # you'd have to mess with the script a bit].
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+
+  # To deweight silence, would add the line
+  #   weight-silence-post 0.0 $silphones $model ark:- ark:- | \
+  # after the line with ali-to-post
+  # This is useful if we don't treat silence specially when building regression tree.
+
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+  ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
+    gmm-est-regtree-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model "$feats" ark:- $regtree ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
+
+  gmm-decode-faster-regtree-fmllr $utt2spk_opt --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst $regtree "$feats" ark:$dir/${test}.fmllr ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+    compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra > $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
+
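Reviewer note on the `( ... ) &` / `wait` pattern used by these scripts: a bare `wait` returns zero regardless of how the background jobs ended, and an `exit 1` inside a backgrounded subshell ends only that subshell. A failed test set can therefore go unnoticed until the WER summary looks wrong. The sketch below shows one possible way to surface failures by waiting on each PID separately. `decode_one` is a hypothetical stand-in for the body of the loop, not a function from the recipe.

```shell
# Stand-in for one decoding job; "feb89" is made to fail on purpose.
decode_one() {
  [ "$1" != "feb89" ]
}

pids=""
for test in mar87 oct87 feb89; do
  ( decode_one "$test" ) &
  pids="$pids $!"
done

# Wait on each job separately so a non-zero exit status is not lost.
failed=0
for pid in $pids; do
  wait "$pid" || failed=$((failed + 1))
done
echo "jobs failed: $failed"
```

A script could then stop with a non-zero status when `$failed` is not zero, before averaging the results.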
--- a/egs/rm/s1/steps/decode_tri2a.sh
+++ b/egs/rm/s1/steps/decode_tri2a.sh
@ -0,0 +1,44 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2a
+mkdir -p $dir
+model=exp/tri2a/final.mdl
+tree=exp/tri2a/tree
+graphdir=exp/graph_tri2a
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2a_fmllr.sh
+++ b/egs/rm/s1/steps/decode_tri2a_fmllr.sh
@ -0,0 +1,65 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# decode_tri2a_fmllr.sh is as decode_tri2a.sh but estimating fMLLR in test,
+# per speaker.  There is no SAT.
+# To be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+srcdir=exp/decode_tri2a
+dir=exp/decode_tri2a_fmllr
+mkdir -p $dir
+model=exp/tri2a/final.mdl
+tree=exp/tri2a/tree
+graphdir=exp/graph_tri2a
+silphones=`cat data/silphones.csl`
+
+mincount=500 # mincount before we estimate a transform.
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  # Comment the two lines below to make this per-utterance.
+  # This would only work if $srcdir was also per-utterance [otherwise
+  # you'd have to mess with the script a bit].
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+
+  sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+
+  ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
+    weight-silence-post 0.01 $silphones $model ark:- ark:- | \
+    gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
+     "$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
+
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+    compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra > $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
+
--- a/egs/rm/s1/steps/decode_tri2a_fmllr_utt.sh
+++ b/egs/rm/s1/steps/decode_tri2a_fmllr_utt.sh
@ -0,0 +1,65 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# decode_tri2a_fmllr_utt.sh is as decode_tri2a_fmllr.sh but estimating fMLLR
+# in test per utterance rather than per speaker.  There is no SAT.
+# To be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+srcdir=exp/decode_tri2a
+dir=exp/decode_tri2a_fmllr_utt
+mkdir -p $dir
+model=exp/tri2a/final.mdl
+tree=exp/tri2a/tree
+graphdir=exp/graph_tri2a
+silphones=`cat data/silphones.csl`
+
+mincount=500 # mincount before we estimate a transform.
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  # The two lines below are commented out to make this per-utterance.
+  # This would only work if $srcdir was also per-utterance [otherwise
+  # you'd have to mess with the script a bit].
+  #spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  #utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+
+  sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
+
+  ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
+    weight-silence-post 0.01 $silphones $model ark:- ark:- | \
+    gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
+     "$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
+
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+    compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra > $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
+
--- a/egs/rm/s1/steps/decode_tri2b.sh
+++ b/egs/rm/s1/steps/decode_tri2b.sh
@ -0,0 +1,67 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2b
+mkdir -p $dir
+model=exp/tri2b/final.mdl
+alignmodel=exp/tri2b/final.alimdl
+et=exp/tri2b/final.et
+defaultmat=exp/tri2b/default.mat
+tree=exp/tri2b/tree
+graphdir=exp/graph_tri2b
+silphones=`cat data/silphones.csl`
+
+# The graph may already have been made; mkgraph.sh is called regardless.
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  defaultfeats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:-|"
+  sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
+
+  # First do SI decoding with alignment model.
+  # Use smaller beam for this, as less critical.
+  gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali  2> $dir/predecode_${test}.log
+
+  # Comment the two lines below to make this per-utterance.
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  
+ ( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
+    gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1  $model $et \
+      "$sifeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
+     2>$dir/et_${test}.log || exit 1;
+
+  feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2c.sh
+++ b/egs/rm/s1/steps/decode_tri2c.sh
@ -0,0 +1,62 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# Decode the testing data.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2c
+mkdir -p $dir
+model=exp/tri2c/final.mdl
+tree=exp/tri2c/tree
+graphdir=exp/graph_tri2c
+# Note, the following 3 options must match the same options in train_tri2c.sh
+norm_vars=false
+after_deltas=false
+per_spk=true
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  if [ $per_spk == "true" ]; then
+    spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+    utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  fi # else empty.
+
+  echo "Computing cepstral mean and variance stats."
+  # compute mean and variance stats.
+  if [ $after_deltas == true ]; then
+    add-deltas --print-args=false scp:data/test_${test}.scp ark:- | compute-cmvn-stats $spk2utt_opt ark:- ark:$dir/cmvn_${test}.ark 2>$dir/cmvn_${test}.log
+    feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn_${test}.ark ark:- ark:- |"
+  else
+    compute-cmvn-stats $spk2utt_opt scp:data/test_${test}.scp ark:$dir/cmvn_${test} 2>$dir/cmvn_${test}.log
+    feats="ark:apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn_${test} scp:data/test_${test}.scp ark:- | add-deltas --print-args=false ark:- ark:- |"
+  fi
+
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra > $dir/wer_${test}
+) &
+done
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
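Reviewer note on the CMVN step in decode_tri2c.sh: the script first accumulates statistics (per speaker when `per_spk` is true) and then applies the normalization, with `norm_vars=false` meaning that only the mean is removed. The toy sketch below follows the same two-pass shape on one-dimensional, text-format values. The data layout and names are made up for illustration, and this is not how the Kaldi tools store or process features.

```shell
tmp=$(mktemp -d)
# Hypothetical one-dimensional "features": speaker ID, then one value per frame.
cat > "$tmp/feats.txt" <<'EOF'
spkA 1
spkA 3
spkB 10
spkB 14
EOF

# Pass 1 (NR == FNR) accumulates per-speaker sums and counts, i.e. the stats;
# pass 2 subtracts each speaker's mean from that speaker's frames.
normalized=$(awk 'NR == FNR { sum[$1] += $2; cnt[$1]++; next }
                  { print $1, $2 - sum[$1] / cnt[$1] }' "$tmp/feats.txt" "$tmp/feats.txt")
echo "$normalized"
rm -rf "$tmp"
```

After this step each speaker's values have zero mean, which is the effect the mean-only setting aims for on real feature vectors.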
--- a/egs/rm/s1/steps/decode_tri2d.sh
+++ b/egs/rm/s1/steps/decode_tri2d.sh
@ -0,0 +1,45 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2d
+mkdir -p $dir
+model=exp/tri2d/final.mdl
+tree=exp/tri2d/tree
+graphdir=exp/graph_tri2d
+transform=exp/tri2d/final.mat
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2e.sh
+++ b/egs/rm/s1/steps/decode_tri2e.sh
@ -0,0 +1,45 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2e
+mkdir -p $dir
+model=exp/tri2e/final.mdl
+tree=exp/tri2e/tree
+graphdir=exp/graph_tri2e
+transform=exp/tri2e/final.mat
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2f.sh
+++ b/egs/rm/s1/steps/decode_tri2f.sh
@ -0,0 +1,45 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2f
+mkdir -p $dir
+model=exp/tri2f/final.mdl
+tree=exp/tri2f/tree
+graphdir=exp/graph_tri2f
+transform=exp/tri2f/final.mat
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2g.sh
+++ b/egs/rm/s1/steps/decode_tri2g.sh
@ -0,0 +1,65 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2g
+mkdir -p $dir
+model=exp/tri2g/final.mdl
+alignmodel=exp/tri2g/final.alimdl
+lvtln=exp/tri2g/final.lvtln
+tree=exp/tri2g/tree
+graphdir=exp/graph_tri2g
+silphones=`cat data/silphones.csl`
+
+# The graph may already have been made; mkgraph.sh is called regardless.
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
+
+  # First do SI decoding with alignment model.
+  # Use smaller beam for this, as less critical.
+  gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali  2> $dir/predecode_${test}.log
+
+  # Comment the two lines below to make this per-utterance.
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  
+ ( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
+    gmm-est-lvtln-trans --verbose=1 $spk2utt_opt  $model $lvtln \
+      "$sifeats" ark:- ark:$dir/lvtln_${test}.trans ark,t:$dir/lvtln_${test}.warp ) \
+     2>$dir/lvtln_${test}.log || exit 1;
+
+  feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2g_diag.sh
+++ b/egs/rm/s1/steps/decode_tri2g_diag.sh
@ -0,0 +1,65 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2g_diag
+mkdir -p $dir
+model=exp/tri2g/final.mdl
+alignmodel=exp/tri2g/final.alimdl
+lvtln=exp/tri2g/final.lvtln
+tree=exp/tri2g/tree
+graphdir=exp/graph_tri2g
+silphones=`cat data/silphones.csl`
+
+# The graph may already have been made; mkgraph.sh is called regardless.
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
+
+  # First do SI decoding with alignment model.
+  # Use smaller beam for this, as less critical.
+  gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali  2> $dir/predecode_${test}.log
+
+  # Comment the two lines below to make this per-utterance.
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  
+ ( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
+    gmm-est-lvtln-trans --norm-type=diag --verbose=1 $spk2utt_opt  $model $lvtln \
+      "$sifeats" ark:- ark:$dir/lvtln_${test}.trans ark,t:$dir/lvtln_${test}.warp ) \
+     2>$dir/lvtln_${test}.log || exit 1;
+
+  feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2g_vtln.sh
+++ b/egs/rm/s1/steps/decode_tri2g_vtln.sh
@ -0,0 +1,79 @@
+# as decode_tri2g but using the feature-level VTLN
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# Uses feature-level VTLN, as opposed to the linear VTLN, when decoding.
+# Also computes a maximum-likelihood mean offset,
+# for better comparability with LVTLN.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2g_vtln
+mkdir -p $dir
+vtlnmodel=exp/tri2g/final.vtlnmdl
+lvtlnmodel=exp/tri2g/final.mdl
+alignmodel=exp/tri2g/final.alimdl
+lvtln=exp/tri2g/final.lvtln
+tree=exp/tri2g/tree
+graphdir=exp/graph_tri2g
+silphones=`cat data/silphones.csl`
+
+# Doesn't matter which model we use when making the graph
+# (only the transitions and structure are used).
+scripts/mkgraph.sh $tree $vtlnmodel $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
+
+  # First do SI decoding with alignment model.
+  # Use smaller beam for this, as less critical.
+  gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali  2> $dir/predecode_${test}.log
+
+  # Comment the two lines below to make this per-utterance.
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  
+ ( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
+    gmm-est-lvtln-trans --verbose=1 $spk2utt_opt  $lvtlnmodel $lvtln \
+      "$sifeats" ark:- ark:/dev/null ark,t:$dir/lvtln_${test}.warp ) \
+     2>$dir/lvtln_${test}.log || exit 1;
+
+  cat $dir/lvtln_${test}.warp | awk '{print $1, (0.85+0.01*$2);}' > $dir/${test}.factor
+
+  feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- |"
+
+ ( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-est-fmllr --fmllr-update-type=offset $spk2utt_opt $vtlnmodel "$feats" ark,o:- ark:$dir/${test}.trans ) 2>$dir/fmllr_${test}.log  || exit 1;
+
+  feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.trans ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $vtlnmodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+  # The ,p option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2g_vtln_diag.sh
+++ b/egs/rm/s1/steps/decode_tri2g_vtln_diag.sh
@ -0,0 +1,79 @@
+# as decode_tri2g but using the feature-level VTLN
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# (i.e. feature-level VTLN is used when decoding, as opposed to the linear VTLN.)
+# Also computes a diagonal fMLLR transform for
+# comparison with ET.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2g_vtln_diag
+mkdir -p $dir
+vtlnmodel=exp/tri2g/final.vtlnmdl
+lvtlnmodel=exp/tri2g/final.mdl
+alignmodel=exp/tri2g/final.alimdl
+lvtln=exp/tri2g/final.lvtln
+tree=exp/tri2g/tree
+graphdir=exp/graph_tri2g
+silphones=`cat data/silphones.csl`
+
+# Doesn't matter which model we use when making the graph
+# (only the transitions and structure are used).
+scripts/mkgraph.sh $tree $vtlnmodel $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
+
+  # First do SI decoding with alignment model.
+  # Use smaller beam for this, as less critical.
+  gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali  2> $dir/predecode_${test}.log
+
+  # Comment out the two lines below to adapt per utterance instead of per speaker.
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  
+ ( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
+    gmm-est-lvtln-trans --verbose=1 $spk2utt_opt  $lvtlnmodel $lvtln \
+      "$sifeats" ark:- ark:/dev/null ark,t:$dir/lvtln_${test}.warp ) \
+     2>$dir/lvtln_${test}.log || exit 1;
+
+  cat $dir/lvtln_${test}.warp | awk '{print $1, (0.85+0.01*$2);}' > $dir/${test}.factor
+
+  feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- |"
+
+ ( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-est-fmllr --fmllr-update-type=diag $spk2utt_opt $vtlnmodel "$feats" ark,o:- ark:$dir/${test}.trans ) 2>$dir/fmllr_${test}.log  || exit 1;
+
+  feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.trans ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $vtlnmodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2g_vtln_nofmllr.sh
+++ b/egs/rm/s1/steps/decode_tri2g_vtln_nofmllr.sh
@ -0,0 +1,71 @@
+# as decode_tri2g but using the feature-level VTLN
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# (i.e. feature-level VTLN is used when decoding, as opposed to the linear VTLN.)
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2g_vtln_nofmllr
+mkdir -p $dir
+vtlnmodel=exp/tri2g/final.vtlnmdl
+lvtlnmodel=exp/tri2g/final.mdl
+alignmodel=exp/tri2g/final.alimdl
+lvtln=exp/tri2g/final.lvtln
+tree=exp/tri2g/tree
+graphdir=exp/graph_tri2g
+silphones=`cat data/silphones.csl`
+
+# Doesn't matter which model we use when making the graph
+# (only the transitions and structure are used).
+scripts/mkgraph.sh $tree $vtlnmodel $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
+
+  # First do SI decoding with alignment model.
+  # Use smaller beam for this, as less critical.
+  gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali  2> $dir/predecode_${test}.log
+
+  # Comment out the two lines below to adapt per utterance instead of per speaker.
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  
+ ( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
+    gmm-est-lvtln-trans --verbose=1 $spk2utt_opt  $lvtlnmodel $lvtln \
+      "$sifeats" ark:- ark:/dev/null ark,t:$dir/lvtln_${test}.warp ) \
+     2>$dir/lvtln_${test}.log || exit 1;
+
+  cat $dir/lvtln_${test}.warp | awk '{print $1, (0.85+0.01*$2);}' > $dir/${test}.factor
+
+  feats="ark:compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/${test}.factor --config=conf/mfcc.conf scp:data_prep/test_${test}_wav.scp ark:- | add-deltas ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $vtlnmodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2h.sh
+++ b/egs/rm/s1/steps/decode_tri2h.sh
@ -0,0 +1,45 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2h
+mkdir -p $dir
+model=exp/tri2h/final.mdl
+tree=exp/tri2h/tree
+graphdir=exp/graph_tri2h
+transform=exp/tri2h/final.mat
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2i.sh
+++ b/egs/rm/s1/steps/decode_tri2i.sh
@ -0,0 +1,45 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2i
+mkdir -p $dir
+model=exp/tri2i/final.mdl
+tree=exp/tri2i/tree
+graphdir=exp/graph_tri2i
+transform=exp/tri2i/final.mat
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:add-deltas --delta-order=3 scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2j.sh
+++ b/egs/rm/s1/steps/decode_tri2j.sh
@ -0,0 +1,44 @@
+# to be run from ..
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2j
+mkdir -p $dir
+model=exp/tri2j/final.mdl
+tree=exp/tri2j/tree
+graphdir=exp/graph_tri2j
+transform=exp/tri2j/final.mat
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  feats="ark:add-deltas --delta-order=3 scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2k.sh
+++ b/egs/rm/s1/steps/decode_tri2k.sh
@ -0,0 +1,68 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2k
+mkdir -p $dir
+model=exp/tri2k/final.mdl
+alignmodel=exp/tri2k/final.alimdl
+et=exp/tri2k/final.et
+tree=exp/tri2k/tree
+graphdir=exp/graph_tri2k
+ldamat=exp/tri2k/lda.mat
+defaultmat=exp/tri2k/default.mat
+silphones=`cat data/silphones.csl`
+
+# Build the decoding graph (it may already exist from an earlier decode script).
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
+  sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
+
+  # First do SI decoding with alignment model.
+  # Use smaller beam for this, as less critical.
+  gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali  2> $dir/predecode_${test}.log
+
+  # Comment out the two lines below to adapt per utterance instead of per speaker.
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  
+ ( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
+    gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1  $model $et \
+      "$sifeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
+     2>$dir/et_${test}.log || exit 1;
+
+  feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output (cut off in mid-line) without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2k_fmllr.sh
+++ b/egs/rm/s1/steps/decode_tri2k_fmllr.sh
@ -0,0 +1,77 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2k_fmllr
+mkdir -p $dir
+model=exp/tri2k/final.mdl
+alignmodel=exp/tri2k/final.alimdl
+et=exp/tri2k/final.et
+tree=exp/tri2k/tree
+graphdir=exp/graph_tri2k
+ldamat=exp/tri2k/lda.mat
+defaultmat=exp/tri2k/default.mat
+silphones=`cat data/silphones.csl`
+
+# The graph may already exist (it is shared with decode_tri2k.sh); mkgraph.sh is run regardless.
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
+  basefeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
+
+  # First do SI decoding with alignment model.
+  # Use smaller beam for this, as less critical.
+  gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pass1.tra ark,t:$dir/test_${test}_pass1.ali  2> $dir/pass1decode_${test}.log
+
+  # Comment out the two lines below to adapt per utterance instead of per speaker.
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  
+ ( ali-to-post ark:$dir/test_${test}_pass1.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
+    gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1  $model $et \
+      "$basefeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
+     2>$dir/et_${test}.log || exit 1;
+
+  feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}_pass2.tra ark,t:$dir/test_${test}_pass2.ali  2> $dir/pass2decode_${test}.log
+
+ ( ali-to-post ark:$dir/test_${test}_pass2.ali ark:- | \
+    weight-silence-post 0.0 $silphones $model ark:- ark:- | \
+    gmm-est-fmllr $spk2utt_opt $model "$feats" ark:- ark:$dir/fmllr_${test}.trans ) \
+     2>$dir/fmllr_${test}.log || exit 1;
+
+  feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/fmllr_${test}.trans ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2k_regtree_fmllr.sh
+++ b/egs/rm/s1/steps/decode_tri2k_regtree_fmllr.sh
@ -0,0 +1,80 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2k_regtree_fmllr
+mkdir -p $dir
+model=exp/tri2k/final.mdl
+alignmodel=exp/tri2k/final.alimdl
+et=exp/tri2k/final.et
+tree=exp/tri2k/tree
+graphdir=exp/graph_tri2k
+ldamat=exp/tri2k/lda.mat
+defaultmat=exp/tri2k/default.mat
+silphones=`cat data/silphones.csl`
+
+occs=exp/tri2k/final.occs
+regtree=$dir/regtree
+maxleaves=8 # max # of regression-tree leaves.
+mincount=5000 # minimum count before we add a new transform.
+gmm-make-regtree --sil-phones=$silphones --state-occs=$occs --max-leaves=$maxleaves $model $regtree 2>$dir/make_regtree.out
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
+  basefeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
+
+  # First do SI decoding with alignment model.
+  # Use smaller beam for this, as less critical.
+  gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pass1.tra ark,t:$dir/test_${test}_pass1.ali  2> $dir/pass1decode_${test}.log
+
+  # Comment out the two lines below to adapt per utterance instead of per speaker.
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  
+ ( ali-to-post ark:$dir/test_${test}_pass1.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
+    gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1  $model $et \
+      "$basefeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
+     2>$dir/et_${test}.log || exit 1;
+
+  feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}_pass2.tra ark,t:$dir/test_${test}_pass2.ali  2> $dir/pass2decode_${test}.log
+
+ ( ali-to-post ark:$dir/test_${test}_pass2.ali ark:- | \
+    weight-silence-post 0.0 $silphones $model ark:- ark:- | \
+    gmm-est-regtree-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model "$feats" ark:- $regtree ark:$dir/${test}.fmllr ) \
+     2>$dir/fmllr_${test}.log || exit 1;
+
+  gmm-decode-faster-regtree-fmllr $utt2spk_opt --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst $regtree "$feats" ark:$dir/${test}.fmllr ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2k_utt.sh
+++ b/egs/rm/s1/steps/decode_tri2k_utt.sh
@ -0,0 +1,68 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2k_utt
+mkdir -p $dir
+model=exp/tri2k/final.mdl
+alignmodel=exp/tri2k/final.alimdl
+et=exp/tri2k/final.et
+tree=exp/tri2k/tree
+graphdir=exp/graph_tri2k
+ldamat=exp/tri2k/lda.mat
+defaultmat=exp/tri2k/default.mat
+silphones=`cat data/silphones.csl`
+
+# The graph may already exist (it is shared with decode_tri2k.sh); mkgraph.sh is run regardless.
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  defaultfeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $defaultmat ark:- ark:- |"
+  sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- |"
+
+  # First do SI decoding with alignment model.
+  # Use smaller beam for this, as less critical.
+  gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$defaultfeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali  2> $dir/predecode_${test}.log
+
+  # The two lines below are commented out so that adaptation is per-utterance.
+  #spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  #utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+  
+ ( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
+    weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
+    gmm-post-to-gpost $alignmodel "$defaultfeats" ark:- ark:- | \
+    gmm-est-et $spk2utt_opt --normalize-type=mean-and-var --verbose=1  $model $et \
+      "$sifeats" ark:- ark:$dir/et_${test}.trans ark,t:$dir/et_${test}.warp ) \
+     2>$dir/et_${test}.log || exit 1;
+
+  feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $ldamat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/et_${test}.trans ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output (cut off in mid-line) without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2l.sh
+++ b/egs/rm/s1/steps/decode_tri2l.sh
@ -0,0 +1,61 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2l
+mkdir -p $dir
+model=exp/tri2l/final.mdl
+alignmodel=exp/tri2l/final.alimdl
+tree=exp/tri2l/tree
+graphdir=exp/graph_tri2l 
+transform=exp/tri2l/final.mat
+silphones=`cat data/silphones.csl`
+mincount=500
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+
+  sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
+
+  # Use smaller beam for 1st pass.
+  gmm-decode-faster --beam=17.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}.pre_tra ark,t:$dir/test_${test}.pre_ali  2> $dir/predecode_${test}.log
+
+ ( ali-to-post ark:$dir/test_${test}.pre_ali ark:- | \
+    weight-silence-post 0.0 $silphones $model ark:- ark:- | \
+    gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
+    "$sifeats" ark,o:- ark:$dir/${test}.fmllr ) 2>$dir/fmllr_${test}.log
+  
+  feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
--- a/egs/rm/s1/steps/decode_tri2l_utt.sh
+++ b/egs/rm/s1/steps/decode_tri2l_utt.sh
@ -0,0 +1,61 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# to be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/decode_tri2l_utt
+mkdir -p $dir
+model=exp/tri2l/final.mdl
+alignmodel=exp/tri2l/final.alimdl
+tree=exp/tri2l/tree
+graphdir=exp/graph_tri2l 
+transform=exp/tri2l/final.mat
+silphones=`cat data/silphones.csl`
+mincount=300
+
+scripts/mkgraph.sh $tree $model $graphdir
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+ (
+  #spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
+  #utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
+
+  sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:-|"
+
+  # Use smaller beam for 1st pass.
+  gmm-decode-faster --beam=17.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}.pre_tra ark,t:$dir/test_${test}.pre_ali  2> $dir/predecode_${test}.log
+
+ ( ali-to-post ark:$dir/test_${test}.pre_ali ark:- | \
+    weight-silence-post 0.0 $silphones $model ark:- ark:- | \
+    gmm-est-fmllr --fmllr-min-count=$mincount $spk2utt_opt $model \
+    "$sifeats" ark,o:- ark:$dir/${test}.fmllr ) 2>$dir/fmllr_${test}.log
+  
+  feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $transform ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
+
+  gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali  2> $dir/decode_${test}.log
+
+ # The ,p (permissive) option lets compute-wer score partial output without dying.
+  scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
+  compute-wer --mode=present ark:-  ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
+ ) &
+done
+
+wait
+
+cat $dir/wer_* | \
+  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
+   > $dir/wer
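Note on the final averaging step in the script above: the awk command sums field 4 and field 6 over all per-set score files and divides once, so the reported figure is a pooled error rate rather than a mean of the per-set percentages. A minimal sketch of that arithmetic follows. The two score lines are hypothetical, and their layout is an assumption chosen only so that field 4 is the error count and field 6 the word count.

```shell
#!/bin/bash
# Hypothetical per-test-set score lines (layout assumed, not taken from compute-wer).
scores='%WER 10.00 [ 10 / 100, 3 ins, 4 del, 3 sub ]
%WER 20.00 [ 60 / 300, 20 ins, 20 del, 20 sub ]'

# Sum errors (field 4) and words (field 6) across sets, then divide once.
# awk reads the leading digits of "100," as the number 100.
summary=$(printf '%s\n' "$scores" | \
  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d)", 100.0*n/d, n, d); }')
echo "$summary"
```

With these inputs the pooled figure is 70 / 400 = 17.5%, whereas the plain mean of the two percentages would be 15%; larger test sets carry more weight.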
--- a/egs/rm/s1/steps/decode_tri_mixup.sh
+++ b/egs/rm/s1/steps/decode_tri_mixup.sh
@ -0,0 +1,32 @@
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# Decode the testing data.
+
+# this is the hardest test set, see
+# http://www.itl.nist.gov/iad/mig/tests/rt/ASRhistory/pdf/resource_management_92eval.pdf
+
+dir=exp/decode_tri_mixup
+mkdir -p $dir
+srcdir=exp/tri_mixup
+model=$srcdir/25.mdl
+graphdir=exp/graph_tri_mixup
+
+
+../src/bin/faster-decode-gmm --acoustic-scale=0.08333 --word-symbol-table=data/words.txt  $model $graphdir/HCLG.fst data/test_sep92.scp  $dir/word_transcripts.txt $dir/alignments.txt > $dir/decode.out
+
+../src/bin/compute-wer --symbol-table=data/words.txt  data_prep/test_sep92_trans.txt $dir/word_transcripts.txt  > $dir/wer
+
--- a/egs/rm/s1/steps/init_sgmm.sh
+++ b/egs/rm/s1/steps/init_sgmm.sh
@ -0,0 +1,48 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# Initialize SGMM from a trained HMM/GMM system.
+
+if [ -f path.sh ]; then . path.sh; fi
+
+dir=exp/sgmm/init
+mkdir -p $dir
+srcdir=exp/tri1
+model=exp/sgmm/0.mdl
+
+init-ubm --intermediate-numcomps=2000 --ubm-numcomps=400 --verbose=2 \
+    --fullcov-ubm=true $srcdir/final.mdl $srcdir/final.occs \
+    $dir/ubm0 2> $dir/cluster.log
+
+
+subset[0]=1000
+subset[1]=1500
+subset[2]=2000
+subset[3]=2500
+
+for x in 0 1 2 3; do
+    echo "Pass $x"
+    feats="ark:scripts/subset_scp.pl ${subset[$x]} data/train.scp | add-deltas --print-args=false scp:- ark:- |"
+    fgmm-global-acc-stats --diag-gmm-nbest=15 --binary=false --verbose=2 $dir/ubm$x "$feats" $dir/$x.acc \
+	2> $dir/acc.$x.log  || exit 1;
+    fgmm-global-est --verbose=2 $dir/ubm$x $dir/$x.acc \
+	$dir/ubm$[$x+1] 2> $dir/update.$x.log || exit 1;
+    rm $dir/$x.acc
+done
+
+sgmm-init $srcdir/final.mdl $dir/ubm4 $model 2> $dir/sgmm_init.log
+
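Note on the UBM passes in init_sgmm.sh: each pass reads ubm$x and writes ubm$[$x+1], which is why sgmm-init is given ubm4 after four passes. The sketch below only prints the file chain and runs no Kaldi binaries. It uses `$(( ))`, which gives the same result as the older `$[ ]` form used in the script; the variable `last` is introduced here for illustration.

```shell
#!/bin/bash
# Subset sizes and directory copied from the script above; nothing is read or written.
dir=exp/sgmm/init
subset=(1000 1500 2000 2500)

for x in 0 1 2 3; do
  next=$((x + 1))   # same value as $[$x+1]
  echo "pass $x: ${subset[$x]} utterances, $dir/ubm$x -> $dir/ubm$next"
  last=$dir/ubm$next
done
echo "model handed to sgmm-init: $last"
```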
--- a/egs/rm/s1/steps/make_mfcc_test.sh
+++ b/egs/rm/s1/steps/make_mfcc_test.sh
@ -0,0 +1,44 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# To be run from .. (one directory up from here)
+
+if [ $# != 1 ]; then
+   echo "usage: make_mfcc_test.sh <abs-path-to-tmpdir>"
+   exit 1;
+fi
+
+if [ -f path.sh ]; then . path.sh; fi
+
+dir=exp/make_mfcc
+mkdir -p $dir
+root_out=$1
+mkdir -p $root_out
+
+for test in mar87 oct87 feb89 oct89 feb91 sep92; do
+  scpin=data_prep/test_${test}_wav.scp 
+# Making it like this so it works for others on the BUT filesystem.
+# It will generate the correct scp file without running the feature extraction.
+  log=$dir/make_mfcc_test_${test}.log
+  (
+    compute-mfcc-feats  --verbose=2 --config=conf/mfcc.conf scp:$scpin ark,scp:$root_out/test_${test}_raw_mfcc.ark,$root_out/test_${test}_raw_mfcc.scp  2> $log || tail $log
+    cp $root_out/test_${test}_raw_mfcc.scp data/test_${test}.scp
+  ) &
+done
+
+wait
+
+echo "If the above produced no output on the screen, it succeeded."
--- a/egs/rm/s1/steps/make_mfcc_train.sh
+++ b/egs/rm/s1/steps/make_mfcc_train.sh
@ -0,0 +1,43 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+# To be run from .. (one directory up from here)
+
+if [ $# != 1 ]; then
+    echo "usage: make_mfcc_train.sh <abs-path-to-tmpdir>";
+    exit 1;
+fi
+
+if [ -f path.sh ]; then . path.sh; fi
+
+scpin=data_prep/train_wav.scp  
+dir=exp/make_mfcc
+mkdir -p $dir
+root_out=$1
+mkdir -p $root_out
+
+scripts/split_scp.pl $scpin $dir/train_wav{1,2,3,4}.scp
+
+for n in 1 2 3 4; do # Use 4 CPUs
+   log=$dir/make_mfcc_train.$n.log
+   compute-mfcc-feats  --verbose=2 --config=conf/mfcc.conf scp:$dir/train_wav${n}.scp  ark,scp:$root_out/train_raw_mfcc${n}.ark,$root_out/train_raw_mfcc${n}.scp  2> $log || tail $log &
+done
+
+wait;
+
+cat $root_out/train_raw_mfcc{1,2,3,4}.scp > data/train.scp
+
+echo "If the above produced no output on the screen, it succeeded."
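Note on the background jobs in the two make_mfcc scripts: `cmd 2> $log || tail $log &` followed by a bare `wait` shows the log tail on failure but discards each job's exit status. One possible alternative, sketched under the assumption that the caller wants a failure count, is to record each PID and wait on it individually. `run_job` is a hypothetical stand-in for the feature-extraction command.

```shell
#!/bin/bash
# Hypothetical stand-in for one extraction job; job 3 fails on purpose.
run_job() {
  [ "$1" -ne 3 ]
}

pids=()
for n in 1 2 3 4; do
  run_job "$n" &      # start the job in the background
  pids+=($!)          # remember its PID
done

failed=0
for pid in "${pids[@]}"; do
  # wait PID returns that job's exit status
  wait "$pid" || failed=$((failed + 1))
done
echo "failed jobs: $failed"
```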
--- a/egs/rm/s1/steps/prepare_graphs.sh
+++ b/egs/rm/s1/steps/prepare_graphs.sh
@ -0,0 +1,66 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# The output of this script is the symbol tables data/{words.txt,phones.txt},
+# and the grammars and lexicons data/{L,G}{,_disambig}.fst
+
+# To be run from ..
+if [ -f path.sh ]; then . path.sh; fi
+
+cp data_prep/G.txt data/
+scripts/make_words_symtab.pl < data/G.txt > data/words.txt
+cp data_prep/lexicon.txt data/
+
+
+scripts/make_phones_symtab.pl < data/lexicon.txt > data/phones.txt
+
+silphones="sil"; # This would in general be a space-separated list of all silence phones.  E.g. "sil vn"
+# Generate colon-separated lists of silence and non-silence phones.
+scripts/silphones.pl data/phones.txt "$silphones" data/silphones.csl data/nonsilphones.csl
+
+ndisambig=`scripts/add_lex_disambig.pl data/lexicon.txt data/lexicon_disambig.txt`
+scripts/add_disambig.pl data/phones.txt $ndisambig > data/phones_disambig.txt
+
+
+# Create train transcripts in integer format:
+cat data_prep/train_trans.txt | \
+  scripts/sym2int.pl --ignore-first-field data/words.txt  > data/train.tra
+
+
+# Get lexicon in FST format.
+
+# silprob = 0.5: same prob as word.
+scripts/make_lexicon_fst.pl data/lexicon.txt 0.5 sil  | fstcompile --isymbols=data/phones.txt --osymbols=data/words.txt --keep_isymbols=false --keep_osymbols=false | fstarcsort --sort_type=olabel > data/L.fst
+
+scripts/make_lexicon_fst.pl data/lexicon_disambig.txt 0.5 sil  | fstcompile --isymbols=data/phones_disambig.txt --osymbols=data/words.txt --keep_isymbols=false --keep_osymbols=false | fstarcsort --sort_type=olabel > data/L_disambig.fst
+
+fstcompile --isymbols=data/words.txt --osymbols=data/words.txt --keep_isymbols=false --keep_osymbols=false data/G.txt > data/G.fst
+
+# Checking that G is stochastic [note, it wouldn't be for an Arpa]
+fstisstochastic data/G.fst || echo Error
+
+
+# Checking that disambiguated lexicon times G is determinizable
+fsttablecompose data/L_disambig.fst data/G.fst | fstdeterminize >/dev/null || echo Error
+
+# Checking that LG is stochastic:
+fsttablecompose data/L.fst data/G.fst | fstisstochastic || echo Error
+
+## Check lexicon.
+## just have a look and make sure it seems sane.
+fstprint   --isymbols=data/phones.txt --osymbols=data/words.txt data/L.fst  | head
+
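Note on the integer transcripts created in prepare_graphs.sh: the sketch below shows the kind of mapping that step appears to perform. It assumes that sym2int.pl replaces each word with its id from words.txt and that --ignore-first-field leaves the utterance id unchanged; the real script may differ, for example in how it treats words missing from the table. The symbol table and transcript are hypothetical.

```shell
#!/bin/bash
# Hypothetical miniature symbol table and transcript in a scratch directory.
tmp=$(mktemp -d)
printf '<eps> 0\nSHOW 1\nSHIPS 2\n' > "$tmp/words.txt"
printf 'utt1 SHOW SHIPS\nutt2 SHIPS\n' > "$tmp/trans.txt"

# First file: build the word -> id map.
# Second file: copy field 1 as is and map every later field.
result=$(awk 'NR==FNR { id[$1] = $2; next }
  { printf("%s", $1)
    for (i = 2; i <= NF; i++) printf(" %s", id[$i])
    printf("\n") }' "$tmp/words.txt" "$tmp/trans.txt")
echo "$result"
rm -r "$tmp"
```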
--- a/egs/rm/s1/steps/train_et2.sh
+++ b/egs/rm/s1/steps/train_et2.sh
@ -0,0 +1,121 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# train_et2.sh is as train_et.sh but using an adapt model with
+# fewer Gaussians.  Seeing if this makes the warp distribution more
+# bimodal.
+
+
+
+if [ -f path.sh ]; then . path.sh; fi
+srcdir=exp/adapt2
+dir=exp/et2
+srcmodel=$srcdir/20.mdl
+
+normtype=mean # could be mean or none or mean-and-var
+
+spk2utt_opt=--spk2utt=ark:$dir/spk2utt
+utt2spk_opt=--utt2spk=ark:$dir/utt2spk
+# for per-utterance, uncomment the following [this would make it worse]:
+# spk2utt_opt=
+# utt2spk_opt=
+feats="ark:add-deltas scp:$dir/train.scp ark:- |"
+
+mkdir -p $dir
+
+nspk=109 # Use all 109 RM training speakers.
+nutt=15 # Use at most 15 utterances from each speaker.
+
+head -$nspk data/train.spk2utt | \
+   awk '{ printf("%s ",$1); for(x=2; x<=NF&&x<='$nutt'+1;x++)
+         {  printf("%s ", $x); } printf("\n"); }' > $dir/spk2utt
+
+scripts/spk2utt_to_utt2spk.pl < $dir/spk2utt > $dir/utt2spk
+cat $dir/utt2spk | awk '{print $1}' > $dir/uttlist
+scripts/filter_scp.pl $dir/uttlist <data/train.scp >$dir/train.scp
+
+silphonelist=`cat data/silphones.csl`
+
+cp $srcdir/tree $dir
+cp $srcdir/phone_map $dir
+
+# We train on a subset of the training utterances from srcdir, so we reuse
+# the alignments and the model from there: link them.
+( 
+  cd $dir
+  ln -s ../../$srcdir/cur.ali .
+  ln -s ../../$srcmodel 0.mdl
+)
+
+# Init the transform:
+
+gmm-init-et --normalize-type=$normtype --binary=false --dim=39 $dir/0.et 2>$dir/init_et.log || exit 1
+
+  
+for x in 0 1 2 3 4 5 6 7 8 9 10 11; do
+    x1=$[$x+1]; 
+
+    # Work out current transforms:
+   ( ali-to-post ark:$dir/cur.ali ark:- | \
+    weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+    gmm-post-to-gpost $srcmodel "$feats" ark:- ark:- | \
+    gmm-est-et $spk2utt_opt --verbose=1 $dir/$x.mdl $dir/$x.et "$feats" ark:- ark:$dir/$x.trans ark,t:$dir/$x.warp ) 2> $dir/trans.$x.log || exit 1;
+
+    # Accumulate stats to update model:
+   ( transform-feats $utt2spk_opt ark:$dir/$x.trans "$feats" ark:- 2>$dir/apply_fmllr.$x.log | \
+    gmm-acc-stats-twofeats $srcmodel "$feats" ark:- "ark:cat $dir/cur.ali | ali-to-post ark:- ark:- |" $dir/$x.acc ) 2>$dir/gmm_acc.$x.log || exit 1;
+
+
+    # Check likelihoods (must add the fMLLR determinants from apply_fmllr.$x.log, to get meaningful
+    # figures.)
+    ( transform-feats $utt2spk_opt ark:$dir/$x.trans "$feats" ark:-  | \
+     gmm-acc-stats $dir/$x.mdl ark:- "ark:cat $dir/cur.ali | ali-to-post ark:- ark:- |" /dev/null ) 2>$dir/gmm_getlike.$x.log || exit 1;
+
+
+    gmm-est --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc $dir/$x1.mdl 2>$dir/gmm_est.$x.log || exit 1;
+
+    # Next estimate either A or B, depending on iteration:
+    if [ $[$x%2] == 0 ]; then  # Estimate A:
+    ( ali-to-post ark:$dir/cur.ali ark:- | \
+      weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+      gmm-post-to-gpost $srcmodel "$feats" ark:- ark:- | \
+      gmm-et-acc-a $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$feats" ark:- $dir/$x.et_acc_a ) 2> $dir/acc_a.$x.log || exit 1;
+      gmm-et-est-a --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.et_acc_a 2> $dir/update_a.$x.log || exit 1;
+      rm $dir/$x.et_acc_a
+    else
+    ( ali-to-post ark:$dir/cur.ali ark:- | \
+      weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+      gmm-post-to-gpost $srcmodel "$feats" ark:- ark:- | \
+      gmm-et-acc-b $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$feats" ark:- ark:$dir/$x.trans ark:$dir/$x.warp $dir/$x.et_acc_b ) 2> $dir/acc_b.$x.log || exit 1;
+      gmm-et-est-b --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.mat $dir/$x.et_acc_b 2> $dir/update_b.$x.log || exit 1;
+      rm $dir/$x.et_acc_b
+      # Careful!: gmm-transform-means here changes $x1.mdl in-place. 
+      gmm-transform-means $dir/$x.mat $dir/$x1.mdl $dir/$x1.mdl 2> $dir/transform_means.$x.log
+    fi
+    rm $dir/$x.trans 
+    if [ $x != 0 ]; then
+      rm $dir/$x.mdl  # keep 0.mdl as it's the alignment model.
+    fi
+    rm $dir/$x.acc
+    x=$[$x+1];
+done
+
+for n in 0 1 2 3 4 5 6 7 8 9 10 11; do
+ cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
+done
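Note on the speaker subset built near the top of train_et2.sh: the awk loop keeps the speaker id plus at most $nutt utterance ids per line. The sketch below applies the same loop bound to two hypothetical spk2utt lines with a cap of 2. It passes the cap with `awk -v` instead of splicing the shell variable into the program text, which is a stylistic substitution and not what the script does.

```shell
#!/bin/bash
nutt=2   # illustrative cap; the script above uses 15
# Hypothetical spk2utt lines: speaker id followed by its utterance ids.
subset=$(printf 'spkA u1 u2 u3\nspkB u4\n' | \
  awk -v nutt="$nutt" '{ printf("%s", $1)
       for (x = 2; x <= NF && x <= nutt + 1; x++) printf(" %s", $x)
       printf("\n") }')
echo "$subset"
```

Speakers with fewer utterances than the cap, such as spkB here, pass through unchanged.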
--- a/egs/rm/s1/steps/train_mono.sh
+++ b/egs/rm/s1/steps/train_mono.sh
@ -0,0 +1,92 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+if [ -f path.sh ]; then . path.sh; fi
+
+# Train the monophone on a subset-- no point using all the data.
+dir=exp/mono
+n=1000
+feats="ark:add-deltas --print-args=false scp:$dir/train.scp ark:- |"
+# need to quote when passing as an argument, as in "$feats",
+# since it has spaces in it.
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numgauss=250 # Initial num-Gauss (must be more than #states=3*phones).
+totgauss=1000 # Target #Gaussians.  
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+realign_iters="1 2 3 4 5 6 7 8 9 10 12 15 20 25";
+
+
+mkdir -p $dir
+scripts/subset_scp.pl $n data/train.scp > $dir/train.scp
+
+
+silphones=`cat data/silphones.csl | sed 's/:/ /g'`
+nonsilphones=`cat data/nonsilphones.csl | sed 's/:/ /g'`
+cat conf/topo.proto | sed "s:NONSILENCEPHONES:$nonsilphones:" | sed "s:SILENCEPHONES:$silphones:" > $dir/topo
+
+gmm-init-mono '--train-feats=ark:head -10 data/train.scp | add-deltas scp:- ark:- |' $dir/topo 39  $dir/0.mdl $dir/tree 2> $dir/init.out || exit 1;
+
+
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/0.mdl  data/L.fst \
+       "ark:scripts/subset_scp.pl $n data/train.tra|" \
+   "ark:|gzip -c >$dir/graphs.fsts.gz"  2>$dir/compile_graphs.log || exit 1 
+
+echo Pass 0
+
+align-equal-compiled "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+   ark,t,f:-  2>$dir/align.0.log | \
+ gmm-acc-stats-ali --binary=true $dir/0.mdl "$feats" ark:- \
+     $dir/0.acc 2> $dir/acc.0.log  || exit 1;
+
+# In the following steps, the --min-gaussian-occupancy=3 option is important, otherwise
+# we fail to est "rare" phones and later on, they never align properly.
+gmm-est --min-gaussian-occupancy=3  --mix-up=$numgauss \
+    $dir/0.mdl $dir/0.acc $dir/1.mdl 2> $dir/update.0.log || exit 1;
+
+rm $dir/0.acc
+
+
+beam=4 # will change to 8 below after 1st pass
+x=1
+while [ $x -lt $numiters ]; do
+  echo "Pass $x"
+  if echo $realign_iters | grep -w $x >/dev/null; then
+    echo "Aligning data"
+    gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$[$beam*4] $dir/$x.mdl \
+        "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" t,ark:$dir/cur.ali \
+        2> $dir/align.$x.log || exit 1;
+  fi
+  gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+  gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+  rm $dir/$x.mdl $dir/$x.acc
+  if [ $x -le $maxiterinc ]; then
+     numgauss=$[$numgauss+$incgauss];
+  fi
+  beam=8
+  x=$[$x+1]
+done
+
+( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl )
+
+# example of showing the alignments:
+# show-alignments data/phones.txt $dir/30.mdl ark:$dir/cur.ali | head -4
+
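Note on the Gaussian schedule in train_mono.sh: incgauss is computed with integer division, so the mix-up target grows by 37 per pass and stops just short of totgauss. The sketch below replays only the loop arithmetic with the values from the script; no Kaldi binaries are run.

```shell
#!/bin/bash
# Values copied from train_mono.sh.
numiters=30; maxiterinc=20; numgauss=250; totgauss=1000
incgauss=$(( (totgauss - numgauss) / maxiterinc ))   # integer division

x=1
while [ $x -lt $numiters ]; do
  # gmm-est would run here with --mix-up=$numgauss
  if [ $x -le $maxiterinc ]; then
    numgauss=$((numgauss + incgauss))
  fi
  x=$((x + 1))
done
echo "increment=$incgauss final_target=$numgauss final_model=$x.mdl"
```

The increment is applied on passes 1 through 20, giving 250 + 20 * 37 = 990, and the loop exits with x equal to 30, which is the model that final.mdl is linked to.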
--- a/egs/rm/s1/steps/train_sgmm1.sh
+++ b/egs/rm/s1/steps/train_sgmm1.sh
@ -0,0 +1,87 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+if [ -f path.sh ]; then . path.sh; fi
+
+# To be run from ..
+
+dir=exp/sgmm
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+
+numiters=25   # Total number of iterations
+
+realign_iters="5 10 15";
+silphonelist=`cat data/silphones.csl`
+numsubstates=1500 # Initial #-substates.
+totsubstates=5000 # Target #-substates.
+maxiterinc=15 # Last iter to increase #substates on.
+incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
+gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
+randprune=0.1
+mkdir -p $dir
+
+feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+
+cp $srcdir/tree $dir
+
+echo "aligning all training data"
+if [ ! -f $dir/0.ali ]; then
+  gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel "$srcgraphs" \
+        "$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+fi
+
+if [ ! -f $dir/0.mdl ]; then
+   echo "you must run init_sgmm.sh before train_sgmm1.sh"
+   exit 1
+fi
+
+if [ ! -f $dir/gselect.gz ]; then
+ sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
+fi
+
+cp $dir/0.ali $dir/cur.ali || exit 1;
+
+iter=0
+while [ $iter -lt $numiters ]; do
+   echo "Pass $iter ... "
+   if echo $realign_iters | grep -w $iter >/dev/null; then
+      echo "Aligning data"
+         sgmm-align-compiled $scale_opts "$gselect_opt" --beam=8 --retry-beam=40 $dir/$iter.mdl \
+           	"$srcgraphs" "$feats" \
+         	ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
+   fi
+   if [ $iter -gt 0 ]; then
+     flags=vMwcS
+   else
+     flags=vwcS
+   fi
+   if [ ! -f $dir/$[$iter+1].mdl ]; then
+     sgmm-acc-stats-ali --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" ark:$dir/cur.ali $dir/$iter.acc 2> $dir/acc.$iter.log  || exit 1;
+     sgmm-est --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
+   fi
+#  	rm $dir/$iter.mdl $dir/$iter.acc
+#  	rm $dir/$iter.occs 
+    if [ $iter -lt $maxiterinc ]; then
+       numsubstates=$[$numsubstates+$incsubstates]
+    fi
+    iter=$[$iter+1];
+done
+
+( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $iter.mdl final.mdl; ln -s $iter.occs final.occs )
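Note on the realignment test used in the SGMM training loops: `echo $realign_iters | grep -w $iter` relies on whole-word matching. The sketch below runs the same test over iterations 0 to 24 and collects the ones that match; the variable `matched` is introduced here for illustration.

```shell
#!/bin/bash
realign_iters="5 10 15"   # value copied from the script above
matched=""
for iter in $(seq 0 24); do
  # -w requires a whole-word match, so 1 does not match inside 10 or 15
  if echo $realign_iters | grep -w $iter >/dev/null; then
    matched="$matched $iter"
  fi
done
echo "realign on:$matched"
```

Without -w, iterations 0 and 1 would also trigger a realignment, because they occur as substrings of 10 and 15.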
--- a/egs/rm/s1/steps/train_sgmm2.sh
+++ b/egs/rm/s1/steps/train_sgmm2.sh
@ -0,0 +1,103 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# This is SGMM training with speaker vectors.
+
+if [ -f path.sh ]; then . path.sh; fi
+
+# To be run from ..
+
+dir=exp/sgmm2
+srcdir=exp/sgmm
+gmmtridir=exp/tri1
+trimodel=$gmmtridir/final.mdl
+srcgraphs="ark:gunzip -c $gmmtridir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+
+numiters=25   # Total number of iterations
+
+realign_iters="5 10 15";
+silphonelist=`cat data/silphones.csl`
+numsubstates=1500 # Initial #-substates.
+totsubstates=5000 # Target #-substates.
+maxiterinc=15 # Last iter to increase #substates on.
+incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
+gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
+randprune=0.1
+spkdim=39
+mkdir -p $dir
+
+feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+
+cp $gmmtridir/tree $srcdir/{0.ali,0.mdl,gselect.gz} $dir
+
+if [ ! -f $dir/0.ali ]; then
+    echo "aligning all training data"
+    gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $trimodel "$srcgraphs" \
+        "$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+fi
+
+if [ ! -f $dir/0.mdl ]; then
+   echo "you must run init_sgmm.sh before train_sgmm2.sh"
+   exit 1
+fi
+
+if [ ! -f $dir/gselect.gz ]; then
+    sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
+fi
+
+cp $dir/0.ali $dir/cur.ali || exit 1;
+
+iter=0
+while [ $iter -lt $numiters ]; do
+    echo "Pass $iter ... "
+    if [ $iter -gt 0 ]; then
+	if [ $iter -le 5 ]; then # only train phonetic subspace
+	    flags=vMwcS
+    	elif [ $(( $iter % 2 )) -eq 1 ]; then # odd iterations
+	    flags=vMwcS
+	else	# even iterations, update N and not M
+    	    flags=vwcSN
+	fi
+    else
+     	flags=vwcS
+    fi
+
+    if [ ! -f $dir/$[$iter+1].mdl ]; then
+        if echo $realign_iters | grep -w $iter >/dev/null; then
+    	    echo "Aligning data"
+            sgmm-align-compiled $scale_opts "$gselect_opt" --beam=8 --retry-beam=40 $dir/$iter.mdl \
+	        "$srcgraphs" "$feats" \
+		ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
+    	fi
+     	sgmm-acc-stats-ali --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" ark:$dir/cur.ali $dir/$iter.acc 2> $dir/acc.$iter.log  || exit 1;
+	if [ $iter -eq 5 ]; then  # increase spk dimension from 0 to 39
+	    sgmm-est --update-flags=$flags --increase-spk-dim=$spkdim --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
+	else 
+     	    sgmm-est --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
+	fi
+    fi
+
+    rm $dir/$iter.acc # $dir/$iter.mdl
+#    rm $dir/$iter.occs 
+    if [ $iter -lt $maxiterinc ]; then
+       numsubstates=$[$numsubstates+$incsubstates]
+    fi
+    iter=$[$iter+1];
+done
+
+( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $iter.mdl final.mdl; ln -s $iter.occs final.occs )
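Note on the update-flag schedule in train_sgmm2.sh: the nested if/elif selects one of three flag strings per iteration. The sketch below restates that selection as a function so the schedule can be printed without running training. `flags_for_iter` is a hypothetical helper and is not part of the recipe.

```shell
#!/bin/bash
# Hypothetical helper mirroring the flag selection in train_sgmm2.sh.
flags_for_iter() {
  local iter=$1
  if [ "$iter" -eq 0 ]; then
    echo vwcS       # first pass
  elif [ "$iter" -le 5 ] || [ $((iter % 2)) -eq 1 ]; then
    echo vMwcS      # iterations 1-5 and later odd iterations
  else
    echo vwcSN      # later even iterations: update N and not M
  fi
}

for iter in 0 1 5 6 7 8; do
  echo "iter $iter: $(flags_for_iter $iter)"
done
```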
--- a/egs/rm/s1/steps/train_sgmma.sh
+++ b/egs/rm/s1/steps/train_sgmma.sh
@ -0,0 +1,88 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+if [ -f path.sh ]; then . path.sh; fi
+
+dir=exp/sgmma
+ubm=exp/ubma/4.ubm
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+
+numiters=25   # Total number of iterations
+
+realign_iters="5 10 15";
+silphonelist=`cat data/silphones.csl`
+numsubstates=1500 # Initial #-substates.
+totsubstates=5000 # Target #-substates.
+maxiterinc=15 # Last iter to increase #substates on.
+incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
+gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
+randprune=0.1
+mkdir -p $dir
+
+feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+
+cp $srcdir/tree $dir
+
+if [ ! -f $ubm ]; then
+  echo "No UBM in $ubm"; exit 1
+fi
+sgmm-init $srcdir/final.mdl $ubm $dir/0.mdl 2> $dir/sgmm_init.log
+
+echo "aligning all training data"
+if [ ! -f $dir/0.ali ]; then
+  gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel "$srcgraphs" \
+        "$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+fi
+
+if [ ! -f $dir/gselect.gz ]; then
+ sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
+fi
+
+cp $dir/0.ali $dir/cur.ali || exit 1;
+
+iter=0
+while [ $iter -lt $numiters ]; do
+   echo "Pass $iter ... "
+   if echo $realign_iters | grep -w $iter >/dev/null; then
+      echo "Aligning data"
+      echo "Aligning data"
+      sgmm-align-compiled $spkvecs_opt $scale_opts "$gselect_opt" --beam=8 \
+          --retry-beam=40 $dir/$iter.mdl "$srcgraphs" "$feats" \
+      	ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
+   fi
+   if [ $iter -gt 0 ]; then
+     flags=vMwcS
+   else
+     flags=vwcS
+   fi
+   if [ ! -f $dir/$[$iter+1].mdl ]; then
+     sgmm-acc-stats-ali --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" ark:$dir/cur.ali $dir/$iter.acc 2> $dir/acc.$iter.log  || exit 1;
+     sgmm-est --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
+   fi
+# TEMP: will restore these statements later.
+#  	rm $dir/$iter.mdl $dir/$iter.acc
+#  	rm $dir/$iter.occs 
+    if [ $iter -lt $maxiterinc ]; then
+       numsubstates=$[$numsubstates+$incsubstates]
+    fi
+    iter=$[$iter+1];
+done
+
+( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $iter.mdl final.mdl; ln -s $iter.occs final.occs )
--- a/egs/rm/s1/steps/train_sgmmb.sh
+++ b/egs/rm/s1/steps/train_sgmmb.sh
@ -0,0 +1,131 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABILITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+if [ -f path.sh ]; then . path.sh; fi
+
+# To be run from ..
+# You must run init_sgmma.sh first.
+# We rely on the UBM exp/ubma/4.ubm being there; 0.mdl is built below by sgmm-init.
+
+dir=exp/sgmmb
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+
+numiters=25   # Total number of iterations
+
+ubm=exp/ubma/4.ubm
+realign_iters="5 10 15"; 
+spkvec_iters="5 8 12 17 22"
+silphonelist=`cat data/silphones.csl`
+numsubstates=1500 # Initial #-substates.
+totsubstates=5000 # Target #-substates.
+maxiterinc=15 # Last iter to increase #substates on.
+incsubstates=$[($totsubstates-$numsubstates)/$maxiterinc] # per-iter increment for #substates
+gselect_opt="--gselect=ark:gunzip -c $dir/gselect.gz|"
+# Initially don't have speaker vectors, but change this after
+# we estimate them.
+spkvecs_opt=
+randprune=0.1
+mkdir -p $dir
+
+utt2spk_opt="--utt2spk=ark:data/train.utt2spk"
+spk2utt_opt="--spk2utt=ark:data/train.spk2utt"
+feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+
+if [ ! -f $ubm ]; then
+  echo "No UBM in $ubm"; exit 1
+fi
+
+sgmm-init --spk-space-dim=39 $srcdir/final.mdl $ubm $dir/0.mdl 2> $dir/sgmm_init.log || exit 1;
+
+cp $srcdir/tree $dir
+
+echo "aligning all training data"
+if [ ! -f $dir/0.ali ]; then
+  gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel "$srcgraphs" \
+        "$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+fi
+
+if [ ! -f $dir/0.mdl ]; then
+   echo "sgmm-init did not produce $dir/0.mdl; see $dir/sgmm_init.log"
+   exit 1
+fi
+
+if [ ! -f $dir/gselect.gz ]; then
+ sgmm-gselect $dir/0.mdl "$feats" ark,t:- 2>$dir/gselect.log | gzip -c > $dir/gselect.gz || exit 1;
+fi
+
+cp $dir/0.ali $dir/cur.ali || exit 1;
+
+iter=0
+while [ $iter -lt $numiters ]; do
+   echo "Pass $iter ... "
+   if echo $realign_iters | grep -w $iter >/dev/null; then
+      echo "Aligning data"
+      sgmm-align-compiled $spkvecs_opt $utt2spk_opt $scale_opts "$gselect_opt" \
+         --beam=8 --retry-beam=40 $dir/$iter.mdl "$srcgraphs" "$feats" \
+      	ark:$dir/cur.ali 2> $dir/align.$iter.log || exit 1;
+   fi
+   if echo $spkvec_iters | grep -w $iter >/dev/null; then
+    ( ali-to-post ark:$dir/cur.ali ark:- | \
+      weight-silence-post 0.01 $silphonelist $dir/$iter.mdl ark:- ark:- | \
+      sgmm-est-spkvecs $spk2utt_opt $spkvecs_opt "$gselect_opt" \
+        --rand-prune=$randprune $dir/$iter.mdl \
+       "$feats" ark:- ark:$dir/cur.vecs  2>$dir/spkvecs.$iter.log ) || exit 1;
+      spkvecs_opt="--spk-vecs=ark:$dir/cur.vecs"
+   fi  
+   if [ $iter -eq 0 ]; then
+     flags=vwcS
+   elif [ $[$iter%2] -eq 1 -a $iter -gt 4 ]; then # odd iters after 4...
+     flags=vNwcS
+   else
+     flags=vMwcS
+   fi
+   if [ ! -f $dir/$[$iter+1].mdl ]; then
+     sgmm-acc-stats $spkvecs_opt $utt2spk_opt --update-flags=$flags "$gselect_opt" --rand-prune=$randprune --binary=false $dir/$iter.mdl "$feats" "ark:ali-to-post ark:$dir/cur.ali ark:-|" $dir/$iter.acc 2> $dir/acc.$iter.log  || exit 1;
+     sgmm-est --update-flags=$flags --split-substates=$numsubstates --write-occs=$dir/$[$iter+1].occs $dir/$iter.mdl $dir/$iter.acc $dir/$[$iter+1].mdl 2> $dir/update.$iter.log || exit 1;
+   fi
+   rm $dir/$iter.mdl $dir/$iter.acc
+   rm $dir/$iter.occs
+   if [ $iter -lt $maxiterinc ]; then
+      numsubstates=$[$numsubstates+$incsubstates]
+   fi
+   iter=$[$iter+1];
+done
+
+
+# The point of this last phase of accumulation is to get Gaussian-level
+# alignments with the speaker vectors but accumulate stats without
+# any speaker vectors; we re-estimate M, w, c and S to get a model
+# that's compatible with not having speaker vectors.
+
+
+flags=MwcS
+( ali-to-post ark:$dir/cur.ali ark:- | \
+  sgmm-post-to-gpost $spkvecs_opt $utt2spk_opt "$gselect_opt" \
+                  $dir/$iter.mdl "$feats" ark,s,cs:- ark:- | \
+  sgmm-acc-stats-gpost --update-flags=$flags  $dir/$iter.mdl "$feats" \
+            ark,s,cs:- $dir/$iter.aliacc ) 2> $dir/acc_ali.$iter.log || exit 1;
+sgmm-est --update-flags=$flags --remove-speaker-space=true $dir/$iter.mdl \
+    $dir/$iter.aliacc $dir/$iter.alimdl 2>$dir/update_ali.$iter.log || exit 1;
+
+
+( cd $dir; rm final.mdl final.alimdl final.occs 2>/dev/null; 
+  ln -s $iter.mdl final.mdl; ln -s $iter.alimdl final.alimdl;
+  ln -s $iter.occs final.occs )
--- a/egs/rm/s1/steps/train_tri1.sh
+++ b/egs/rm/s1/steps/train_tri1.sh
@ -0,0 +1,109 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+# To be run from ..
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri1
+srcdir=exp/mono
+srcmodel=$srcdir/final.mdl
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+realign_iters="5 10 15 20";  
+silphonelist=`cat data/silphones.csl`
+
+numiters=25    # Number of iterations of training
+maxiterinc=15 # Last iter to increase #Gauss on.
+numleaves=1500
+numgauss=$[$numleaves + $numleaves/2]; 
+     # Initially mix up to avg. 1.5 Gauss/state (a bit more
+     # than this, due to state clustering), then slowly mix
+     # up to final amount.
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+
+# Align all training data using old model.  Since we have more data for this pass,
+# we use the version of gmm-align that compiles the graphs itself.
+
+echo "aligning all training data"
+gmm-align $scale_opts --beam=8 --retry-beam=40 $srcdir/tree $srcmodel data/L.fst \
+   "$feats" ark:data/train.tra ark:$dir/0.ali 2> $dir/align.0.log || exit 1;
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+# Have to make silence root not-shared because we will not split it.
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+gmm-mixup --mix-up=$numgauss $dir/1.mdl $dir/1.occs $dir/1.mdl \
+   2>$dir/mixup.log || exit 1;
+
+rm $dir/treeacc
+
+# Convert alignments generated from monophone model, to use as initial alignments.
+
+convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+rm $dir/0.ali
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+    "ark:|gzip -c >$dir/graphs.fsts.gz"  2>$dir/compile_graphs.log  || exit 1;
+
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+             "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+             ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+   gmm-est --write-occs=$dir/$[$x+1].occs --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   rm $dir/$x.mdl $dir/$x.acc
+   rm $dir/$x.occs 
+   if [[ $x -le $maxiterinc ]]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1];
+done
+
+( cd $dir; rm final.mdl final.occs 2>/dev/null; ln -s $x.mdl final.mdl; ln -s $x.occs final.occs )
--- a/egs/rm/s1/steps/train_tri2a.sh
+++ b/egs/rm/s1/steps/train_tri2a.sh
@ -0,0 +1,101 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2a) is a basic triphone training starting from tri1/,
+# to serve as a baseline for the other train_tri2? scripts.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2a
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+numiters=30    # Number of iterations of training
+maxiterinc=15 # Last iter to increase #Gauss on.
+numleaves=1500
+numgauss=$[$numleaves*2]; # Initially mix up to avg. 2 Gauss/state.
+                          # Then slowly mix up to final amount.
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+realign_iters="10 15 20";   # Because last model was reasonable, don't 
+  # realign too soon (i.e., on 5th iter).
+silphonelist=`cat data/silphones.csl`
+
+feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+echo "aligning all training data"
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel "$srcgraphs" \
+       "$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali \
+    $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+rm $dir/treeacc
+
+# Convert alignments generated from previous model, to use as initial alignments.
+
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+rm $dir/0.ali
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+   "ark:|gzip -c >$dir/graphs.fsts.gz"  2>$dir/compile_graphs.log  || exit 1 
+
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+             ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+   gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   rm $dir/$x.mdl $dir/$x.acc
+   if [ $x -le $maxiterinc ]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1]
+done
+
+( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl )
+
--- a/egs/rm/s1/steps/train_tri2b.sh
+++ b/egs/rm/s1/steps/train_tri2b.sh
@ -0,0 +1,191 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2b.sh) is training the exponential transform,
+# on top of standard double-delta features.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2b
+srcdir=exp/tri1
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+srcmodel=$srcdir/final.mdl
+dim=39
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+# The spk2utt_opt uses a subset of utterances that we create; this is only
+# needed by programs that use the subset.
+spk2utt_opt=--spk2utt=ark:$dir/spk2utt
+# the utt2spk opt is used by programs that use all the data so give
+# it the original utt2spk file.
+utt2spk_opt=--utt2spk=ark:data/train.utt2spk
+normtype=mean # et option; could be mean, or none
+
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numiters_et=15 # Before this, update et.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+realign_iters="10 15 20 25";
+silphonelist=`cat data/silphones.csl`
+
+nutt=15 # Use at most 15 utterances from each speaker for
+# estimating transforms, and A and B (will use all the data
+# for estimating the model though, so be careful: we're
+# not always using the lists in $dir).
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+
+awk '{ printf("%s ",$1); for(x=2; x<=NF&&x<='$nutt'+1;x++)
+    {  printf("%s ", $x); } printf("\n"); }' <data/train.spk2utt >$dir/spk2utt
+scripts/spk2utt_to_utt2spk.pl < $dir/spk2utt > $dir/utt2spk
+cat $dir/utt2spk | awk '{print $1}' > $dir/uttlist
+scripts/filter_scp.pl $dir/uttlist <data/train.scp >$dir/train.scp
+
+
+origfeats="ark,s,cs:add-deltas scp:data/train.scp ark:- |"
+# The following variable will get changed later in the script.
+feats="$origfeats"
+
+
+echo "aligning all training data"
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel "$srcgraphs" \
+       "$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali \
+    $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+# Convert alignments generated from previous model, to use as initial alignments.
+
+rm $dir/treeacc
+
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali \
+   2>$dir/convert.log  || exit 1
+
+rm $dir/0.ali
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+   "ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1 
+
+gmm-init-et --normalize-type=$normtype --binary=false --dim=$dim $dir/1.et 2>$dir/init_et.log || exit 1
+
+x=1
+while [ $x -lt $numiters ]; do
+   x1=$[$x+1]; 
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+             "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+             ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+
+   if [ $x -lt $numiters_et ]; then
+     # Work out current transforms:
+   ( ali-to-post ark:$dir/cur.ali ark:- | \
+     weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
+     gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
+     gmm-est-et $spk2utt_opt --verbose=1 $dir/$x.mdl $dir/$x.et "$origfeats" \
+        ark,s,cs:- ark:$dir/$x.trans ark,t:$dir/$x.warp ) 2> $dir/trans.$x.log || exit 1;
+  
+     # Remove previous transforms, if present. 
+     if [ $x -gt 1 ]; then rm $dir/$[$x-1].trans; fi
+
+     # Now change $feats to correspond to the transformed features. 
+     feats="ark:add-deltas scp:data/train.scp ark:- | transform-feats $utt2spk_opt ark:$dir/$x.trans ark:- ark:- |"
+   fi 
+
+   # Accumulate stats to update model:
+   gmm-acc-stats-ali $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2>$dir/gmm_acc.$x.log || exit 1;
+   # Update model.
+   gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$x1.mdl \
+        2>$dir/gmm_est.$x.log || exit 1;
+
+   rm $dir/$x.acc $dir/$x.mdl
+
+
+   if [ $x -lt $numiters_et ]; then
+     # Alternately estimate either A or B.
+     if [ $[$x%2] == 0 ]; then  # Estimate A:
+     ( ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
+       gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
+       gmm-et-acc-a $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$origfeats" ark,s,cs:- $dir/$x.et_acc_a ) 2> $dir/acc_a.$x.log || exit 1;
+       gmm-et-est-a --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.et_acc_a 2> $dir/update_a.$x.log || exit 1;
+       rm $dir/$x.et_acc_a
+     else
+     ( ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
+       gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
+       gmm-et-acc-b $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$origfeats" ark,s,cs:- ark:$dir/$x.trans ark:$dir/$x.warp $dir/$x.et_acc_b ) 2> $dir/acc_b.$x.log || exit 1;
+       gmm-et-est-b --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.mat $dir/$x.et_acc_b 2> $dir/update_b.$x.log || exit 1;
+       rm $dir/$x.et_acc_b
+       # Careful!: gmm-transform-means here changes $x1.mdl in-place. 
+       gmm-transform-means $dir/$x.mat $dir/$x1.mdl $dir/$x1.mdl 2> $dir/transform_means.$x.log || exit 1;
+     fi   
+   fi
+   if [ $x -le $maxiterinc ]; then
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1];
+done
+
+
+# Accumulate stats for the "alignment model", which is the same as the model
+# but is trained on the baseline features (it shares Gaussian-level alignments).
+
+gmm-et-get-b $dir/$numiters_et.et $dir/default.mat 2>$dir/get_b.log || exit 1
+
+defaultfeats="ark,s,cs:add-deltas scp:data/train.scp ark:- | transform-feats $dir/default.mat ark:- ark:- |"
+
+( ali-to-post ark:$dir/cur.ali ark:- | \
+  gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$defaultfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
+  # Update model.
+gmm-est  --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
+      2>$dir/est_alimdl.log  || exit 1;
+rm $dir/$x.acc2
+
+
+# The following files may be useful for display purposes.
+for n in  1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
+ cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
+done
+
+( cd $dir; rm final.mdl final.alimdl final.et 2>/dev/null; 
+  ln -s $x.mdl final.mdl; ln -s $x.alimdl final.alimdl;
+  ln -s $numiters_et.et final.et )
+
--- a/egs/rm/s1/steps/train_tri2c.sh
+++ b/egs/rm/s1/steps/train_tri2c.sh
@ -0,0 +1,123 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2c) is training with mean normalization (you could
+# modify options in this script to do variance normalization).
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2c
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+mkdir -p $dir
+cp $srcdir/topo $dir
+norm_vars=false
+after_deltas=false
+per_spk=true
+
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numiters_et=15 # Before this, update et.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+realign_iters="10 15 20 25";
+
+silphonelist=`cat data/silphones.csl`
+
+if [ $per_spk == "true" ]; then
+  spk2utt_opt=--spk2utt=ark:data/train.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/train.utt2spk
+else
+  spk2utt_opt=
+  utt2spk_opt=
+fi
+
+srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+
+echo "Computing cepstral mean and variance stats."
+# compute mean and variance stats.
+if [ $after_deltas == true ]; then
+  compute-cmvn-stats $spk2utt_opt "$srcfeats" ark:$dir/cmvn.ark 2>$dir/cmvn.log
+  feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- | apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn.ark ark:- ark:- |"
+else 
+  compute-cmvn-stats $spk2utt_opt scp:data/train.scp \
+      ark:$dir/cmvn.ark 2>$dir/cmvn.log
+  feats="ark:apply-cmvn --norm-vars=$norm_vars $utt2spk_opt ark:$dir/cmvn.ark scp:data/train.scp ark:- | add-deltas --print-args=false ark:- ark:- |"
+fi
+
+
+echo "aligning all training data"
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel \
+  "$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+
+# Convert alignments generated from previous model, to use as initial alignments.
+
+rm $dir/treeacc
+
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+rm $dir/0.ali
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra ark:$dir/graphs.fsts \
+    2>$dir/compile_graphs.log || exit 1 
+
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl ark:$dir/graphs.fsts "$feats" \
+             ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+   gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   rm $dir/$x.mdl $dir/$x.acc
+   if [ $x -le $maxiterinc ]; then
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1];
+done
+
+( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl )
--- a/egs/rm/s1/steps/train_tri2d.sh
+++ b/egs/rm/s1/steps/train_tri2d.sh
@ -0,0 +1,120 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2d) is training with standard delta+delta-delta features
+# plus MLLT.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2d
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+silphonelist=`cat data/silphones.csl`
+realign_iters="10 15 20 25";  
+mllt_iters="2 4 6 12";
+
+feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+# Subset of features used to train MLLT transform.
+featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --print-args=false scp:- ark:- |"
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+echo "aligning all training data"
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel \
+    "$srcgraphs" "$feats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+rm $dir/treeacc
+
+# Convert alignments generated from previous model, to use as initial alignments.
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+rm $dir/0.ali
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+    "ark:|gzip -c >$dir/graphs.fsts.gz"  2>$dir/compile_graphs.log || exit 1 
+
+cur_mllt=""
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+         "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+         ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
+     ( ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
+       gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log  || exit 1;
+     if [ "$cur_mllt" != "" ]; then 
+       est-mllt $dir/$x.mat.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
+       gmm-transform-means --binary=false $dir/$x.mat.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
+       compose-transforms --print-args=false $dir/$x.mat.new $cur_mllt $dir/$x.mat || exit 1;
+     else
+       est-mllt $dir/$x.mat $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
+       gmm-transform-means --binary=false $dir/$x.mat $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
+     fi
+     cur_mllt=$dir/$x.mat
+     feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- | transform-feats $cur_mllt ark:- ark:- |" 
+     featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --print-args=false scp:- ark:- | transform-feats $cur_mllt ark:- ark:- |"
+   else # do GMM update.
+     gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+     gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   fi
+   rm -f $dir/$x.mdl $dir/$x.acc  # $x.acc is absent on MLLT iterations
+   if [ $x -le $maxiterinc ]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1]
+done
+
+( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
+  ln -s `basename $cur_mllt` final.mat )
--- a/egs/rm/s1/steps/train_tri2e.sh
+++ b/egs/rm/s1/steps/train_tri2e.sh
@ -0,0 +1,111 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2e) is training with splice-9-frames+LDA features.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2e
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+silphonelist=`cat data/silphones.csl`
+realign_iters="10 15 20 25";  
+
+# feats corresponding to original model
+srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/lda.mat ark:- ark:-|"
+# Subset of features used to train LDA transforms.
+featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $dir/lda.mat ark:- ark:-|"
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+echo "aligning all training data"
+
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel \
+   "$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+ (ali-to-post ark:$dir/0.ali ark:- | \
+   weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+   acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
+       ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
+est-lda $dir/lda.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
+
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+rm $dir/treeacc
+
+# Convert alignments generated from previous model, to use as initial alignments.
+
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+rm $dir/0.ali
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+   "ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1 
+
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+       "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+       ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+
+   gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+   gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   rm $dir/$x.mdl $dir/$x.acc
+   if [ $x -le $maxiterinc ]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1]
+done
+
+( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
+  ln -s lda.mat final.mat )
--- a/egs/rm/s1/steps/train_tri2f.sh
+++ b/egs/rm/s1/steps/train_tri2f.sh
@ -0,0 +1,130 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2f) is training with splice-9-frames+LDA features,
+# plus MLLT.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2f
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+silphonelist=`cat data/silphones.csl`
+realign_iters="10 15 20 25";  
+mllt_iters="2 4 6 12";
+
+# feats corresponding to original model
+srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
+# Subset of features used to train LDA and MLLT transforms.
+featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $dir/0.mat ark:- ark:-|"
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+echo "aligning all training data"
+
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel \
+   "$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+( ali-to-post ark:$dir/0.ali ark:- | \
+   weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+   acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
+       ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
+est-lda $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
+
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+rm $dir/treeacc
+
+# Convert alignments generated from the source model ($srcmodel), to use as initial alignments.
+
+convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+rm $dir/0.ali
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+    "ark:|gzip -c >$dir/graphs.fsts.gz"  2>$dir/compile_graphs.log || exit 1 
+
+cur_lda=$dir/0.mat
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+            "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+             ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
+    ( ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
+       gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log  || exit 1;
+
+     est-mllt $dir/$x.mat.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
+     gmm-transform-means --binary=false $dir/$x.mat.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
+     compose-transforms --print-args=false $dir/$x.mat.new $cur_lda $dir/$x.mat || exit 1;
+     cur_lda=$dir/$x.mat
+
+
+     feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
+     # Subset of features used to train MLLT transforms.
+     featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $cur_lda ark:- ark:-|"
+   else # do GMM update.
+     gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+     gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   fi
+   rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
+   if [ $x -le $maxiterinc ]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1]
+done
+
+( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
+  ln -s `basename $cur_lda` final.mat )
+
--- a/egs/rm/s1/steps/train_tri2g.sh
+++ b/egs/rm/s1/steps/train_tri2g.sh
@ -0,0 +1,205 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2g) is training with linear-VTLN (lvtln),
+# which is a linear approximation to VTLN.
+# At the end, it also converts the model, via single-pass retraining,
+# to a normal feature-level VTLN model.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2g
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+silphonelist=`cat data/silphones.csl`
+realign_iters="10 15 20 25";  
+lvtln_iters="2 4 6 8 12"; # Recompute LVTLN transforms on these iters.
+per_spk=true
+compute_vtlnmdl=true # If true, at the end compute a model with actual feature-space
+                     # VTLN features.  You can decode with this as an alternative to
+                     # final.mdl which takes the LVTLN features.
+
+numfiles=40 # Number of feature files for computing LVTLN transforms.
+numclass=31; # Can't really change this without changing the script below
+defaultclass=15; # Corresponds to no warping.
+# RE "vtln_warp": class c corresponds to a warp factor of 0.85 + 0.01*c (see below), so class 15 is 1.0.
+
+
+if [ $per_spk == "true" ]; then
+  spk2utt_opt=--spk2utt=ark:data/train.spk2utt
+  utt2spk_opt=--utt2spk=ark:data/train.utt2spk
+else
+  spk2utt_opt=
+  utt2spk_opt=
+fi
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+
+srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+# Will create cur.trans below...
+feats="ark:add-deltas --print-args=false scp:data/train.scp ark:- | transform-feats $utt2spk_opt ark:$dir/cur.trans ark:- ark:- |"
+
+gmm-init-lvtln --dim=39 --num-classes=$numclass --default-class=$defaultclass \
+      $dir/0.lvtln 2>$dir/init_lvtln.log || exit 1
+
+featsub="ark:scripts/subset_scp.pl $numfiles data/train.scp | add-deltas scp:- ark:- |"
+
+echo "Initializing lvtln transforms."
+c=0
+while [ $c -lt $numclass ]; do 
+  warp=`perl -e 'print 0.85 + 0.01*$ARGV[0];' $c` 
+  featsub_warp="ark:scripts/subset_scp.pl $numfiles data_prep/train_wav.scp | compute-mfcc-feats  --vtln-low=100 --vtln-high=-600 --vtln-warp=$warp --config=conf/mfcc.conf scp:- ark:- | add-deltas ark:- ark:- |"
+  gmm-train-lvtln-special --normalize-var=true $c $dir/0.lvtln $dir/0.lvtln \
+    "$featsub" "$featsub_warp" 2> $dir/train_special.$c.log || exit 1;
+  c=$[$c+1]
+done
+
+
+
+# $silphonelist is a colon-separated integer list of context-independent phones (here just a single element).
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+# The roots script tells the tree building not to cluster them, but by passing them
+# as --ci-phones to acc-tree-stats below we also avoid accumulating CD-stats for silence.
+
+echo "aligning all training data"
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel \
+  "$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+
+echo "Computing LVTLN transforms (iter 0)"
+( ali-to-post ark:$dir/0.ali  ark:- | \
+  weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+  gmm-post-to-gpost $srcmodel "$srcfeats" ark:- ark:- | \
+  gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $srcmodel $dir/0.lvtln \
+    "$srcfeats" ark:- ark:$dir/cur.trans ark,t:$dir/0.warp ) 2>$dir/lvtln.0.log || exit 1
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+rm $dir/treeacc
+
+# Convert alignments generated from the source model ($srcmodel), to use as initial alignments.
+
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+rm $dir/0.ali
+
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+   "ark:|gzip -c > $dir/graphs.fsts.gz"  2>$dir/compile_graphs.log || exit 1 
+
+cur_lvtln=$dir/0.lvtln
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $lvtln_iters | grep -w $x >/dev/null; then
+   ( ali-to-post ark:$dir/cur.ali  ark:- | \
+     weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
+     gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
+     gmm-est-lvtln-trans --verbose=1 $spk2utt_opt $dir/$x.mdl $dir/0.lvtln \
+      "$srcfeats" ark:- ark:$dir/tmp.trans ark,t:$dir/$x.warp ) 2>$dir/lvtln.$x.log || exit 1
+     cp $dir/$x.warp $dir/cur.warp
+     mv $dir/tmp.trans $dir/cur.trans
+   fi
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+             "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+             ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+   gmm-est --write-occs=$dir/$[$x+1].occs --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   rm $dir/$x.mdl $dir/$x.acc
+   if [ $x -le $maxiterinc ]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1]
+done
+
+# Accumulate stats for the "alignment model", which is the same as the final
+# model but uses the baseline features (it shares the Gaussian-level alignments).
+( ali-to-post ark:$dir/cur.ali ark:-  | \
+  gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$srcfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
+  # Update model.
+gmm-est  --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
+      2>$dir/est_alimdl.log  || exit 1;
+rm $dir/$x.acc2
+
+
+# The following files contain information that may be useful for display purposes.
+
+for n in 0 $lvtln_iters; do
+ cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
+done
+
+if [ $compute_vtlnmdl == "true" ]; then
+   cat $dir/cur.warp | awk '{print $1, (0.85+0.01*$2);}' > $dir/cur.factor  
+   compute-mfcc-feats $utt2spk_opt --vtln-low=100 --vtln-high=-600 --vtln-map=ark:$dir/cur.factor --config=conf/mfcc.conf scp:data_prep/train_wav.scp ark:$dir/tmp.ark 2>$dir/mfcc.log
+   vtlnfeats="ark:add-deltas ark:$dir/tmp.ark ark:- |"
+
+   # Compute diagonal fMLLR transform to normalize VTLN feats.
+  ( ali-to-post ark:$dir/cur.ali ark:-  | \
+    weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
+    gmm-est-fmllr --fmllr-update-type=diag $spk2utt_opt $dir/$x.mdl "$vtlnfeats" ark,o:- ark:$dir/vtln.trans ) 2>$dir/vtln_fmllr.log  || exit 1;
+
+   vtlnfeats="ark:add-deltas ark:$dir/tmp.ark ark:- | transform-feats $utt2spk_opt ark:$dir/vtln.trans ark:- ark:- |"
+
+  ( ali-to-post ark:$dir/cur.ali ark:-  | \
+    gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$vtlnfeats" ark:- $dir/$x.acc3 ) 2>$dir/acc_vtlnmdl.log || exit 1;
+  # Update model.
+  gmm-est  $dir/$x.mdl $dir/$x.acc3 $dir/$x.vtlnmdl \
+      2>$dir/est_vtlnmdl.log  || exit 1;
+  rm $dir/$x.acc3
+  ln -s $x.vtlnmdl $dir/final.vtlnmdl
+  rm $dir/tmp.ark
+fi
+
+
+( cd $dir; rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
+  ln -s $x.alimdl final.alimdl;
+  ln -s 0.lvtln final.lvtln;
+  ln -s cur.trans final.trans )
--- a/egs/rm/s1/steps/train_tri2h.sh
+++ b/egs/rm/s1/steps/train_tri2h.sh
@ -0,0 +1,123 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2h) is training with splice-9-frames+HLDA features.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2h
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+realign_iters="10 15 20 25";  
+hlda_iters="2 4 6 12";
+silphonelist=`cat data/silphones.csl`
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+# feats corresponding to original model
+srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
+rawfeats="ark:splice-feats scp:data/train.scp ark:- |"
+# The "speedup" parameter controls how much of the data to use
+# in the most intensive part of the HLDA transform computation.
+speedup=0.1
+
+echo "aligning all training data"
+
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel \
+  "$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+( ali-to-post ark:$dir/0.ali ark:- | \
+   weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+   acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
+       ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
+est-lda --write-full-matrix=$dir/0.fullmat $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+# Convert alignments generated from the source model ($srcmodel), to use as initial alignments.
+
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+   "ark:| gzip -c > $dir/graphs.fsts.gz" \
+    2>$dir/compile_graphs.log || exit 1 
+
+cur_mat_iter=0
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+          "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+           ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   if echo $hlda_iters | grep -w $x >/dev/null; then # Do HLDA update.
+     ( ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.01 $silphonelist $dir/$x.mdl ark:- ark:- | \
+       gmm-acc-hlda --speedup=$speedup --binary=false $dir/$x.mdl $dir/$cur_mat_iter.mat "$rawfeats" ark:- $dir/$x.hacc ) 2> $dir/hacc.$x.log  || exit 1;
+
+     gmm-est-hlda $dir/$x.mdl $dir/$cur_mat_iter.fullmat $dir/$[$x+1].mdl $dir/$x.fullmat $dir/$x.mat $dir/$x.hacc 2> $dir/hupdate.$x.log || exit 1;
+     cur_mat_iter=$x 
+
+     feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/$cur_mat_iter.mat ark:- ark:-|"
+   else # do GMM update.
+     gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+     gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   fi
+   rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
+   if [ $x -le $maxiterinc ]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1]
+done
+
+( cd $dir;
+   rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
+   rm final.mat 2>/dev/null; ln -s $cur_mat_iter.mat final.mat )
+
--- a/egs/rm/s1/steps/train_tri2i.sh
+++ b/egs/rm/s1/steps/train_tri2i.sh
@ -0,0 +1,127 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2i) is training with triple-deltas+HLDA features.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2i
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+realign_iters="10 15 20 25";  
+hlda_iters="2 4 6 12";
+silphonelist=`cat data/silphones.csl`
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+# feats corresponding to original model
+srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
+rawfeats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- |"
+# The "speedup" parameter controls how much of the data to use
+# in the most intensive part of the HLDA transform computation.
+speedup=0.1
+
+echo "aligning all training data"
+
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel \
+  "$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+( ali-to-post ark:$dir/0.ali ark:- | \
+   weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+   acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | \
+   add-deltas --delta-order=3 scp:- ark:- |" \
+     ark:- $dir/lda.acc )  2>$dir/lda_acc.log || exit 1
+est-lda --write-full-matrix=$dir/0.fullmat $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
+
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+# Convert alignments generated from the source model ($srcmodel), to use as initial alignments.
+
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+   "ark:| gzip -c > $dir/graphs.fsts.gz" \
+    2>$dir/compile_graphs.log || exit 1 
+
+cur_mat_iter=0
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+          "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+           ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   if echo $hlda_iters | grep -w $x >/dev/null; then # Do HLDA update.
+     ( ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.01 $silphonelist $dir/$x.mdl ark:- ark:- | \
+       gmm-acc-hlda --speedup=$speedup --binary=false $dir/$x.mdl $dir/$cur_mat_iter.mat "$rawfeats" ark:- $dir/$x.hacc ) 2> $dir/hacc.$x.log  || exit 1;
+
+     gmm-est-hlda $dir/$x.mdl $dir/$cur_mat_iter.fullmat $dir/$[$x+1].mdl $dir/$x.fullmat $dir/$x.mat $dir/$x.hacc 2> $dir/hupdate.$x.log || exit 1;
+     cur_mat_iter=$x 
+
+     feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $dir/$cur_mat_iter.mat ark:- ark:-|"
+   else # do GMM update.
+     gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+     gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   fi
+   rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
+   if [ $x -le $maxiterinc ]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1]
+done
+
+( cd $dir;
+   rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
+   rm final.mat 2>/dev/null; ln -s $cur_mat_iter.mat final.mat )
+
--- a/egs/rm/s1/steps/train_tri2j.sh
+++ b/egs/rm/s1/steps/train_tri2j.sh
@ -0,0 +1,231 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2j) is training with triple-deltas+LDA+MLLT.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2j
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+realign_iters="10 15 20 25";  
+mllt_iters="2 4 6 12";
+silphonelist=`cat data/silphones.csl`
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+# feats corresponding to original model
+srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
+# Subset of features used to train LDA and MLLT transforms.
+featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --delta-order=3 scp:- ark:- | transform-feats $dir/0.mat ark:- ark:-|"
+
+echo "aligning all training data"
+
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel \
+    "$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+( ali-to-post ark:$dir/0.ali ark:- | \
+   weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+   acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --delta-order=3 scp:- ark:- |" \
+       ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
+est-lda $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
+
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+# Convert alignments generated from the source model ($srcmodel), to use as initial alignments.
+
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+   "ark:| gzip -c > $dir/graphs.fsts.gz" \
+    2>$dir/compile_graphs.log || exit 1 
+
+cur_lda=$dir/0.mat
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+          "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+           ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
+     ( ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
+       gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log  || exit 1;
+
+     est-mllt $dir/$x.mllt.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
+     gmm-transform-means --binary=false $dir/$x.mllt.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
+     compose-transforms $dir/$x.mllt.new $cur_lda $dir/$x.mat || exit 1;
+     cur_lda=$dir/$x.mat
+
+
+     feats="ark:add-deltas --delta-order=3 scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
+     # Subset of features used to train MLLT transforms.
+     featsub="ark:scripts/subset_scp.pl 800 data/train.scp | add-deltas --delta-order=3 scp:- ark:- | transform-feats $cur_lda ark:- ark:-|"
+   else # do GMM update.
+     gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+     gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   fi
+   rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
+   if [[ $x -lt 21 ]]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   if [ $x -le $maxiterinc ]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1]
+done
+
+
+( cd $dir;
+   rm final.mdl 2>/dev/null; ln -s $x.mdl final.mdl;
+   rm final.mat 2>/dev/null; ln -s `basename $cur_lda` final.mat )
+=======
+#!/bin/bash
+
+# To be run from ..
+
+# This (train_tri2j) is training with splice-9-frames+HLDA features.
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2j
+srcdir=exp/tri
+srcmodel=$srcdir/30.mdl
+srcgraphs=ark:$srcdir/graphs.fsts
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+mkdir -p $dir
+cp $srcdir/topo $dir
+numgauss=1500
+incgauss=275  # Inc by 275 per iter for 20 iters; 1500 + 275*20 = 7000 which is
+              # similar to the HTK baseline.
+silphonelist=`cat data/silphones.csl`
+realign_iters="10 15 20 25";  
+hlda_iters="2 4 6 12";
+
+# feats corresponding to original model
+srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
+rawfeats="ark:splice-feats scp:data/train.scp ark:- |"
+# The "speedup" parameter controls how much of the data to use
+# in the most intensive part of the HLDA transform computation.
+speedup=0.1
+
+if false; then
+
+echo "aligning all training data"
+
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel $srcgraphs "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+ali-to-post ark:$dir/0.ali ark:- | \
+   weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+   acc-lda $srcmodel "ark:head -800 data/train.scp | splice-feats scp:- ark:- |" \
+       ark:- $dir/lda.acc 2>$dir/lda_acc.log
+est-lda --write-full-matrix=$dir/0.fullmat $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log
+
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+# Convert alignments generated from monophone model, to use as initial alignments.
+
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra ark:$dir/graphs.fsts \
+    2>$dir/compile_graphs.log || exit 1 
+
+cur_mat_iter=0
+for x in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl ark:$dir/graphs.fsts "$feats" \
+             ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   if echo $hlda_iters | grep -w $x >/dev/null; then # Do HLDA update.
+     ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.01 $silphonelist $dir/$x.mdl ark:- ark:- | \
+       gmm-acc-hlda --speedup=$speedup --binary=false $dir/$x.mdl $dir/$cur_mat_iter.mat "$rawfeats" ark:- $dir/$x.hacc 2> $dir/hacc.$x.log  || exit 1;
+
+     gmm-est-hlda $dir/$x.mdl $dir/$cur_mat_iter.fullmat $dir/$[$x+1].mdl $dir/$x.fullmat $dir/$x.mat $dir/$x.hacc 2> $dir/hupdate.$x.log || exit 1;
+     cur_mat_iter=$x 
+
+     feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/$cur_mat_iter.mat ark:- ark:-|"
+   else # do GMM update.
+     gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+     gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   fi
+   rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
+   if [[ $x -lt 21 ]]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+done
--- a/egs/rm/s1/steps/train_tri2k.sh
+++ b/egs/rm/s1/steps/train_tri2k.sh
@ -0,0 +1,209 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2k.sh) is training the exponential transform
+# after LDA (so the same as LDA+MLLT+ET, since ET includes
+# MLLT).
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2k
+srcdir=exp/tri1
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+srcmodel=$srcdir/final.mdl
+dim=40
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+# The spk2utt_opt uses a subset of utterances that we create; this is only
+# needed by programs that use the subset.
+spk2utt_opt=--spk2utt=ark:$dir/spk2utt
+# the utt2spk opt is used by programs that use all the data so give
+# it the original utt2spk file.
+utt2spk_opt=--utt2spk=ark:data/train.utt2spk
+normtype=mean # et option; could be mean, or none
+
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numiters_et=15 # Before this, update et.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+realign_iters="10 15 20 25";
+silphonelist=`cat data/silphones.csl`
+
+nutt=15 # Use at most 15 utterances from each speaker for
+# estimating transforms, and A and B (will use all the data
+# for estimating the model though, so be careful: we're
+# not always using the lists in $dir).
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+
+awk '{ printf("%s ",$1); for(x=2; x<=NF&&x<='$nutt'+1;x++)
+    {  printf("%s ", $x); } printf("\n"); }' <data/train.spk2utt >$dir/spk2utt
+scripts/spk2utt_to_utt2spk.pl < $dir/spk2utt > $dir/utt2spk
+cat $dir/utt2spk | awk '{print $1}' > $dir/uttlist
+scripts/filter_scp.pl $dir/uttlist <data/train.scp >$dir/train.scp
+
+
+srcfeats="ark,s,cs:add-deltas scp:data/train.scp ark:- |"
+
+# For now, there is no subsetting.
+basefeats="ark,s,cs:splice-feats scp:data/train.scp ark:- | transform-feats $dir/lda.mat ark:- ark:- |"
+## The following two variables will get changed in the script.
+feats="$basefeats"
+
+
+
+echo "aligning all training data"
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel "$srcgraphs" \
+       "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+
+echo "computing LDA transform"
+( ali-to-post ark:$dir/0.ali ark:- | \
+  weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+  acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
+    ark:- $dir/lda.acc ) 2>$dir/lda_acc.log || exit 1
+
+est-lda --dim=$dim $dir/lda.mat $dir/lda.acc 2>$dir/lda_est.log || exit 1
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali \
+    $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+# Convert alignments generated from previous model, to use as initial alignments.
+
+rm $dir/treeacc
+
+convert-ali  $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali \
+   2>$dir/convert.log  || exit 1
+
+rm $dir/0.ali
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+   "ark:|gzip -c >$dir/graphs.fsts.gz" 2>$dir/compile_graphs.log || exit 1 
+
+gmm-init-et --normalize-type=$normtype --binary=false --dim=$dim $dir/1.et 2>$dir/init_et.log || exit 1
+
+x=1
+while [ $x -lt $numiters ]; do
+   x1=$[$x+1]; 
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+             "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+             ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+
+   if [ $x -lt $numiters_et ]; then
+     # Work out current transforms:
+   ( ali-to-post ark:$dir/cur.ali ark:- | \
+     weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
+     gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
+     gmm-est-et $spk2utt_opt --verbose=1 $dir/$x.mdl $dir/$x.et "$basefeats" \
+        ark,s,cs:- ark:$dir/$x.trans ark,t:$dir/$x.warp ) 2> $dir/trans.$x.log || exit 1;
+  
+     # Remove previous transforms, if present. 
+     if [ $x -gt 1 ]; then rm $dir/$[$x-1].trans; fi
+
+     # Now change $feats to correspond to the transformed features.  We compose the
+     # transforms themselves (it's more efficient than transforming the features
+     # twice).
+     feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/lda.mat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/$x.trans ark:- ark:- |"
+   fi 
+
+   # Accumulate stats to update model:
+   gmm-acc-stats-ali $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2>$dir/gmm_acc.$x.log || exit 1;
+   # Update model.
+   gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$x1.mdl \
+        2>$dir/gmm_est.$x.log || exit 1;
+
+   rm $dir/$x.acc $dir/$x.mdl
+
+
+   if [ $x -lt $numiters_et ]; then
+     # Alternately estimate either A or B.
+     if [ $[$x%2] == 0 ]; then  # Estimate A:
+     ( ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
+       gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
+       gmm-et-acc-a $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$basefeats" ark,s,cs:- $dir/$x.et_acc_a ) 2> $dir/acc_a.$x.log || exit 1;
+       gmm-et-est-a --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.et_acc_a 2> $dir/update_a.$x.log || exit 1;
+       rm $dir/$x.et_acc_a
+     else
+     ( ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.0 $silphonelist $dir/$x1.mdl ark:- ark:- | \
+       gmm-post-to-gpost $dir/$x1.mdl "$feats" ark:- ark:- | \
+       gmm-et-acc-b $spk2utt_opt --verbose=1 $dir/$x1.mdl $dir/$x.et "$basefeats" ark,s,cs:- ark:$dir/$x.trans ark:$dir/$x.warp $dir/$x.et_acc_b ) 2> $dir/acc_b.$x.log || exit 1;
+       gmm-et-est-b --verbose=1 $dir/$x.et $dir/$x1.et $dir/$x.mat $dir/$x.et_acc_b 2> $dir/update_b.$x.log || exit 1;
+       rm $dir/$x.et_acc_b
+       # Careful!: gmm-transform-means here changes $x1.mdl in-place. 
+       gmm-transform-means $dir/$x.mat $dir/$x1.mdl $dir/$x1.mdl 2> $dir/transform_means.$x.log
+     fi   
+   fi
+   if [ $x -le $maxiterinc ]; then
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1];
+done
+
+
+gmm-et-get-b $dir/$numiters_et.et $dir/B.mat 2>$dir/get_b.log || exit 1
+compose-transforms $dir/B.mat $dir/lda.mat $dir/default.mat 2>>$dir/get_b.log || exit 1
+defaultfeats="ark,s,cs:splice-feats scp:data/train.scp ark:- | transform-feats $dir/default.mat ark:- ark:- |"
+
+# Accumulate stats for the "alignment model", which is the same as the model
+# but trained on the default features (it shares Gaussian-level alignments).
+( ali-to-post ark:$dir/cur.ali ark:-  | \
+  gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$defaultfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
+  # Update model.
+gmm-est --write-occs=$dir/final.occs --remove-low-count-gaussians=false $dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
+      2>$dir/est_alimdl.log  || exit 1;
+rm $dir/$x.acc2
+
+
+# The following files may be useful for display purposes.
+for n in 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
+ cat $dir/$n.warp | scripts/process_warps.pl data/spk2gender.map > $dir/warps.$n
+done
+
+( cd $dir; rm final.mdl 2>/dev/null; 
+  ln -s $x.mdl final.mdl; ln -s $x.alimdl final.alimdl;
+  ln -s $numiters_et.et final.et
+  ln -s $[$numiters_et-1].trans final.trans )
+ 
+
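
Note on the per-speaker subsetting in train_tri2k.sh: the awk one-liner that writes $dir/spk2utt keeps the speaker id plus at most $nutt utterance ids from each line of data/train.spk2utt. A minimal sketch of that idea follows, assuming the usual "<speaker> <utt1> <utt2> ..." line format. The function name limit_utts_per_spk and the toy ids are made up for illustration, and unlike the original one-liner this version does not leave a trailing space on each line.

```bash
#!/bin/bash
# Hypothetical helper: keep at most N utterance ids per speaker.
# Reads spk2utt-style lines ("<speaker> <utt1> <utt2> ...") on stdin.
limit_utts_per_spk() {
  local nutt=$1
  awk -v nutt="$nutt" '{
    printf("%s", $1);                       # speaker id
    for (x = 2; x <= NF && x <= nutt + 1; x++)
      printf(" %s", $x);                    # first nutt utterance ids
    printf("\n");
  }'
}

# Toy input with made-up ids: spkA has four utterances, spkB has one.
printf '%s\n' "spkA a1 a2 a3 a4" "spkB b1" | limit_utts_per_spk 2
```

With a limit of 2, spkA is cut down to its first two utterances and spkB is left as it is.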
--- a/egs/rm/s1/steps/train_tri2l.sh
+++ b/egs/rm/s1/steps/train_tri2l.sh
@ -0,0 +1,156 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation  Arnab Ghoshal
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# To be run from ..
+
+# This (train_tri2l) is training with splice-9-frames+LDA features,
+# plus MLLT plus CMLLR/fMLLR (i.e. speaker adapted training).
+
+if [ -f path.sh ]; then . path.sh; fi
+dir=exp/tri2l
+srcdir=exp/tri1
+srcmodel=$srcdir/final.mdl
+srcgraphs="ark:gunzip -c $srcdir/graphs.fsts.gz|"
+scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
+numiters=30    # Number of iterations of training
+maxiterinc=20 # Last iter to increase #Gauss on.
+numleaves=1500
+numgauss=$numleaves
+totgauss=7000 # Target #Gaussians
+incgauss=$[($totgauss-$numgauss)/$maxiterinc] # per-iter increment for #Gauss
+silphonelist=`cat data/silphones.csl`
+realign_iters="10 15 20 25";  
+mllt_iters="2 4 6 8";
+fmllr_iters="9 14 19"
+spk2utt_opt="--spk2utt=ark:data/train.spk2utt"
+utt2spk_opt="--utt2spk=ark:data/train.utt2spk"
+
+# feats corresponding to original model
+srcfeats="ark:add-deltas --print-args=false scp:data/train.scp ark:- |"
+feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:-|"
+# Subset of features used to train LDA and MLLT transforms.
+featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $dir/0.mat ark:- ark:-|"
+
+mkdir -p $dir
+cp $srcdir/topo $dir
+
+echo "aligning all training data"
+
+gmm-align-compiled  $scale_opts --beam=8 --retry-beam=40  $srcmodel \
+   "$srcgraphs" "$srcfeats" ark,t:$dir/0.ali 2> $dir/align.0.log || exit 1;
+
+( ali-to-post ark:$dir/0.ali ark:- | \
+   weight-silence-post 0.0 $silphonelist $srcmodel ark:- ark:- | \
+   acc-lda $srcmodel "ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- |" \
+       ark:- $dir/lda.acc ) 2>$dir/lda_acc.log
+est-lda $dir/0.mat $dir/lda.acc 2>$dir/lda_est.log
+
+
+acc-tree-stats  --ci-phones=$silphonelist $srcmodel "$feats" ark:$dir/0.ali $dir/treeacc 2> $dir/acc.tree.log  || exit 1;
+
+
+cat data/phones.txt | awk '{print $NF}' | grep -v -w 0 > $dir/phones.list
+cluster-phones $dir/treeacc $dir/phones.list $dir/questions.txt 2> $dir/questions.log || exit 1;
+scripts/int2sym.pl data/phones.txt < $dir/questions.txt > $dir/questions_syms.txt
+compile-questions $dir/topo $dir/questions.txt $dir/questions.qst 2>$dir/compile_questions.log || exit 1;
+
+scripts/make_roots.pl --separate data/phones.txt `cat data/silphones.csl` shared split > $dir/roots.txt 2>$dir/roots.log || exit 1;
+
+build-tree --verbose=1 --max-leaves=$numleaves \
+    $dir/treeacc $dir/roots.txt \
+    $dir/questions.qst $dir/topo $dir/tree  2> $dir/train_tree.log || exit 1;
+
+gmm-init-model  --write-occs=$dir/1.occs  \
+    $dir/tree $dir/treeacc $dir/topo $dir/1.mdl 2> $dir/init_model.log || exit 1;
+
+rm $dir/treeacc
+
+# Convert alignments generated from the previous model, to use as initial alignments.
+
+convert-ali $srcmodel $dir/1.mdl $dir/tree ark:$dir/0.ali ark:$dir/cur.ali 2>$dir/convert.log 
+  # Debug step only: convert back and check they're the same.
+  convert-ali $dir/1.mdl $srcmodel $srcdir/tree ark:$dir/cur.ali ark,t:- \
+   2>/dev/null | cmp - $dir/0.ali || exit 1; 
+
+rm $dir/0.ali
+
+# Make training graphs
+echo "Compiling training graphs"
+compile-train-graphs $dir/tree $dir/1.mdl  data/L.fst ark:data/train.tra \
+    "ark:|gzip -c >$dir/graphs.fsts.gz"  2>$dir/compile_graphs.log || exit 1 
+
+cur_lda=$dir/0.mat
+x=1
+while [ $x -lt $numiters ]; do
+   echo pass $x
+   if echo $realign_iters | grep -w $x >/dev/null; then
+     echo "Aligning data"
+     gmm-align-compiled $scale_opts --beam=8 --retry-beam=40 $dir/$x.mdl \
+            "ark:gunzip -c $dir/graphs.fsts.gz|" "$feats" \
+             ark:$dir/cur.ali 2> $dir/align.$x.log || exit 1;
+   fi
+   if echo $fmllr_iters | grep -w $x >/dev/null; then # Compute CMLLR transforms.
+     sifeats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
+    ( ali-to-post ark:$dir/cur.ali ark:- | \
+      weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
+      gmm-post-to-gpost $dir/$x.mdl "$feats" ark:- ark:- | \
+      gmm-est-fmllr-gpost $spk2utt_opt $dir/$x.mdl "$sifeats" ark,s,cs:- ark:$dir/tmp.trans ) \
+           2> $dir/trans.$x.log  || exit 1;
+     mv $dir/tmp.trans $dir/cur.trans
+     feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/cur.trans ark:- ark:- |"
+   fi
+   if echo $mllt_iters | grep -w $x >/dev/null; then # Do MLLT update.
+    ( ali-to-post ark:$dir/cur.ali ark:- | \
+       weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- | \
+       gmm-acc-mllt --binary=false $dir/$x.mdl "$featsub" ark:- $dir/$x.macc ) 2> $dir/macc.$x.log  || exit 1;
+
+     est-mllt $dir/$x.mat.new $dir/$x.macc 2> $dir/mupdate.$x.log || exit 1;
+     gmm-transform-means --binary=false $dir/$x.mat.new $dir/$x.mdl $dir/$[$x+1].mdl 2> $dir/transform_means.$x.log || exit 1;
+     compose-transforms --print-args=false $dir/$x.mat.new $cur_lda $dir/$x.mat || exit 1;
+     cur_lda=$dir/$x.mat
+
+
+     feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
+     # Subset of features used to train MLLT transforms.
+     featsub="ark:scripts/subset_scp.pl 800 data/train.scp | splice-feats scp:- ark:- | transform-feats $cur_lda ark:- ark:-|"
+   else # do GMM update.
+     gmm-acc-stats-ali --binary=false $dir/$x.mdl "$feats" ark:$dir/cur.ali $dir/$x.acc 2> $dir/acc.$x.log  || exit 1;
+     gmm-est --mix-up=$numgauss $dir/$x.mdl $dir/$x.acc $dir/$[$x+1].mdl 2> $dir/update.$x.log || exit 1;
+   fi
+   rm $dir/$x.mdl $dir/$x.acc 2>/dev/null
+   if [ $x -le $maxiterinc ]; then 
+      numgauss=$[$numgauss+$incgauss];
+   fi
+   x=$[$x+1]
+done
+
+defaultfeats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $cur_lda ark:- ark:-|"
+
+# Accumulate stats for the "alignment model", which is the same as the model
+# but trained on the unadapted, default features (it shares Gaussian-level alignments).
+( ali-to-post ark:$dir/cur.ali ark:-  | \
+  gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$defaultfeats" ark:- $dir/$x.acc2 ) 2>$dir/acc_alimdl.log || exit 1;
+  # Update model.
+  gmm-est --write-occs=$dir/final.occs --remove-low-count-gaussians=false \
+     $dir/$x.mdl $dir/$x.acc2 $dir/$x.alimdl \
+     2>$dir/est_alimdl.log  || exit 1;
+rm $dir/$x.acc2
+
+( cd $dir; rm final.mdl final.alimdl 2>/dev/null; 
+  ln -s $x.mdl final.mdl; ln -s $x.alimdl final.alimdl
+  ln -s `basename $cur_lda` final.mat )
+
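
Note on the training schedule in train_tri2l.sh: which steps run on a given pass is decided by testing the iteration number against the space-separated lists with `grep -w`, and the --mix-up target grows by a fixed increment up to iteration $maxiterinc. The sketch below only prints that schedule and runs no Kaldi programs. The variable values are copied from the script; the helper names in_list and print_schedule are hypothetical.

```bash
#!/bin/bash
# Values copied from train_tri2l.sh.
numiters=30
maxiterinc=20
numleaves=1500
totgauss=7000
realign_iters="10 15 20 25"
mllt_iters="2 4 6 8"
fmllr_iters="9 14 19"

# True if iteration $1 occurs as a whole word in the list $2
# (same grep -w idiom as the script, so 1 does not match 10).
in_list() { echo "$2" | grep -qw -- "$1"; }

# Print one line per iteration with the steps that would run.
print_schedule() {
  local numgauss=$numleaves
  local incgauss=$(( (totgauss - numgauss) / maxiterinc ))
  local x=1 steps
  while [ $x -lt $numiters ]; do
    steps=""
    if in_list $x "$realign_iters"; then steps="$steps realign"; fi
    if in_list $x "$fmllr_iters"; then steps="$steps fmllr"; fi
    if in_list $x "$mllt_iters"; then
      steps="$steps mllt"
    else
      steps="$steps gmm-update(mix-up=$numgauss)"
    fi
    echo "iter $x:$steps"
    # The target grows only on the first $maxiterinc iterations.
    if [ $x -le $maxiterinc ]; then numgauss=$(( numgauss + incgauss )); fi
    x=$(( x + 1 ))
  done
  echo "final numgauss target: $numgauss"
}

print_schedule
```

With these values the increment is (7000 - 1500) / 20 = 275, so the target reaches 7000 after iteration 20 and stays there for the remaining passes.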
--- a/egs/rm/s1/steps/train_ubma.sh
+++ b/egs/rm/s1/steps/train_ubma.sh
@ -0,0 +1,48 @@
+#!/bin/bash
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# Train UBM from a trained HMM/GMM system.
+
+if [ -f path.sh ]; then . path.sh; fi
+
+dir=exp/ubma
+mkdir -p $dir
+srcdir=exp/tri1
+
+init-ubm --intermediate-numcomps=2000 --ubm-numcomps=400 --verbose=2 \
+    --fullcov-ubm=true $srcdir/final.mdl $srcdir/final.occs \
+    $dir/0.ubm 2> $dir/cluster.log
+
+
+
+subset[0]=1000
+subset[1]=1500
+subset[2]=2000
+subset[3]=2500
+
+for x in 0 1 2 3; do
+    echo "Pass $x"
+    feats="ark:scripts/subset_scp.pl ${subset[$x]} data/train.scp | add-deltas --print-args=false scp:- ark:- |"
+    fgmm-acc-stats --diag-gmm-nbest=15 --binary=false --verbose=2 $dir/$x.ubm "$feats" $dir/$x.acc \
+	2> $dir/acc.$x.log  || exit 1;
+    fgmm-est --verbose=2 $dir/$x.ubm $dir/$x.acc \
+	$dir/$[$x+1].ubm 2> $dir/update.$x.log || exit 1;
+    rm $dir/$x.acc $dir/$x.ubm
+done
+
+
+
--- a/egs/wsj/README.txt
+++ b/egs/wsj/README.txt
@ -0,0 +1,8 @@
+
+Each subdirectory of this directory contains the
+scripts for a sequence of experiments.
+
+  s1: This setup contains experiments with GMM-based systems using
+      various Maximum Likelihood techniques, including global and
+      speaker-specific transforms.
+      See a parallel setup in ../rm/s1 
--- a/egs/wsj/s1/conf/mfcc.conf
+++ b/egs/wsj/s1/conf/mfcc.conf
@ -0,0 +1 @@
+--use-energy=false   # only non-default option.
--- a/egs/wsj/s1/conf/topo.proto
+++ b/egs/wsj/s1/conf/topo.proto
@ -0,0 +1,22 @@
+<Topology> 
+<TopologyEntry> 
+<ForPhones>
+NONSILENCEPHONES
+</ForPhones> 
+<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State> 
+<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State> 
+<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State> 
+<State> 3 </State>
+</TopologyEntry> 
+<TopologyEntry> 
+<ForPhones>
+SILENCEPHONES
+</ForPhones> 
+<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State> 
+<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State> 
+<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State> 
+<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State> 
+<State> 4 <PdfClass> 4 <Transition> 4 0.25 <Transition> 5 0.75 </State> 
+<State> 5 </State>
+</TopologyEntry> 
+</Topology> 
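
Note on topo.proto: each emitting state lists `<Transition> <destination> <probability>` triples, and in the file above the probabilities on every state line sum to 1. A small sanity check along those lines is sketched below, assuming the one-state-per-line layout used here. The function name check_topo_sums is made up and is not part of Kaldi.

```bash
#!/bin/bash
# Hypothetical checker: for every <State> line that has transitions, print
# "state <id> sum <total>" and return non-zero if any total is not 1
# (within a tolerance of 1e-6). Reads files given as arguments, or stdin.
check_topo_sums() {
  awk '
    /<Transition>/ {
      sum = 0
      for (i = 1; i <= NF; i++)
        if ($i == "<Transition>") sum += $(i + 2)   # field after the destination
      printf("state %s sum %.2f\n", $2, sum)
      if (sum < 1 - 1e-6 || sum > 1 + 1e-6) bad = 1
    }
    END { exit bad }
  ' "$@"
}

# Two state lines copied from the silence entry of the topology.
check_topo_sums <<'EOF'
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
<State> 4 <PdfClass> 4 <Transition> 4 0.25 <Transition> 5 0.75 </State>
EOF
```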
--- a/egs/wsj/s1/data_prep/find_transcripts.pl
+++ b/egs/wsj/s1/data_prep/find_transcripts.pl
@ -0,0 +1,64 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+# This program takes on its standard input a list of utterance
+# ids, one per line (e.g. 4k0c030a is an utterance id).
+# It takes as its command-line argument a file listing the dot files,
+# and extracts from those dot files the transcripts for the
+# given utterance ids.
+# 
+
+@ARGV == 1 || die "find_transcripts.pl dot_files_flist < utterance_ids > transcripts";
+$dot_flist = shift @ARGV;
+
+open(L, "<$dot_flist") || die "Opening file list of dot files: $dot_flist\n";
+while(<L>){
+    chop;
+    m:\S+/(\w{6})00.dot: || die "Bad line in dot file list: $_";
+    $spk = $1;
+    $spk2dot{$spk} = $_;
+}
+
+
+
+while(<STDIN>){ 
+    chop;
+    $uttid = $_;
+    $uttid =~ m:(\w{6})\w\w: || die "Bad utterance id $_";
+    $spk = $1;
+    if($spk ne $curspk) {
+        %utt2trans = ( ); # Don't keep all the transcripts in memory...
+        $curspk = $spk;
+        $dotfile = $spk2dot{$spk};
+        defined $dotfile || die "No dot file for speaker $spk\n";
+        open(F, "<$dotfile") || die "Error opening dot file $dotfile\n";
+        while(<F>) {
+            $_ =~ m:(.+)\((\w{8})\)\s*$: || die "Bad line $_ in dot file $dotfile (line $.)\n";
+            $trans = $1;
+            $utt = $2;
+            $utt2trans{$utt} = $trans;
+        }
+    }
+    if(!defined $utt2trans{$uttid}) {
+        print STDERR "No transcript for utterance $uttid (current dot file is $dotfile)\n";
+    } else {
+        print "$uttid $utt2trans{$uttid}\n";
+    }
+}
+
+
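
Note on the dot-file format parsed by find_transcripts.pl: each line is expected to be a transcript followed by the 8-character utterance id in parentheses, matching the pattern `(.+)\((\w{8})\)\s*$`. The sketch below shows the same split in shell, as an illustration only. The function name dot_to_trans and the sample line are made up. Unlike the Perl script, which dies on a malformed line and only prints the utterances requested on stdin, this version converts every matching line and silently drops the rest.

```bash
#!/bin/bash
# Hypothetical helper: turn "<transcript> (<8-char utt id>)" lines into
# "<utt id> <transcript>" lines. Non-matching lines are dropped.
dot_to_trans() {
  sed -n -E 's/^(.+)\(([[:alnum:]_]{8})\)[[:space:]]*$/\2 \1/p'
}

# Made-up dot-file line followed by a line that does not match.
printf '%s\n' "HELLO WORLD (4k0c030a)" "not a transcript line" | dot_to_trans
```

As in the Perl script, the space that precedes the opening parenthesis stays attached to the end of the transcript.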
--- a/egs/wsj/s1/data_prep/flist2scp.pl
+++ b/egs/wsj/s1/data_prep/flist2scp.pl
@ -0,0 +1,31 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# takes in a file list with lines like
+# /mnt/matylda2/data/WSJ1/13-16.1/wsj1/si_dt_20/4k0/4k0c030a.wv1
+# and outputs an scp in kaldi format with lines like
+# 4k0c030a /mnt/matylda2/data/WSJ1/13-16.1/wsj1/si_dt_20/4k0/4k0c030a.wv1
+# (the first thing is the utterance-id, which is the same as the basename of the file).
+
+
+while(<>){
+    m:^\S+/(\w+)\.[wW][vV]1$: || die "Bad line $_";
+    $id = $1;
+    $id =~ tr/A-Z/a-z/;  # Necessary because of weirdness on disk 13-16.1 (uppercase filenames)
+    print "$id $_";
+}
+
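
Note on flist2scp.pl: the utterance id is the basename of each path with the .wv1 extension removed and lowercased, and the path itself is passed through unchanged. A rough shell equivalent for a single line is sketched below. The function name flist_line_to_scp and the example path are hypothetical, and the extension check is looser than the Perl regex, which also requires the id to consist of word characters.

```bash
#!/bin/bash
# Hypothetical helper: map one file-list line to "<utt-id> <path>".
flist_line_to_scp() {
  local path=$1 base id
  base=${path##*/}                       # strip the directory part
  case "$base" in
    *.[wW][vV]1) ;;                      # accept .wv1 in either case
    *) echo "Bad line $path" >&2; return 1 ;;
  esac
  # Drop the extension and lowercase the id (for the uppercase disk).
  id=$(printf '%s' "${base%.*}" | tr 'A-Z' 'a-z')
  echo "$id $path"
}

# Made-up path with uppercase filename, as on disk 13-16.1.
flist_line_to_scp /some/disk/si_dt_20/4k0/4K0C030A.WV1
```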
--- a/egs/wsj/s1/data_prep/ndx2flist.pl
+++ b/egs/wsj/s1/data_prep/ndx2flist.pl
@ -0,0 +1,62 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# This program takes as its standard input an .ndx file from the WSJ corpus that looks
+# like this:
+#;; File: tr_s_wv1.ndx, updated 04/26/94
+#;;
+#;; Index for WSJ0 SI-short Sennheiser training data
+#;; Data is read WSJ sentences, Sennheiser mic.
+#;; Contains 84 speakers X (~100 utts per speaker MIT/SRI and ~50 utts 
+#;; per speaker TI) = 7236 utts
+#;;
+#11_1_1:wsj0/si_tr_s/01i/01ic0201.wv1
+#11_1_1:wsj0/si_tr_s/01i/01ic0202.wv1
+#11_1_1:wsj0/si_tr_s/01i/01ic0203.wv1
+
+#and as command-line arguments it takes the names of the WSJ disk locations, e.g.:
+#/mnt/matylda2/data/WSJ0/11-1.1 /mnt/matylda2/data/WSJ0/11-10.1  ... etc.
+# It outputs a list of absolute pathnames (it does this by replacing e.g. 11_1_1 with
+# /mnt/matylda2/data/WSJ0/11-1.1).
+# It also does a slight fix because one of the WSJ disks (WSJ1/13-16.1) was distributed with
+# uppercase rather than lower case filenames.
+
+foreach $fn (@ARGV) {
+    $fn =~ m:.+/([0-9\.\-]+)/?$: || die "Bad command-line argument $fn\n";
+    $disk_id=$1; 
+    $disk_id =~ tr/-\./__/; # replace - and . with _ so 11-10.1 becomes 11_10_1
+    $fn =~ s:/$::; # Remove final slash, just in case it is present.
+    $disk2fn{$disk_id} = $fn;
+}
+
+while(<STDIN>){
+    if(m/^;/){ next; } # Comment.  Ignore it.
+    else {
+      m/^([0-9_]+):\s*(\S+)$/  || die "Could not parse line $_";
+      $disk=$1;
+      if(!defined $disk2fn{$disk}) {
+          die "Disk id $disk not found";
+      }
+      $filename = $2; # as a subdirectory of the distributed disk.
+      if($disk eq "13_16_1" && `hostname` =~ m/fit.vutbr.cz/) {
+          # The disk 13-16.1 has been uppercased for some reason, on the
+          # BUT system.  This is a fix specifically for that case.
+          $filename =~ tr/a-z/A-Z/; # This disk contains all uppercase filenames.  Why?
+      }
+      print "$disk2fn{$disk}/$filename\n";
+  }
+}
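
Note on ndx2flist.pl: the disk id in an .ndx line (e.g. 11_10_1) is matched against the last component of each disk directory given on the command line, after replacing `-` and `.` with `_`. The sketch below illustrates that mapping for a single disk directory. The function names disk_id and resolve_ndx_line are made up, and the uppercase fix for disk 13-16.1 and the optional whitespace after the colon are left out.

```bash
#!/bin/bash
# Hypothetical helper: turn a disk directory such as .../11-10.1 into the
# id used in .ndx files (11_10_1).
disk_id() {
  local dir=${1%/}                       # drop a trailing slash, if any
  printf '%s\n' "${dir##*/}" | tr '.-' '__'
}

# Hypothetical helper: resolve one .ndx line ("<disk_id>:<relative path>")
# against one disk directory, failing if the ids do not match.
resolve_ndx_line() {
  local line=$1 dir=${2%/}
  local id=${line%%:*} rel=${line#*:}
  if [ "$id" != "$(disk_id "$dir")" ]; then
    echo "Disk id $id not found" >&2
    return 1
  fi
  echo "$dir/$rel"
}

# Example paths taken from the comments in ndx2flist.pl.
disk_id /mnt/matylda2/data/WSJ0/11-10.1
resolve_ndx_line "11_1_1:wsj0/si_tr_s/01i/01ic0201.wv1" /mnt/matylda2/data/WSJ0/11-1.1
```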
--- a/egs/wsj/s1/data_prep/normalize_transcript.pl
+++ b/egs/wsj/s1/data_prep/normalize_transcript.pl
@ -0,0 +1,57 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# This takes data from the standard input that's unnormalized transcripts in the format
+# 4k2c0308 Of course there isn\'t any guarantee the company will keep its hot hand [misc_noise] 
+# 4k2c030a [loud_breath] And new hardware such as the set of personal computers I\. B\. M\. introduced last week can lead to unexpected changes in the software business [door_slam] 
+# and outputs normalized transcripts.
+# cf. /mnt/matylda2/data/WSJ0/11-10.1/wsj0/transcrp/doc/dot_spec.doc
+
+@ARGV == 1 ||  die "usage: normalize_transcript.pl noise_word < transcript > transcript2";
+$noise_word = shift @ARGV;
+
+while(<STDIN>) {
+    $_ =~ m:^(\S+) (.+): || die "bad line $_";
+    $utt = $1;
+    $trans = $2;
+    print "$utt";
+    foreach $w (split (" ",$trans)) {
+        $w =~ tr:a-z:A-Z:; # Upcase everything to match the CMU dictionary.
+        $w =~ s:\\::g;      # Remove backslashes.  We don't need the quoting.
+        if($w =~ m:^\[\<\w+\]$:  || # E.g. [<door_slam], this means a door slammed in the preceding word. Delete.
+           $w =~ m:^\[\w+\>\]$:  ||  # E.g. [door_slam>], this means a door slammed in the next word.  Delete.
+           $w =~ m:\[\w+/\]$: ||  # E.g. [phone_ring/], which indicates the start of this phenomenon.
+           $w =~ m:\[\/\w+]$: ||  # E.g. [/phone_ring], which indicates the end of this phenomenon.
+           $w eq "~" ||        # This is used to indicate truncation of an utterance.  Not a word.
+           $w eq ".") {      # "." is used to indicate a pause.  Silence is optional anyway so not much 
+                             # point including this in the transcript.
+            next; # we won't print this word.
+        } elsif($w =~ m:\[\w+\]:) { # Other noises, e.g. [loud_breath].
+            print " $noise_word";
+        } elsif($w =~ m:^\<([\w\']+)\>$:) {
+            # E.g. replace <AND> with AND.  The <> marks a verbal deletion of a word, but the word is still pronounced.
+            print " $1";
+        } elsif($w eq "--DASH") {
+            print " -DASH";  # This is a common issue; the CMU dictionary has it as -DASH.
+#        } elsif($w =~ m:(.+)\-DASH$:) { # E.g. INCORPORATED-DASH... seems the DASH gets combined with previous word
+#            print " $1 -DASH";
+        } else {
+            print " $w";
+        }
+    }
+    print "\n";
+}
--- a/egs/wsj/s1/data_prep/oov2unk.pl
+++ b/egs/wsj/s1/data_prep/oov2unk.pl
@ -0,0 +1,54 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# takes a transcript file with lines like
+# 40po031e THE RATE FELL TO SIX %PERCENT IN NOVEMBER NINETEEN EIGHTY SIX .PERIOD
+# on the standard input.
+# The first (and only) command-line argument is the filename of a dictionary file with lines like
+# ZYUGANOV  Z Y UW1 G AA0 N AA0 V
+# This script replaces all OOVs with the spoken-noise word and prints counts for each OOV on the standard error.
+
+@ARGV == 2 || die "Usage: oov2unk.pl dict spoken-noise-word < transcript > transcript2";
+
+$dict = shift @ARGV;
+open(F, "<$dict") || die "Died opening dictionary file $dict\n";
+while(<F>){
+   @A = split(" ", $_);
+   $word = shift @A;
+   $seen{$word} = 1;
+}
+$spoken_noise_word = shift @ARGV;
+
+while(<STDIN>) {
+   @A = split(" ", $_);
+   $utt = shift @A;
+   print $utt;
+   foreach $a (@A) {
+       if(defined $seen{$a}) {
+           print " $a";
+       } else  { 
+           $oov{$a}++;
+           print " $spoken_noise_word";
+       }
+   }
+   print "\n";
+}
+
+
+foreach $w (sort { $oov{$a} <=> $oov{$b} } keys %oov) {
+    print STDERR "$w $oov{$w}\n";
+}
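The OOV replacement in oov2unk.pl can also be written as a short awk pipeline. The following is a sketch under assumptions, not the script from this commit: the function name `oov2unk_sketch` is hypothetical, and the lexicon file is assumed to be non-empty (the `NR == FNR` idiom would otherwise treat the transcript as the lexicon).

```shell
#!/bin/sh
# Hypothetical awk sketch of the OOV replacement.
# $1 = lexicon file (first field of each line is a word), $2 = spoken-noise word.
# Reads a transcript on stdin, writes the mapped transcript to stdout and
# "word count" lines, sorted by ascending count, to stderr.
oov2unk_sketch() {
  awk -v noise="$2" '
    NR == FNR { seen[$1] = 1; next }     # first input: the lexicon
    {
      out = $1                           # the utterance id is copied unchanged
      for (i = 2; i <= NF; i++) {
        if ($i in seen) { out = out " " $i }
        else { oov[$i]++; out = out " " noise }
      }
      print out
    }
    END { for (w in oov) print w, oov[w] | "sort -k2,2n 1>&2" }
  ' "$1" -
}
```

A call such as `oov2unk_sketch ../data/lexicon.txt "<SPOKEN_NOISE>" < x.trans2 > x.txt` mirrors how data_prep/run.sh invokes the Perl script.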
--- a/egs/wsj/s1/data_prep/run.sh
+++ b/egs/wsj/s1/data_prep/run.sh
@ -0,0 +1,157 @@
+# This script should be run from its own directory (.)
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# It takes as arguments a list of directories that should end
+# with numbers like 13-4.1.  These are the subdirectories in the WSJ disks.
+# On the BUT system we can get these by doing:
+#  ./run.sh /mnt/matylda2/data/WSJ?/??-{?,??}.?
+
+# Another example is:
+#  ./run.sh  /ais/gobi2/speech/WSJ/*/??-{?,??}.?
+
+
+if [ $# -lt 4 ]; then
+   echo "Too few arguments to run.sh: need a list of WSJ directories ending e.g. 11-13.1"
+   exit 1;
+fi
+
+rm -r links/ 2>/dev/null
+mkdir links/
+ln -s $* links
+
+# This version for SI-84
+
+cat links/11-13.1/wsj0/doc/indices/train/tr_s_wv1.ndx | \
+ ./ndx2flist.pl $* | sort | \
+ grep -v 11-2.1/wsj0/si_tr_s/401 > train_si84.flist
+
+# This version for SI-284
+cat links/13-34.1/wsj1/doc/indices/si_tr_s.ndx \
+ links/11-13.1/wsj0/doc/indices/train/tr_s_wv1.ndx | \
+ ./ndx2flist.pl  $* | sort | \
+ grep -v 11-2.1/wsj0/si_tr_s/401 > train_si284.flist
+
+
+# Now for the test sets.
+# links/13-34.1/wsj1/doc/indices/readme.doc 
+# describes all the different test sets.
+# Note: each test-set seems to come in multiple versions depending
+# on different vocabulary sizes, verbalized vs. non-verbalized
+# pronunciations, etc.  We use the largest vocab and non-verbalized
+# pronunciations.
+# The most normal one seems to be the "baseline 60k test set", which
+# is h1_p0. 
+
+# Nov'92 (333 utts)
+# These index files have a slightly different  format;
+# have to add .wv1
+cat links/11-13.1/wsj0/doc/indices/test/nvp/si_et_20.ndx | \
+  ./ndx2flist.pl $* |  awk '{printf("%s.wv1\n", $1)}' | \
+  sort > eval_nov92.flist
+
+# Nov'93: (213 utts)
+# Have to replace a wrong disk-id.
+cat links/13-32.1/wsj1/doc/indices/wsj1/eval/h1_p0.ndx | \
+  sed s/13_32_1/13_33_1/ | \
+  ./ndx2flist.pl $* | sort > eval_nov93.flist
+
+# Dev-set for Nov'93 (503  utts)
+cat links/13-34.1/wsj1/doc/indices/h1_p0.ndx | \
+  ./ndx2flist.pl $* | sort > dev_nov93.flist
+
+# Dev-set for Nov'93 (503 utts)
+# links/13-34.1/wsj1/doc/indices/h1_p0.ndx
+
+# Finding the transcript files:
+for x in $*; do find $x -iname '*.dot'; done > dot_files.flist
+
+# Convert the transcripts into our format (no normalization yet)
+for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
+   ./flist2scp.pl $x.flist | sort > ${x}_sph.scp
+   cat ${x}_sph.scp | awk '{print $1}' | ./find_transcripts.pl  dot_files.flist > $x.trans1
+done
+
+# Do some initial normalization steps.
+noiseword="<NOISE>";
+for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
+   cat $x.trans1 | ./normalize_transcript.pl $noiseword > $x.trans2 || exit 1
+done
+
+if [ ! -f ../data/lexicon.txt ]; then
+   echo  "You need to get ../data/lexicon.txt first (see ../run.sh)"
+   exit 1
+fi
+# Convert OOVs to <SPOKEN_NOISE>
+spoken_noise_word="<SPOKEN_NOISE>";
+for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
+   cat $x.trans2 | ./oov2unk.pl ../data/lexicon.txt $spoken_noise_word | sort  > $x.txt  || exit 1 # the .txt is the final transcript.
+done
+ 
+# Create scp's with wav's. (the wv1 in the distribution is not really wav, it is sph.)
+sph2pipe=`cd ../../../..; echo $PWD/tools/sph2pipe_v2.5/sph2pipe`
+if [ ! -f $sph2pipe ]; then
+   echo "Could not find the sph2pipe program at $sph2pipe";
+   exit 1;
+fi
+for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
+  awk '{printf("%s '$sph2pipe' -f wav %s |\n", $1, $2);}' < ${x}_sph.scp > ${x}_wav.scp
+done
+
+
+# The 20K vocab, open-vocabulary language model (i.e. the one with UNK), without
+# verbalized pronunciations.   This is the most common test setup, I understand.
+
+cp links/13-32.1/wsj1/doc/lng_modl/base_lm/bcb20onp.z  lm_bg.arpa.gz
+chmod u+w lm_bg.arpa.gz
+# trigram would be:
+
+cat links/13-32.1/wsj1/doc/lng_modl/base_lm/tcb20onp.z | \
+ perl -e 'while(<>){ if(m/^\\data\\/){ print; last;  } } while(<>){ print; }' | \
+ gzip -c -f > lm_tg.arpa.gz
+
+export PATH=$PATH:../../../../tools/irstlm/bin
+prune-lm --threshold=1e-7 lm_tg.arpa.gz lm_tg_pruned.arpa
+gzip -f lm_tg_pruned.arpa
+
+# Make the utt2spk and spk2utt files.
+for x in train_si84 train_si284 eval_nov92 eval_nov93 dev_nov93; do
+   cat ${x}_sph.scp | awk '{print $1}' | perl -ane 'chop; m:^...:; print "$_ $&\n";' > $x.utt2spk
+   cat $x.utt2spk | ../scripts/utt2spk_to_spk2utt.pl > $x.spk2utt
+done
+
+
+if [ ! -f wsj0-train-spkrinfo.txt ]; then
+  wget http://www.ldc.upenn.edu/Catalog/docs/LDC93S6A/wsj0-train-spkrinfo.txt
+fi
+
+if [ ! -f wsj0-train-spkrinfo.txt ]; then
+  echo "Could not get the spkrinfo.txt file from LDC website (moved)?"
+  echo "This is possibly omitted from the training disks; couldn't find it." 
+  echo "Everything else may have worked; we just may be missing gender info"
+  echo "which is only needed for VTLN-related diagnostics anyway."
+  exit 1
+fi
+# Note: wsj0-train-spkrinfo.txt doesn't seem to be on the disks but the
+# LDC put it on the web.  Perhaps it was accidentally omitted from the
+# disks.  I put it in the repository.
+
+cat links/11-13.1/wsj0/doc/spkrinfo.txt \
+    links/13-34.1/wsj1/doc/train/spkrinfo.txt \
+   ./wsj0-train-spkrinfo.txt  | \
+   perl -ane 'tr/A-Z/a-z/;print;' | grep -v ';' | \
+   awk '{print $1, $2}' > spk2gender.map
+
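The utt2spk and spk2utt step in data_prep/run.sh relies on the speaker id being the first three characters of the utterance id (the `m:^...:` match). A minimal shell sketch of both directions follows. The function names are hypothetical, and `utt2spk_to_spk2utt_sketch` is only a stand-in for ../scripts/utt2spk_to_spk2utt.pl, whose source is not shown in this section; it assumes sorted input so that each speaker's utterances are contiguous.

```shell
#!/bin/sh
# stdin: lines starting with an utterance id; stdout: "utt-id spk-id".
# Assumption: the speaker id is the first three characters of the utterance id.
make_utt2spk() {
  awk '{ print $1, substr($1, 1, 3) }'
}

# stdin: "utt-id spk-id" lines, grouped by speaker; stdout: "spk-id utt1 utt2 ...".
utt2spk_to_spk2utt_sketch() {
  awk '
    $2 != spk { if (NR > 1) print line; spk = $2; line = $2 }  # a new speaker starts
    { line = line " " $1 }                                     # append this utterance
    END { if (NR > 0) print line }                             # flush the last speaker
  '
}
```

For example, `awk '{print $1}' train_si84_sph.scp | make_utt2spk | utt2spk_to_spk2utt_sketch` would produce one line per speaker.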
--- a/egs/wsj/s1/path.sh
+++ b/egs/wsj/s1/path.sh
@ -0,0 +1,2 @@
+export PATH=$PATH:../../../src/bin:../../../tools/openfst/bin:../../../src/fstbin/:../../../src/gmmbin/:../../../src/featbin/:../../../src/lm/
+export LC_ALL=C
--- a/egs/wsj/s1/run.sh
+++ b/egs/wsj/s1/run.sh
@ -0,0 +1,304 @@
+#!/bin/bash
+
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+exit 1;
+# This is a shell script, but it's recommended that you run the commands one by
+# one by copying and pasting into the shell.
+# Caution: some of the graph creation steps use quite a bit of memory, so you
+# might want to run this script on a machine that has plenty of memory.
+
+# (1) To get the CMU dictionary, do:
+svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/
+# Got this at revision 10742 in my current test; add -r 10742 for strict
+# compatibility.
+
+#(2) Dictionary preparation:
+
+mkdir -p data
+
+# Make phones symbol-table (adding in silence and verbal and non-verbal noises at this point).
+# We are adding suffixes _B, _E, _S for beginning, ending, and singleton phones.
+
+cat cmudict/cmudict.0.7a.symbols | perl -ane 's:\r::; print;' | \
+ awk 'BEGIN{print "<eps> 0"; print "SIL 1"; print "SPN 2"; print "NSN 3"; N=4; } 
+           {printf("%s %d\n", $1, N++); }
+           {printf("%s_B %d\n", $1, N++); }
+           {printf("%s_E %d\n", $1, N++); }
+           {printf("%s_S %d\n", $1, N++); } ' >data/phones.txt
+
+
+# First make a version of the lexicon without the silences etc, but with the position-markers.
+# Remove the comments from the cmu lexicon and remove the (1), (2) from words with multiple 
+# pronunciations.
+
+grep -v ';;;' cmudict/cmudict.0.7a | perl -ane 'if(!m:^;;;:){ s:(\S+)\(\d+\) :$1 :; print; }' \
+ | perl -ane '@A=split(" ",$_); $w = shift @A; @A>0||die;
+   if(@A==1) { print "$w $A[0]_S\n"; } else { print "$w $A[0]_B ";
+     for($n=1;$n<@A-1;$n++) { print "$A[$n] "; } print "$A[$n]_E\n"; } ' \
+  > data/lexicon_nosil.txt
+
+# Add to cmudict the silences, noises etc.
+
+(echo '!SIL SIL'; echo '<s> '; echo '</s> '; echo '<SPOKEN_NOISE> SPN'; echo '<UNK> SPN'; echo '<NOISE> NSN'; ) | \
+ cat - data/lexicon_nosil.txt  > data/lexicon.txt
+
+
+silphones="SIL SPN NSN";
+# Generate colon-separated lists of silence and non-silence phones.
+scripts/silphones.pl data/phones.txt "$silphones" data/silphones.csl data/nonsilphones.csl
+
+# This adds disambig symbols to the lexicon and produces data/lexicon_disambig.txt
+
+ndisambig=`scripts/add_lex_disambig.pl data/lexicon.txt data/lexicon_disambig.txt`
+echo $ndisambig > data/lex_ndisambig
+# Next, create a phones.txt file that includes the disambig symbols.
+# The --include-zero option adds the #0 symbol that we pass through from the grammar.
+scripts/add_disambig.pl --include-zero data/phones.txt $ndisambig > data/phones_disambig.txt
+
+# Make the words symbol-table; add the disambiguation symbol #0 (we use this in place of epsilon
+# in the grammar FST).
+cat data/lexicon.txt | awk '{print $1}' | sort | uniq  | \
+ awk 'BEGIN{print "<eps> 0";} {printf("%s %d\n", $1, NR);} END{printf("#0 %d\n", NR+1);} ' \
+  > data/words.txt
+
+
+#(3)
+# data preparation (this step requires the WSJ disks, from LDC).
+# It takes as arguments a list of the directories ending in
+# e.g. 11-13.1 (we don't assume a single root dir because
+# there are different ways of unpacking them).
+
+cd data_prep
+
+#TODO: remove following system-specific comments.
+#On BUT system, do:
+./run.sh /mnt/matylda2/data/WSJ?/??-{?,??}.?
+
+# On Geoff Hinton's system we can do:
+#  ./run.sh  /ais/gobi2/speech/WSJ/*/??-{?,??}.?
+
+
+cd ..
+
+
+
+# Here is where we select what data to train on.
+# use all the si284 data.
+cp data_prep/train_si284_wav.scp data/train_wav.scp
+cp data_prep/train_si284.txt data/train.txt
+cp data_prep/train_si284.spk2utt data/train.spk2utt 
+cp data_prep/train_si284.utt2spk data/train.utt2spk
+cp data_prep/spk2gender.map data/
+
+for x in eval_nov92 dev_nov93 eval_nov93; do 
+  cp data_prep/$x.spk2utt data/$x.spk2utt
+  cp data_prep/$x.utt2spk data/$x.utt2spk
+  cp data_prep/$x.txt data/$x.txt
+done
+
+for x in train eval_nov92 dev_nov93 eval_nov93; do  
+ cat data/$x.txt | scripts/sym2int.pl --ignore-first-field data/words.txt  > data/$x.tra
+done
+
+
+# Get the right paths on our system by sourcing the following shell file
+# (edit it if it's not right for your setup). 
+. path.sh
+
+# Create the basic L.fst without disambiguation symbols, for use
+# in training. 
+scripts/make_lexicon_fst.pl data/lexicon.txt 0.5 SIL | \
+  fstcompile --isymbols=data/phones.txt --osymbols=data/words.txt \
+  --keep_isymbols=false --keep_osymbols=false | \
+   fstarcsort --sort_type=olabel > data/L.fst
+
+# Create the lexicon FST with disambiguation symbols.  There is an extra
+# step where we create a loop to "pass through" the disambiguation symbols
+# from G.fst.  
+
+phone_disambig_symbol=`grep \#0 data/phones_disambig.txt | awk '{print $2}'`
+word_disambig_symbol=`grep \#0 data/words.txt | awk '{print $2}'`
+
+scripts/make_lexicon_fst.pl data/lexicon_disambig.txt 0.5 SIL  | \
+   fstcompile --isymbols=data/phones_disambig.txt --osymbols=data/words.txt \
+   --keep_isymbols=false --keep_osymbols=false |   \
+   fstaddselfloops  "echo $phone_disambig_symbol |" "echo $word_disambig_symbol |" | \
+   fstarcsort --sort_type=olabel > data/L_disambig.fst
+
+
+# Making the grammar FSTs 
+# This step is quite specific to this WSJ setup.
+# see data_prep/run.sh for more about where these LMs came from.
+
+steps/make_lm_fsts.sh
+
+## Sanity check; just making sure the next command does not crash. 
+fstdeterminizestar data/G_bg.fst >/dev/null  
+
+## Sanity check; just making sure the next command does not crash. 
+fsttablecompose data/L_disambig.fst data/G_bg.fst | fstdeterminizestar >/dev/null
+
+
+# At this point, make sure that "./exp/" is somewhere you can write
+# a reasonably large amount of data (i.e. on a fast and large 
+# disk somewhere).  It can be a soft link if necessary.
+
+
+# (4) feature generation
+
+
+# Make the training features.
+# note that this runs 3-4 times faster if you compile with DEBUGLEVEL=0
+# (this turns on optimization).
+
+# Set "dir" to someplace you can write to.
+dir=/mnt/matylda6/jhu09/qpovey/kaldi_wsj2_mfcc_e
+steps/make_mfcc_train.sh $dir
+steps/make_mfcc_test.sh $dir
+
+
+# (5) running the training and testing steps.
+
+steps/train_mono.sh || exit 1;
+
+(scripts/mkgraph.sh --mono data/G_tg_pruned.fst exp/mono/tree exp/mono/final.mdl exp/graph_mono_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_mono_tgpr_eval92 exp/graph_mono_tg_pruned/HCLG.fst steps/decode_mono.sh data/eval_nov92.scp ) &
+
+steps/train_tri1.sh || exit 1;
+
+# add --no-queue --num-jobs 4 after "scripts/decode.sh" below, if you don't have
+# qsub on your system.  The number of jobs to use depends on how many CPUs and
+# how much memory you have, on the local machine.  If you do have qsub on your
+# system, you will probably have to edit scripts/decode.sh anyway to change the
+# queue options... or if you have a different queueing system, you'd have to
+# modify the script to use that.
+ 
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri1/tree exp/tri1/final.mdl exp/graph_tri1_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri1_tgpr_eval92 exp/graph_tri1_tg_pruned/HCLG.fst steps/decode_tri1.sh data/eval_nov92.scp ) &
+
+steps/train_tri2a.sh || exit 1;
+
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2a/tree exp/tri2a/final.mdl exp/graph_tri2a_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2a_tgpr_eval92 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a.sh data/eval_nov92.scp 
+ scripts/decode.sh exp/decode_tri2a_tgpr_eval93 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a.sh data/eval_nov93.scp 
+
+ scripts/decode.sh exp/decode_tri2a_tgpr_fmllr_utt_eval92 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a_fmllr.sh data/eval_nov92.scp 
+ scripts/decode.sh --per-spk exp/decode_tri2a_tgpr_fmllr_eval92 exp/graph_tri2a_tg_pruned/HCLG.fst steps/decode_tri2a_fmllr.sh data/eval_nov92.scp 
+
+
+) &
+
+
+steps/train_tri3a.sh || exit 1;
+
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri3a/tree exp/tri3a/final.mdl exp/graph_tri3a_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri3a_tgpr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a.sh data/eval_nov92.scp 
+# per-speaker fMLLR
+scripts/decode.sh --per-spk exp/decode_tri3a_tgpr_fmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_fmllr.sh data/eval_nov92.scp
+# per-utterance fMLLR
+scripts/decode.sh exp/decode_tri3a_tgpr_uttfmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_fmllr.sh data/eval_nov92.scp 
+# per-speaker diagonal fMLLR
+scripts/decode.sh --per-spk exp/decode_tri3a_tgpr_dfmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_diag_fmllr.sh data/eval_nov92.scp 
+# per-utterance diagonal fMLLR
+scripts/decode.sh exp/decode_tri3a_tgpr_uttdfmllr_eval92 exp/graph_tri3a_tg_pruned/HCLG.fst steps/decode_tri3a_diag_fmllr.sh data/eval_nov92.scp 
+)&
+
+# will delete:
+## scripts/decode_queue_fmllr.sh exp/graph_tri3a_tg_pruned exp/tri3a/final.mdl exp/decode_tri3a_tg_pruned_fmllr &
+
+#### Now alternative experiments... ###
+
+# ET
+steps/train_tri2b.sh
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2b/tree exp/tri2b/final.mdl exp/graph_tri2b_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2b_tgpr_utt_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b.sh data/eval_nov92.scp 
+ scripts/decode.sh --per-spk exp/decode_tri2b_tgpr_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b.sh data/eval_nov92.scp 
+ scripts/decode.sh exp/decode_tri2b_tgpr_utt_fmllr_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b_fmllr.sh data/eval_nov92.scp 
+ scripts/decode.sh --per-spk exp/decode_tri2b_tgpr_fmllr_eval92 exp/graph_tri2b_tg_pruned/HCLG.fst steps/decode_tri2b_fmllr.sh data/eval_nov92.scp 
+) &
+
+# MLLT/STC
+steps/train_tri2d.sh
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2d/tree exp/tri2d/final.mdl exp/graph_tri2d_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2d_tgpr_eval92 exp/graph_tri2d_tg_pruned/HCLG.fst steps/decode_tri2d.sh data/eval_nov92.scp  )&
+
+# Splice+LDA
+steps/train_tri2e.sh
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2e/tree exp/tri2e/final.mdl exp/graph_tri2e_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2e_tgpr_eval92 exp/graph_tri2e_tg_pruned/HCLG.fst steps/decode_tri2e.sh data/eval_nov92.scp  )&
+
+# Splice+LDA+MLLT
+steps/train_tri2f.sh
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2f/tree exp/tri2f/final.mdl exp/graph_tri2f_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2f_tgpr_eval92 exp/graph_tri2f_tg_pruned/HCLG.fst steps/decode_tri2f.sh data/eval_nov92.scp  )&
+
+# Linear VTLN (+ regular VTLN)
+steps/train_tri2g.sh
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2g/tree exp/tri2g/final.mdl exp/graph_tri2g_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2g_tgpr_utt_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g.sh data/eval_nov92.scp  
+ scripts/decode.sh exp/decode_tri2g_tgpr_utt_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_diag.sh data/eval_nov92.scp  
+ scripts/decode.sh --wav exp/decode_tri2g_tgpr_utt_vtln_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_vtln_diag.sh data/eval_nov92.scp  
+
+ scripts/decode.sh --per-spk exp/decode_tri2g_tgpr_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g.sh data/eval_nov92.scp  
+ scripts/decode.sh --per-spk exp/decode_tri2g_tgpr_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_diag.sh data/eval_nov92.scp  
+ scripts/decode.sh --wav --per-spk exp/decode_tri2g_tgpr_vtln_diag_eval92 exp/graph_tri2g_tg_pruned/HCLG.fst steps/decode_tri2g_vtln_diag.sh data/eval_nov92.scp  
+
+)&
+
+# Splice+HLDA
+steps/train_tri2h.sh
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2h/tree exp/tri2h/final.mdl exp/graph_tri2h_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2h_tgpr_eval92 exp/graph_tri2h_tg_pruned/HCLG.fst steps/decode_tri2h.sh data/eval_nov92.scp  )&
+
+# Triple-deltas + HLDA
+steps/train_tri2i.sh
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2i/tree exp/tri2i/final.mdl exp/graph_tri2i_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2i_tgpr_eval92 exp/graph_tri2i_tg_pruned/HCLG.fst steps/decode_tri2i.sh data/eval_nov92.scp  )&
+
+# Splice + HLDA
+steps/train_tri2j.sh
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2j/tree exp/tri2j/final.mdl exp/graph_tri2j_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2j_tgpr_eval92 exp/graph_tri2j_tg_pruned/HCLG.fst steps/decode_tri2j.sh data/eval_nov92.scp  )&
+
+
+# LDA+ET
+steps/train_tri2k.sh
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2k/tree exp/tri2k/final.mdl exp/graph_tri2k_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2k_tgpr_utt_eval92 exp/graph_tri2k_tg_pruned/HCLG.fst steps/decode_tri2k.sh data/eval_nov92.scp 
+ scripts/decode.sh --per-spk exp/decode_tri2k_tgpr_eval92 exp/graph_tri2k_tg_pruned/HCLG.fst steps/decode_tri2k.sh data/eval_nov92.scp 
+ )&
+
+# LDA+MLLT+SAT
+steps/train_tri2l.sh
+(scripts/mkgraph.sh data/G_tg_pruned.fst exp/tri2l/tree exp/tri2l/final.mdl exp/graph_tri2l_tg_pruned || exit 1;
+ scripts/decode.sh exp/decode_tri2l_tgpr_utt_eval92 exp/graph_tri2l_tg_pruned/HCLG.fst steps/decode_tri2l.sh data/eval_nov92.scp 
+ scripts/decode.sh --per-spk exp/decode_tri2l_tgpr_eval92 exp/graph_tri2l_tg_pruned/HCLG.fst steps/decode_tri2l.sh data/eval_nov92.scp 
+ )&
+
+
+
+
+
+# Note on WERs at different stages of decoding:
+#exp/decode_mono_tg_pruned/wer:%WER 31.82 [ 1795 / 5641, 109 ins, 412 del, 1274 sub ]
+#exp/decode_tri1_tg_pruned/wer:%WER 13.61 [ 768 / 5641, 134 ins, 76 del, 558 sub ]
+#exp/decode_tri2a_tg_pruned/wer:%WER 12.94 [ 730 / 5641, 131 ins, 62 del, 537 sub ]
+#exp/decode_tri3a_tg_pruned/wer:%WER 10.88 [ 614 / 5641, 126 ins, 47 del, 441 sub ]
+
+
+# For an example of scoring with sclite, run e.g.:
+#  scripts/score_sclite.sh exp/decode_tri2a_tg_pruned 
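The WER figures quoted at the end of run.sh follow from the bracketed counts: the error total is insertions plus deletions plus substitutions, divided by the number of reference words. A small shell sketch of that arithmetic (the function name `wer` is hypothetical and is not a script in this commit):

```shell
#!/bin/sh
# wer INS DEL SUB NREF -> prints the word error rate as a percentage, two decimals.
wer() {
  awk -v i="$1" -v d="$2" -v s="$3" -v n="$4" \
    'BEGIN { printf("%.2f\n", 100 * (i + d + s) / n) }'
}

wer 109 412 1274 5641   # mono:  (109 + 412 + 1274) / 5641 -> prints 31.82
wer 126 47 441 5641     # tri3a: (126 + 47 + 441) / 5641   -> prints 10.88
```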
--- a/egs/wsj/s1/scripts/add_disambig.pl
+++ b/egs/wsj/s1/scripts/add_disambig.pl
@ -0,0 +1,58 @@
+#!/usr/bin/perl
+# Copyright 2010-2011 Microsoft Corporation
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
+# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
+# MERCHANTABLITY OR NON-INFRINGEMENT.
+# See the Apache 2 License for the specific language governing permissions and
+# limitations under the License.
+
+
+# Adds some specified number of disambig symbols to a symbol table.
+# Adds these as #1, #2, etc.
+# If the --include-zero option is specified, includes an extra one
+# #0.
+
+$include_zero = 0;
+if($ARGV[0] eq "--include-zero") {
+    $include_zero = 1;
+    shift @ARGV;
+}
+
+if(@ARGV != 2) {
+    die "Usage: add_disambig.pl [--include-zero] symtab.txt num_extra > symtab_out.txt ";
+}
+
+
+$input = $ARGV[0];
+$nsyms = $ARGV[1];
+
+open(F, "<$input") || die "Opening file $input";
+
+while(<F>) {
+    @A = split(" ", $_);
+    @A == 2 || die "Bad line $_";
+    $lastsym = $A[1];
+    print;
+}
+
+if(!defined($lastsym)){
+ die "Empty symbol file?";
+}
+
+if($include_zero) {
+    $lastsym++;
+    print "#0  $lastsym\n";
+}
+
+for($n = 1; $n <= $nsyms; $n++) {
+    $y = $n + $lastsym;
+    print "#$n  $y\n";
+}
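The numbering scheme in add_disambig.pl (continue from the last id in the table, optionally reserving the next id for #0) can be sketched in shell with awk. This is an illustration under the same assumption the Perl script makes, namely that the last line of the symbol table carries the highest id; the function name `add_disambig_sketch` is hypothetical.

```shell
#!/bin/sh
# add_disambig_sketch [--include-zero] symtab.txt num_extra
# Copies the symbol table to stdout, then appends #1..#num_extra
# (and #0 first, if requested) with ids that continue after the last one.
add_disambig_sketch() {
  zero=0
  if [ "$1" = "--include-zero" ]; then zero=1; shift; fi
  awk -v n="$2" -v zero="$zero" '
    { print; last = $2 }                  # remember the id on the last line
    END {
      if (NR == 0) { print "Empty symbol file?" > "/dev/stderr"; exit 1 }
      if (zero) { last++; printf("#0  %d\n", last) }
      for (k = 1; k <= n; k++) printf("#%d  %d\n", k, last + k)
    }
  ' "$1"
}
```

With a two-line table ending in `SIL 1`, `add_disambig_sketch --include-zero table 2` appends #0, #1 and #2 with ids 2, 3 and 4.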
(file header not recovered)
@ -0,0 +1 @@
+--use-energy=false # only non-default option.
(file header not recovered)
@ -0,0 +1 @@
+export PATH=$PATH:../../../src/bin:../../../tools/openfst/bin:../../../src/fstbin/:../../../src/gmmbin/:../../../src/featbin/:../../../src/fgmmbin:../../../src/sgmmbin