kaldi/egs
Dan Povey e7ee0537bb Fixes to SGMM training scripts.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@13 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2011-05-15 18:26:18 +00:00
rm Fixes to SGMM training scripts. 2011-05-15 18:26:18 +00:00
wsj Script changes (esp. RE SGMMs). 2011-05-15 05:18:18 +00:00
README.txt Committing initial version of Kaldi 2011-05-14 21:48:08 +00:00

README.txt

This directory contains example scripts that demonstrate how to
use Kaldi.  Each subdirectory corresponds to a corpus for which we
provide example scripts.  Currently there are two, both available from
the Linguistic Data Consortium (LDC).

Explanations of the corpora are below:

 wsj: The Wall Street Journal corpus.  This is a corpus of read
    sentences from the Wall Street Journal, recorded under clean conditions.
    The vocabulary is quite large. 
    Available from the LDC as either:
      [ catalog numbers LDC93S6A (WSJ0) and LDC94S13A (WSJ1) ]
    or:
      [ catalog numbers LDC93S6B (WSJ0) and LDC94S13B (WSJ1) ]
    The latter option is cheaper and includes only the Sennheiser
    microphone data (which is all we use in the example scripts).

 rm: Resource Management.  Clean speech in a medium-vocabulary task consisting
    of commands to a (presumably imaginary) computer system.
    Available from the LDC as catalog number LDC93S3A (it may be possible to
    get the same data using combinations of other catalog numbers, but this
    is the one we used).