kaldi

Dan Povey 45e4d2f0a0 Minor fixes to SGMM training. git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@1362 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8		2012-09-17 18:21:40 +00:00
gp	git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@1347 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8	2012-09-13 13:23:32 +00:00
rm	Changes to READMEs	2012-09-16 17:34:52 +00:00
swbd	Minor fixes to SGMM training.	2012-09-17 18:21:40 +00:00
tidigits	Merging various small script changes from sandbox	2012-09-09 19:29:17 +00:00
timit	sync with trunk	2012-04-17 11:18:27 +00:00
voxforge	Cosmetic changes in egs/voxforge/online_demo	2012-08-25 08:25:49 +00:00
wsj	Changes to READMEs	2012-09-16 17:34:52 +00:00
yesno/s3	Changed the code to use the "pruned" lattice-determinization-- avoid the blowup that sometimes happens.	2012-05-17 22:21:31 +00:00
README.txt	An entry for egs/voxforge added in egs/README.txt (under WIP recipes)	2012-07-11 08:24:26 +00:00

README.txt

This directory contains example scripts that demonstrate how to
use Kaldi. Each subdirectory corresponds to a corpus that we have
example scripts for. Most of these corpora are available from
the Linguistic Data Consortium (LDC); the yesno and voxforge data
can be obtained elsewhere, as described below.

Explanations of the corpora are below.
Note: the easiest examples to work with are rm/s3 and wsj/s3.

wsj: The Wall Street Journal corpus. This is a corpus of read
sentences from the Wall Street Journal, recorded under clean conditions.
The vocabulary is quite large.
Available from the LDC as either: [ catalog numbers LDC93S6A (WSJ0) and LDC94S13A (WSJ1) ]
or: [ catalog numbers LDC93S6B (WSJ0) and LDC94S13B (WSJ1) ]
The latter option is cheaper and includes only the Sennheiser
microphone data (which is all we use in the example scripts).

rm: Resource Management. Clean speech in a medium-vocabulary task consisting
of commands to a (presumably imaginary) computer system.
Available from the LDC as catalog number LDC93S3A (it may be possible to
get the same data using combinations of other catalog numbers, but this
is the one we used).

tidigits: The TI Digits database, available from the LDC (catalog number LDC93S10).
This is one of the oldest speech databases; it consists of a bunch of speakers
saying digit strings. It's not considered a "real" task any more, but can be useful
for demos, tutorials, and the like.

yesno: A simple recipe using recordings of a single person saying the
words "yes" and "no"; the data can be downloaded from the Kaldi website.
It's a very easy task, but useful for checking that the scripts run, or for
getting started if you don't yet have any of the LDC data.

Recipes in progress (these may be less polished than the ones above):

swbd: Switchboard. A fairly large amount of telephone speech (2-channel, 8kHz
sampling rate).
This directory is a work in progress.

gp: GlobalPhone. This is a multilingual speech corpus.

timit: TIMIT, an old corpus of carefully read speech.
Available from the LDC as catalog number LDC93S1.

voxforge: A recipe for the free speech data available from voxforge.org