Mirror of https://github.com/mozilla/kaldi.git
README.txt
About the Switchboard corpus

    This is conversational telephone speech collected as 2-channel, 8kHz-sampled
    data.  We are using just the Switchboard-1 Phase 1 training data.
    The catalog number LDC97S62 (Switchboard-1 Release 2) corresponds, we believe,
    to what we have.  We also use the Mississippi State transcriptions, which
    we download separately from
    http://www.isip.piconepress.com/projects/switchboard/releases/switchboard_word_alignments.tar.gz

    We are using the eval2000 a.k.a. hub5'00 evaluation data.  The acoustics are
    LDC2002S09 and the text is LDC2002T43.

About the Fisher corpus for language modeling

    We use Fisher English training speech transcripts for language modeling, if
    they are available.  The catalog number for part 1 transcripts is LDC2004T19,
    and LDC2005T19 for part 2.

Each subdirectory of this directory contains the scripts for a sequence
of experiments.

    s5:  This is slightly out of date, please see s5c

    s5b: This is (somewhat less) out of date, please see s5c

    s5c: This is the current recipe.