kaldi/egs/swbd

README.txt

About the Switchboard corpus

    This is conversational telephone speech, collected as 2-channel data sampled
    at 8kHz.  We use only the Switchboard-1 Phase 1 training data.  We believe
    that what we have corresponds to catalog number LDC97S62 (Switchboard-1
    Release 2).  We also use the Mississippi State transcriptions, which we
    download separately from
    http://www.isip.piconepress.com/projects/switchboard/releases/switchboard_word_alignments.tar.gz

    For evaluation we use the eval2000 (a.k.a. hub5'00) data.  The acoustics are
    LDC2002S09 and the text is LDC2002T43.

About the Fisher corpus for language modeling

    We use the Fisher English training speech transcripts for language modeling,
    if they are available.  The catalog numbers are LDC2004T19 for the part 1
    transcripts and LDC2005T19 for the part 2 transcripts.

Each subdirectory of this directory contains the
scripts for a sequence of experiments.

  s5: This is slightly out of date; please see s5c.

  s5b: This is also out of date (though somewhat less so); please see s5c.

  s5c: This is the current recipe.