Mirror of https://github.com/mozilla/kaldi.git
About the Wall Street Journal corpus:
  This is a corpus of sentences from the Wall Street Journal, read aloud and
  recorded under clean conditions. The vocabulary is quite large, and there
  are about 80 hours of training data. The corpus is available from the LDC
  under either of these sets of catalog numbers:
    [ LDC93S6A (WSJ0) and LDC94S13A (WSJ1) ]
  or:
    [ LDC93S6B (WSJ0) and LDC94S13B (WSJ1) ]
  The latter option is cheaper and includes only the Sennheiser microphone
  data (which is all we use in the example scripts).

Each subdirectory of this directory contains the scripts for a sequence of
experiments.
  [note: most of the older example scripts have been deleted, but are still
  available at ^/branches/complete]

  s5: This is the current recommended recipe.