About the GALE Phase 2 Arabic Broadcast Conversation:
LDC2013S02: http://catalog.ldc.upenn.edu/LDC2013S02
LDC2013S07: http://catalog.ldc.upenn.edu/LDC2013S07
LDC2013T17: http://catalog.ldc.upenn.edu/LDC2013T17
LDC2013T04: http://catalog.ldc.upenn.edu/LDC2013T04

GALE Phase 2 Arabic Broadcast Conversation Speech was developed by the Linguistic Data Consortium (LDC) and comprises approximately 200 hours of Arabic broadcast conversation speech collected in 2006 and 2007 by LDC as part of the DARPA GALE (Global Autonomous Language Exploitation) Program.

The data contains two types of speech: conversational and report. This script trains and tests on both types and reports results for each. The training data totals 320 hours and the test data 9.3 hours.

The dictionary and scripts can be obtained from the QCRI portal: http://alt.qcri.org/

s5: The experiments here are based on the corpus described above.