Commit graph

2881 commits

Author SHA1 Message Date
Dan Povey b42b9b733f trunk: BABEL: changing default acoustic scale for decoding bottleneck system (thanks to Pegah Ghahremani); minor fix in train_deltas.sh (thanks to Simon Kluepfel)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3562 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 04:19:14 +00:00
Dan Povey efb56ad7d7 trunk: Minor, mostly cosmetic fixes
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3559 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 03:47:30 +00:00
Vimal Manohar 22285c7d2c trunk/egs/babel: Minor change in segmentation script generate_segments.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3557 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 23:18:52 +00:00
Jan Trmal ebdb816f84 (trunk/babel/s5b) Adding unsup dataset (dataset for unsupervised/semisupervised training)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3556 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 22:35:14 +00:00
Vimal Manohar 6f388b925d trunk: minor fix to scripts train_pnorm.sh and train_pnorn_ensemble.sh to work with non-GMM input model
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3553 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 18:07:31 +00:00
Jan Trmal 9692cf08a5 (trunk/babel/s5b) Fix the bug accidentally introduced when renaming the bnf directories (removing the _app suffix)
Because the param was stored in the plp directory and because of the way the param names
are constructed, this led to overwriting the PLP parametrization of the same
dataset. Now the BNF param is stored in the directory param_bnf (plp_bnf seemed
silly)


git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3552 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 17:51:10 +00:00
Karel Vesely a6688e9daa trunk,nnet: updating the codebase
- removing 'Cache' class, replaced by 'Randomizer'
- removing obsolete training tools using 'Cache' class
- adding 2d convolutional components from Harish
- removing scaled inv-prior-weighted cross-entropy Class, not helpful




git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3551 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 10:55:18 +00:00
Vimal Manohar 3524adaac2 (trunk)
Updating semi-supervised DNN training scripts and associated programs in trunk.


git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3550 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 07:28:08 +00:00
Xiaohui Zhang 7318249685 fixed a typo in run-2a-nnet-ensemble.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3549 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 06:44:10 +00:00
Jan Trmal d1eb28997b (trunk/babel/s4b) Added support for data sources specified using multiple locations and filelists
This tremendously simplifies handling of untranscribed data and the shadow set processing
In the longer term, we will be able to remove the shadow-set specific code paths in decoding.
For an example of how to specify the data source using multiple locations, see the example config
that is part of this commit


git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3546 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 06:07:12 +00:00
Dan Povey 717ad1c436 trunk: Change name of program sum-vecs to vector-sum (was not used, but is now needed; new name is more discoverable). Minor changes in usage messages
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3541 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-19 00:08:45 +00:00
Vimal Manohar d081efc2be (trunk)
Adding scripts and source files for semi-supervised DNN training to trunk 
The scripts affect only the Babel recipe.
Adds additional programs in src/bin and src/nnet2bin to get examples
for semi-supervised training.


git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3540 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 23:58:34 +00:00
Jan Trmal f9c8bdc451 (trunk/babel/s5b) Additional fixes in the new decoding pipeline + improvements for the combination scripts.
The kws_combine.sh now supports supplying individual weights of the systems (before this change, all the systems
were taken with the same weight). The score_combine.sh now properly filters the <hes> tag (which is marked optionally
deletable in the eval documentation).



git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3538 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 21:06:20 +00:00
Jan Trmal 228a61d486 (trunk/babel/s5b) Mostly fixes to the new decoding script. New feature: added OOV search.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3537 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 20:54:57 +00:00
Dan Povey 7ea7b9f2c7 trunk: Script fix in train_quick.sh (thanks to Simon Klupfel); and clarifications about how to turn off CMN in scripts.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3536 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 19:40:09 +00:00
Vimal Manohar 1e2af4f0ef Segmentation scripts
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3535 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 03:40:48 +00:00
Vimal Manohar aa0eeb18bd Reverting segmentation changes
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3534 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 03:31:56 +00:00
Dan Povey af3a04d9df Trunk: reverting revision 3528 (should have been to sandbox)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3530 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 00:02:07 +00:00
Dan Povey 6b9a20d2cf Trunk: reverting revision 3527 (should have been to sandbox)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3529 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 00:00:24 +00:00
Ondrej Platek 94eb5ec4f0 sandbox/oplatek2 LatticeFasterDecoder private->protected
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3528 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 23:59:29 +00:00
Ondrej Platek d028076e4d sandbox/oplatek added simple online C++ interface in dec-wrap
and python wrapper of dec-wrap in pykaldi

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3527 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 23:31:56 +00:00
Guoguo Chen 21d64ecb6c Fixes to proxy keywords generation
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3526 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 23:29:21 +00:00
Jan Trmal 6a24888ae8 (trunk/babel/s5b) First version of the Kaldi-Babel native UEM segmentation
Support for it had already been included in the 4-anydecode script


git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3522 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 19:54:46 +00:00
Jan Trmal 2ecab0e583 (trunk/babel/s5b) First version of the improved decoding pipeline. Does not handle the OOV search yet.
For the time being, it supports the PEM and UEM (from CMU) segmentation. Moreover, it includes the
SEG segmentation (Kaldi-BABEL native), which is currently being finalized.
Another change is the ability to skip extra features (especially related to search). This is
useful as the "quick" baseline can be run routinely and the "extra" only when needed.



git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3521 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 19:36:38 +00:00
Dan Povey 0086a5e1c1 trunk: small commit with change in documentation, to test if committing works right now.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3520 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 19:20:20 +00:00
David Snyder 2a54ad8f4d trunk: for language-id setup adding an implementation of Shifted Delta Cepstra (SDC) features and corresponding unit tests.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3519 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-16 23:07:21 +00:00
Guoguo Chen ed961d3238 Various improvements to proxy keywords generation, both speech-wise and performance-wise
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3518 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-16 20:45:27 +00:00
Xiaohui Zhang bb0109e3cb changed learning rates and final beta value for the ensemble training recipe.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3516 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-15 05:34:25 +00:00
Guoguo Chen 58ee5aa974 Move lexicon extension configs to common config file... this is cleaner
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3515 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-15 04:36:35 +00:00
Guoguo Chen 86d6e440cf Top-level script for extending the training lexicon
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3514 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-15 04:15:15 +00:00
Dan Povey 9f1d876a6c trunk: compilation fix
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3513 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-15 04:10:59 +00:00
Jan Trmal 2c1f966453 (trunk/babel/s5b) Adding phoneme merging to the Zulu LimitedLP lang config.
The merge improves WER by about 0.5 % abs and the ATWV by about 0.3 % abs


git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3512 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-14 19:48:47 +00:00
Guoguo Chen a28834835a Fixes to extend_lexicon.sh: 1. added OOV rate computation as diagnostic information, if a text file is provided; 2. extended phone mapping repository to a..zA..Z0..9; 3. changed num_sent_gen as the total number of sentences to generate; 4. added encoding when processing words
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3511 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-13 17:53:11 +00:00
Jan Trmal 54ec319890 (trunk/babel/s5b) Adding documentation to the confusion matrix generation script
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3510 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-13 06:29:24 +00:00
Jan Trmal d7fd8b6d48 (babel/trunk/s5b) Improvement to the combination. Now it better sweeps the power interval (0,1>
so we have the same resolution during the whole interval (which wasn't the case when using inv-power,
where the resolution was reciprocal)


git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3509 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-13 01:42:53 +00:00
Jan Trmal 0fe906732e (trunk/babel/s5b) Adding a script for confusion matrix generation (for use, e.g. with the proxy kw search)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3508 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 23:59:32 +00:00
Jan Trmal 810503debb (trunk/babel/s5b) Bugfixes and small improvements to the G2P scripts.
The interface is regarded as stable for now.


git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3507 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 23:51:39 +00:00
Jan Trmal c9231dc606 In the postprocessing, remove the "<hes>" tag, as per the BABEL eval documentation,
this flag is "optionably deletable", i.e. the penalty will be incurred only for
a stray insertion, not  a stray deletion. This gives approximately 0.2% WER 
improvement on Zulu


git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3506 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 23:41:14 +00:00
Nickolay V. Shmyrev 6e5003b92f trunk: endianness in the sphinx feature dumper; feature files are now correct
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3505 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 23:24:16 +00:00
Guoguo Chen da61f54d76 Changes to lower level scripts for proxy keywords: 1. decomposed generate_proxy_keywords.sh into a language dependent part and language independent part; 2. improved the ATWV performance
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3504 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 21:47:50 +00:00
Karel Vesely 7bdcba29eb trunk,nnet: minor update,
* analyze-counts refactored, accepts symbol table to print counts
* nnet-loss.cc - added sanity check to dim in input posteriors



git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3503 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 16:38:51 +00:00
Jan Trmal 01513ca780 (trunk/babel/s5b) Ignore the return code of the cleanup code. Otherwise 'set -e' will cause the script to fail after rm returns a non-zero status code
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3502 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 02:20:22 +00:00
Jan Trmal 5920437594 (trunk/babel/s5b) Reverting the previous commit as it was including other changes not related to the subject of the commit
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3501 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 01:43:10 +00:00
Jan Trmal ab2efdde6f (trunk/babel/s5b) Reverting the previous commit as it was including other changes not related to the subject of the commit
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3500 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 01:35:21 +00:00
Jan Trmal e78cbddfcf (trunk/babel/s5b) For the UEM segmentation, allow for use of alternative extensions of the datadir.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3499 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 01:19:00 +00:00
Jan Trmal a7a6faeb3f (trunk/doc) Changing documentation about getting Kaldi. SF recently seems to be disallowing svn URI access, so we suggest the https one
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3497 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-11 02:54:37 +00:00
Dan Povey 1becbbe04a trunk: Changing openfst failover in tools/Makefile to point to my website (since it was currently failing over to a different alias of the same site)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3496 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-11 02:15:03 +00:00
Jan Trmal 401532c389 (trunk/babel/s5) Adding lexicon filtering. Affects FullLP case only.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3494 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-10 18:50:12 +00:00
Jan Trmal 247be4f136 (trunk/babel/s5) Replace the --zero-if-disjoint by --drop-frames parameter
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3493 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-10 18:43:34 +00:00
Karel Vesely 2344573977 trunk: HTK conversion tool, adding support for 3-column arcs, representing an arc with implicit "unit weight" equivalent to "0,0,".
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3492 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-10 13:17:06 +00:00