Dan Povey
b42b9b733f
trunk: BABEL: changing default acoustic scale for decoding bottleneck system (thanks to Pegah Ghahremani); minor fix in train_deltas.sh (thanks to Simon Kluepfel)
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3562 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 04:19:14 +00:00
Dan Povey
efb56ad7d7
trunk: Minor, mostly cosmetic fixes
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3559 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 03:47:30 +00:00
Vimal Manohar
22285c7d2c
trunk/egs/babel: Minor change in segmentation script generate_segments.sh
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3557 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 23:18:52 +00:00
Jan Trmal
ebdb816f84
(trunk/babel/s5b) Adding unsup dataset (dataset for unsupervised/semisupervised training)
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3556 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 22:35:14 +00:00
Vimal Manohar
6f388b925d
trunk: minor fix to scripts train_pnorm.sh and train_pnorm_ensemble.sh to work with non-GMM input model
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3553 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 18:07:31 +00:00
Jan Trmal
9692cf08a5
(trunk/babel/s5b) Fix the bug accidentally introduced when renaming the bnf directories (removing the _app suffix)
...
Because the param was stored in the plp directory and the way the param names
are constructed, this led to overwriting the PLP parametrization of the same
dataset. Now the BNF param is stored in the directory param_bnf (plp_bnf seemed
silly)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3552 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 17:51:10 +00:00
Karel Vesely
a6688e9daa
trunk,nnet: updating the codebase
...
- removing 'Cache' class, replaced by 'Randomizer'
- removing obsolete training tools using 'Cache' class
- adding 2d convolutional components from Harish
- removing scaled inv-prior-weighted cross-entropy Class, not helpful
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3551 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 10:55:18 +00:00
Vimal Manohar
3524adaac2
(trunk)
...
Updating semi-supervised DNN training scripts and associated programs in trunk.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3550 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 07:28:08 +00:00
Xiaohui Zhang
7318249685
fixed a typo in run-2a-nnet-ensemble.sh
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3549 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 06:44:10 +00:00
Jan Trmal
d1eb28997b
(trunk/babel/s4b) Added support for data sources specified using multiple locations and filelists
...
This tremendously simplifies handling of untranscribed data and the shadow set processing.
In the longer term, we will be able to remove the shadow-set specific code paths in decoding.
For an example of how to specify a data source using multiple locations, see the example config
that is part of this commit.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3546 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 06:07:12 +00:00
Dan Povey
717ad1c436
trunk: Change name of program sum-vecs to vector-sum (was not used, but is now needed; new name is more discoverable). Minor changes in usage messages
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3541 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-19 00:08:45 +00:00
Vimal Manohar
d081efc2be
(trunk)
...
Adding scripts and source files for semi-supervised DNN training to trunk
The scripts affect only the Babel recipe.
Adds additional programs in src/bin and src/nnet2bin to get examples
for semi-supervised training.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3540 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 23:58:34 +00:00
Jan Trmal
f9c8bdc451
(trunk/babel/s5b) Additional fixes in the new decoding pipeline +improvements for the combination scripts.
...
The kws_combine.sh now supports supplying individual weights for the systems (before this change, all the systems
were taken with the same weight). The score_combine.sh now properly filters the <hes> tag (which is marked optionally
deletable in the eval documentation).
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3538 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 21:06:20 +00:00
Jan Trmal
228a61d486
(trunk/babel/s5b) Mostly fixes to the new decoding script. New feature: added OOV search.
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3537 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 20:54:57 +00:00
Dan Povey
7ea7b9f2c7
trunk: Script fix in train_quick.sh (thanks to Simon Klupfel); and clarifications about how to turn off CMN in scripts.
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3536 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 19:40:09 +00:00
Vimal Manohar
1e2af4f0ef
Segmentation scripts
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3535 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 03:40:48 +00:00
Vimal Manohar
aa0eeb18bd
Reverting segmentation changes
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3534 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 03:31:56 +00:00
Dan Povey
af3a04d9df
Trunk: reverting revision 3528 (should have been to sandbox)
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3530 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 00:02:07 +00:00
Dan Povey
6b9a20d2cf
Trunk: reverting revision 3527 (should have been to sandbox)
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3529 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-18 00:00:24 +00:00
Ondrej Platek
94eb5ec4f0
sandbox/oplatek2 LatticeFasterDecoder private->protected
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3528 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 23:59:29 +00:00
Ondrej Platek
d028076e4d
sandbox/oplatek added simple online C++ interface in dec-wrap
...
and python wrapper of dec-wrap in pykaldi
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3527 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 23:31:56 +00:00
Guoguo Chen
21d64ecb6c
Fixes to proxy keywords generation
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3526 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 23:29:21 +00:00
Jan Trmal
6a24888ae8
(trunk/babel/s5b) First version of the Kaldi-Babel native UEM segmentation
...
The support for it had already been included in the 4-anydecode script
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3522 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 19:54:46 +00:00
Jan Trmal
2ecab0e583
(trunk/babel/s5b) First version of the improved decoding pipeline. Does not handle the OOV search yet.
...
For the time being, it supports the PEM and UEM (from CMU) segmentation. Moreover, it includes the
SEG segmentation (Kaldi-BABEL native), which is currently being finalized.
Another change is the ability to skip extra features (especially those related to search). This is
useful as the "quick" baseline can be run routinely and the "extra" only when needed.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3521 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 19:36:38 +00:00
Dan Povey
0086a5e1c1
trunk: small commit with change in documentation, to test if committing works right now.
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3520 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-17 19:20:20 +00:00
David Snyder
2a54ad8f4d
trunk: for language-id setup adding an implementation of Shifted Delta Cepstra (SDC) features and corresponding unit tests.
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3519 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-16 23:07:21 +00:00
Guoguo Chen
ed961d3238
Various improvements to proxy keywords generation, both speech-wise and performance-wise
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3518 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-16 20:45:27 +00:00
Xiaohui Zhang
bb0109e3cb
changed learning rates and final beta value for the ensemble training recipe.
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3516 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-15 05:34:25 +00:00
Guoguo Chen
58ee5aa974
Move lexicon extension configs to common config file... this is cleaner
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3515 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-15 04:36:35 +00:00
Guoguo Chen
86d6e440cf
Top-level script for extending the training lexicon
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3514 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-15 04:15:15 +00:00
Dan Povey
9f1d876a6c
trunk: compilation fix
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3513 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-15 04:10:59 +00:00
Jan Trmal
2c1f966453
(trunk/babel/s5b) Adding phoneme merging to the Zulu LimitedLP lang config.
...
The merge improves WER by about 0.5 % abs and the ATWV by about 0.3 % abs
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3512 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-14 19:48:47 +00:00
Guoguo Chen
a28834835a
Fixes to extend_lexicon.sh: 1. added OOV rate computation as diagnostic information, if a text file is provided; 2. extended phone mapping repository to a..zA..Z0..9; 3. changed num_sent_gen as the total number of sentences to generate; 4. added encoding when processing words
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3511 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-13 17:53:11 +00:00
Jan Trmal
54ec319890
(trunk/babel/s5b) Adding documentation to the confusion matrix generation script
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3510 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-13 06:29:24 +00:00
Jan Trmal
d7fd8b6d48
(babel/trunk/s5b) Improvement to the combination. Now it better sweeps the power interval (0,1>
...
so we have the same resolution over the whole interval (which wasn't the case when using inv-power,
where the resolution was reciprocal)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3509 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-13 01:42:53 +00:00
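The resolution argument in the commit above can be illustrated with a small sketch. It assumes that "inv-power" meant sweeping an integer k and using the power 1/k, and that the new sweep steps the power itself in equal increments; neither the grid size nor the parameterization is given in the log, so both are assumptions made for illustration.

```shell
# Assumed old sweep: power p = 1/k for k = 1..5 (points crowd toward 0).
inv_grid=$(awk 'BEGIN { for (k = 1; k <= 5; k++) printf "%s%.2f", (k > 1 ? " " : ""), 1 / k }')

# Assumed new sweep: power p = k/5 for k = 1..5 (equal spacing over (0,1]).
lin_grid=$(awk 'BEGIN { for (k = 1; k <= 5; k++) printf "%s%.2f", (k > 1 ? " " : ""), k / 5 }')

echo "inverse-power grid: $inv_grid"
echo "uniform grid:       $lin_grid"
```

With the reciprocal grid, the gap between neighbouring powers shrinks from 0.50 down to 0.05, while the uniform grid keeps a constant step of 0.20.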
Jan Trmal
0fe906732e
(trunk/babel/s5b) Adding a script for confusion matrix generation (for use, e.g. with the proxy kw search)
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3508 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 23:59:32 +00:00
Jan Trmal
810503debb
(trunk/babel/s5b) Bugfixes and small improvements to the G2P scripts.
...
The interface is regarded as stable for now.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3507 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 23:51:39 +00:00
Jan Trmal
c9231dc606
In the postprocessing, remove the "<hes>" tag, as per the BABEL eval documentation,
...
this flag is "optionally deletable", i.e. the penalty will be incurred only for
a stray insertion, not a stray deletion. This gives approximately 0.2% WER
improvement on Zulu
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3506 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 23:41:14 +00:00
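A minimal sketch of the kind of filtering the commit above describes. The helper name `remove_hes` and the plain-text hypothesis format are hypothetical; the log does not show the actual script or the file format it operates on.

```shell
# Hypothetical helper: drop every <hes> token from text read on stdin,
# then collapse the leftover runs of spaces and trim both ends.
remove_hes() {
  sed -e 's/<hes>//g' -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}

filtered=$(echo "utt1 hello <hes> world <hes>" | remove_hes)
echo "$filtered"
```

Because a stray deletion of an optionally deletable token carries no penalty, removing the tag from the hypotheses can only avoid insertion errors.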
Nickolay V. Shmyrev
6e5003b92f
trunk: endianness in the sphinx feature dumper, now feature files are correct
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3505 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 23:24:16 +00:00
Guoguo Chen
da61f54d76
Changes to lower level scripts for proxy keywords: 1. decomposed generate_proxy_keywords.sh into a language dependent part and language independent part; 2. improved the ATWV performance
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3504 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 21:47:50 +00:00
Karel Vesely
7bdcba29eb
trunk,nnet: minor update,
...
* analyze-counts refactored, accepts symbol table to print counts
* nnet-loss.cc - added sanity check to dim in input posteriors
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3503 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 16:38:51 +00:00
Jan Trmal
01513ca780
(trunk/babel/s5b) Ignore the return code of the cleanup code. Otherwise the 'set -e' will cause the script to fail after rm returns a non-zero status code
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3502 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 02:20:22 +00:00
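The behaviour behind the commit above can be sketched as follows. The directory and file names are hypothetical, and `|| true` is one common way to ignore a return code; the log does not say which form the actual fix used.

```shell
#!/usr/bin/env bash
set -e  # abort the script as soon as any command returns non-zero

# Hypothetical scratch directory, used only for this illustration.
tmpdir=$(mktemp -d)

cleanup() {
  # rm (without -f) returns non-zero for a missing file; '|| true'
  # discards that status so 'set -e' does not stop the script here.
  rm "$tmpdir/missing-file" 2>/dev/null || true
  rm -rf "$tmpdir"
}

cleanup
status="reached end"
echo "$status"
```

Without the `|| true`, the first `rm` would terminate the script inside `cleanup`, and the final `echo` would never run.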
Jan Trmal
5920437594
(trunk/babel/s5b) Reverting the previous commit as it was including other changes not related to the subject of the commit
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3501 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 01:43:10 +00:00
Jan Trmal
ab2efdde6f
(trunk/babel/s5b) Reverting the previous commit as it was including other changes not related to the subject of the commit
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3500 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 01:35:21 +00:00
Jan Trmal
e78cbddfcf
(trunk/babel/s5b) For the UEM segmentation, allow for use of alternative extensions of the datadir.
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3499 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-12 01:19:00 +00:00
Jan Trmal
a7a6faeb3f
(trunk/doc) Changing documentation about getting Kaldi. SF recently seems to be disallowing the svn URI access, so we suggest the https one
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3497 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-11 02:54:37 +00:00
Dan Povey
1becbbe04a
trunk: Changing openfst failover in tools/Makefile to point to my website (since it was currently failing over to a different alias of the same site)
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3496 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-11 02:15:03 +00:00
Jan Trmal
401532c389
(trunk/babel/s5) Adding lexicon filtering. Affects FullLP case only.
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3494 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-10 18:50:12 +00:00
Jan Trmal
247be4f136
(trunk/babel/s5) Replace the --zero-if-disjoint by --drop-frames parameter
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3493 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-10 18:43:34 +00:00
Karel Vesely
2344573977
trunk: HTK conversion tool, adding support for 3-column arcs, representing an arc with implicit "unit weight" equivalent to "0,0,".
...
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3492 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-10 13:17:06 +00:00