Commit graph

2935 Commits

Author SHA1 Message Date
David Snyder 1298918014 sandbox/language_id: Adding logistic-regression.conf config file, for setting max_steps and normalizer for the classifier.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3660 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-03 06:48:10 +00:00
Dan Povey c2397e1d3b sandbox/lid: Extending scripts to run on test data.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3641 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-01 00:19:06 +00:00
Dan Povey c437252251 sandbox/language_id: code and script updates for evaluating the model.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3639 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 23:33:06 +00:00
Dan Povey 45bb9d0168 sandbox/language_id: Various improvements to the scripts; more logging in the code; Makefile fix.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3638 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 22:43:17 +00:00
David Snyder 8095c577db sandbox/language_id: Adding util for creating a language to integer id table. This is needed for the logistic regression.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3627 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 16:44:31 +00:00
David Snyder 36c22072f6 sandbox/language_id: Adding util for converting utt2lang to a file in which the lang is an integer label. Also adding a script for running logistic regression binaries, which is currently incomplete.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3626 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 16:43:04 +00:00
David Snyder 5eb0118b72 sandbox/language_id: Removing erroneous changes to run.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3624 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 09:00:05 +00:00
David Snyder a87acef8ad sandbox/language_id: Adding util script for generating a map between language names and an id.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3623 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 08:55:47 +00:00
David Snyder 86142f09d6 sandbox/language_id: Fixing previous commit: Adding logistic-regression-eval.cc and removing the corresponding binary
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3622 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 08:53:01 +00:00
David Snyder 52954e97fc sandbox/language_id: Adding logistic-regression-eval for reading in a model trained by LogisticRegression and writing posteriors and scores.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3621 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 08:50:28 +00:00
David Snyder 684d6f63fb sandbox/language_id: Adding binary logistic-regression-train for training a model on generic training data. However, at the moment this will be primarily used for language-id. Also adding a LogisticRegressionConfig for options.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3619 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 03:24:33 +00:00
David Snyder bb9a9e5a6a sandbox/language_id: Adding LogisticRegression class which uses L-BFGS internally. Logistic regression is the standard classifier for language-id.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3618 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 00:17:27 +00:00
Dan Povey d4d8d230e9 sandbox/language_id: merging trunk, and also committing change to utils/subset_data_dir.sh which I had previously forgotten to commit (necessary for example scripts to work)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3617 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-27 23:02:24 +00:00
Vimal Manohar c7a26d90e4 trunk/egs/babel: Adding a script steps/nnet2/update_nnet.sh that updates an existing nnet model without initializing it. Minor fix added in script run-6-semisupervised-seg.sh to use update_nnet.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3616 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-27 21:22:40 +00:00
Vimal Manohar 417778ab25 trunk/egs/babel/steps/nnet2: Modified get_lda.sh and train_pnorm.sh to optionally use FMLLR transforms from a directory different from the alignment directory
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3614 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-27 20:28:07 +00:00
Vassil Panayotov 6701a5ca18 trunk/egs: Small fix to steps/make_denlats_nnet.sh (thanks to Feiteng Li)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3613 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-27 06:10:56 +00:00
Ho Yin Chan 1b53a9c232 trunk:egs readme update on the removal of s5b
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3612 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-27 03:46:35 +00:00
Ho Yin Chan 19107f6556 trunk:src remove since not all data are released
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3611 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-27 03:34:17 +00:00
Ondrej Platek f55e8d0f42 fix '-ldl' error specific to gcc-4.6.3, ubuntu 12.04 by changing fst rule in tools/Makefile
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3609 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-26 23:10:11 +00:00
Dan Povey 5e4f0372ad sandbox/language_id: extension to subset_data.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3608 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-26 22:30:47 +00:00
Dan Povey a9396adaf6 trunk: remove limitation on split not-shared tree building; remove some stuff that was never finished RE AWS; minor changes to run.sh, cosmetic.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3600 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-26 03:06:42 +00:00
Dan Povey cd23ee8ec5 sandbox/language_id: modified script to set up train and test subsets.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3599 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-25 23:29:25 +00:00
Dan Povey c62c4464a0 sandbox/language_id: script bug fix.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3598 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-25 22:34:00 +00:00
Guoguo Chen ca07e7b7ee Added determinization after KxL2xE for proxy keywords; helps the speed, but not fast enough for the 1-million-word lexicon
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3594 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-25 17:48:59 +00:00
Dan Povey f686098e30 Changing training to use subsets initially.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3589 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-25 05:58:48 +00:00
Guoguo Chen 725f4abd68 WARNING: this change list changed GauPost to GaussPost, and also made it pdf-id indexed instead of transition-id indexed. Details: 1) Modified those posterior-related programs that use TransitionIdToPdf to get a pdf-id, and later on only use the pdf-id. We merge the posteriors that correspond to the same pdf-id to avoid redundant computation. 2) Modified phone lattice determinization, added a wrapper for the lattice type determinization to reduce redundant code in the decoding binaries.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3588 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-25 04:00:06 +00:00
Ho Yin Chan cd36df98b5 trunk:src remove unused file
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3587 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 22:41:27 +00:00
Dan Povey d529ef5112 sandbox/language_id: merging changes from trunk
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3586 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 22:27:50 +00:00
Dan Povey 24310794f9 trunk: minor change to some functions manipulating posteriors, for efficiency; efficiency improvements in apply-cmvn-sliding; cosmetic changes.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3585 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 22:16:33 +00:00
David Snyder 90132a6b22 sandbox/language_id: Adding additional LDC training data.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3584 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 20:53:42 +00:00
Dan Povey e7c733ad70 sandbox/language_id: further reversion to data-preparation scripts
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3583 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 20:47:18 +00:00
Dan Povey d6ad2beb33 sandbox/language_id: fix data-prep script to account for last check-in.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3582 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 20:44:45 +00:00
Dan Povey 677a1d1677 sandbox/language_id: rationalize the way the utt2lang is treated.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3581 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 20:35:22 +00:00
Vimal Manohar 0db68a27f5 trunk/egs/babel: Bug fix in script for getting examples for semi-supervised training of DNN - get_egs_semi_supervised.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3579 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 16:48:39 +00:00
Nagendra Goel 8cd6ba8928 sort order fix
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3577 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 04:47:22 +00:00
David Snyder 4ed9fc8c17 sandbox/language_id: run.sh now combines and uses all sre08 training data
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3573 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-23 17:39:58 +00:00
Vassil Panayotov 0103c446dc trunk/egs/voxforge/s5: Small changes to improve acoustic model training.
Fixes the issues outlined in:
http://sourceforge.net/p/kaldi/discussion/1355348/thread/4db46866/#941f/4f9c/cf07

The test time LM is now built on all available transcripts, including those for
the test utterances, which is not the correct way to evaluate a system, but
for this particular recipe I don't think it matters much.


git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3572 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-23 12:19:49 +00:00
Ho Yin Chan 7ea3dff539 trunk:src not ready, nnet/nnet-cache-tgtmat.h does not exist
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3571 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-23 10:15:28 +00:00
Dan Povey a2a1b3d0c3 trunk: renaming --zero-if-disjoint option to --drop-frames in calling scripts (should have been done a long time ago, when I renamed the option)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3567 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 19:52:26 +00:00
Vimal Manohar c191a76f82 trunk/egs/babel: Minor fix in segmentation script generate_segments.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3566 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 18:13:33 +00:00
Vimal Manohar 6a2db5e1b7 trunk/egs/babel: Minor change in segmentation.py to take python location from the environment
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3565 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 16:41:43 +00:00
Karel Vesely 6d45c140a6 trunk: disabling some tests, which are not ready yet
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3564 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 12:39:36 +00:00
Dan Povey b42b9b733f trunk: BABEL: changing default acoustic scale for decoding bottleneck system (thanks to Pegah Ghahremani); minor fix in train_deltas.sh (thanks to Simon Kluepfel)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3562 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 04:19:14 +00:00
David Snyder dff8b5b00f sandbox/language_id: Adding needed sort checks in utils/fix_data_dirs.sh. Previous commit 3560 should've instead read 'adding VAD config file.'
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3561 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 04:17:24 +00:00
David Snyder 5e2f176bdc sandbox/language_id: Adding needed sort checks in utils/fix_data_dirs.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3560 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 04:15:56 +00:00
Dan Povey efb56ad7d7 trunk: Minor, mostly cosmetic fixes
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3559 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-21 03:47:30 +00:00
Vimal Manohar 22285c7d2c trunk/egs/babel: Minor change in segmentation script generate_segments.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3557 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 23:18:52 +00:00
Jan Trmal ebdb816f84 (trunk/babel/s5b) Adding unsup dataset (dataset for unsupervised/semisupervised training)
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3556 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 22:35:14 +00:00
Nagendra Goel cd4482adb0 add missing file
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3555 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 19:55:31 +00:00
Nagendra Goel e4debea29c fix filename
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3554 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-20 18:16:01 +00:00