Commit Graph

59 Commits

Author SHA1 Message Date
Dan Povey 6f598676cc several nnet2-online changes: make it easier to get the feature extraction options right in cross-system training; add train_pnorm_simple.sh script (simplified learning-rate schedule and improved combination at the end, supersedes train_pnorm_fast.sh); modifying big-data online-nnet2 recipes to use 40-dimensional MFCC rather than 13 as input (will add results soon, but they are improved). Modified filter_scp.pl to have one-based, not zero-based, field index.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@4493 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-09-30 19:18:36 +00:00
Dan Povey afb6eb63ec trunk: adding --posterior-scale options in speaker-id and language-id setups... not yet tuned.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@4193 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-07-22 17:43:34 +00:00
Dan Povey 80299dfdba trunk: merging various changes from sandbox/language_id.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@4180 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-07-20 05:23:18 +00:00
Dan Povey 5565595718 sandbox/language_id: some fixes to run.sh.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4176 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-07-19 23:15:31 +00:00
Dan Povey 183f124aa6 sandbox/language_id: changing VTLN estimation script to be per-utterance; various minor fixes relating to VTLN estimation.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4175 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-07-19 23:01:24 +00:00
Dan Povey 6a91edb723 sandbox/language_id: script changes for applying VTLN in language-id; not yet tested.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4174 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-07-19 21:15:24 +00:00
Dan Povey 4fd9c20c6a sandbox/language_id: getting VTLN model estimation working given a UBM.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4173 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-07-19 19:01:45 +00:00
Dan Povey 83e0e5419e trunk: merging recent changes from ^/sandbox/language_id; removing some svn:mime-type properties on scripts that were preventing merging.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@4033 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-06-01 18:05:22 +00:00
David Snyder dbc7ee215c sandbox/language_id: adding alternative diagonal UBM recipe in the run.sh script for lre
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4010 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-05-22 22:06:42 +00:00
David Snyder 55c24a43aa sandbox/language_id: Adding scripts to produce the LRE07 General Closed-Set Language Recognition eval. Also fixing a minor bug in run_logistic_regression.sh when rebalancing priors.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4005 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-05-19 23:11:46 +00:00
Dan Povey d68f1be037 trunk: adding some scripts that were skipped while merging sandbox/language_id.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3935 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-04-21 01:56:39 +00:00
David Snyder eb8a8aabed sandbox/language_id: In LID setup reversing the order of the CMVN and SDC computation improves WER slightly.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3853 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-04-07 05:43:42 +00:00
David Snyder 7954900219 sandbox/language_id: Adding vad-based utterance splitting scripts in lid setup
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3826 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-04-01 04:12:07 +00:00
David Snyder 4a2289a5f9 sandbox/language-id: Updating logistic regression conf files for mixture model. Also updating run_logistic_regression.sh for new results.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3816 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-29 01:56:01 +00:00
David Snyder 66f20466ab sandbox/language-id: Adding ivector length normalization to run_logistic_regression.sh. Also updating logistic regression conf with a tuned normalizer value.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3791 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-20 21:14:00 +00:00
Dan Povey 5c5a4e2f5a sandbox/lid: various script fixes and updates; improving speed of iVector-extractor model loading by parallelizing derived-variables computation.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3759 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-12 04:31:52 +00:00
Dan Povey a1a368dc83 sandbox/lid: introducing the splitting of long utterances into smaller pieces; various utility script updates.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3757 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-10 04:55:22 +00:00
Dan Povey 18e1be0067 sandbox/lid: data-prep script and run.sh fixes.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3756 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-10 01:53:55 +00:00
Dan Povey c52a713602 sandbox/language_id: reorganize data preparation scripts.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3754 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-09 22:26:30 +00:00
David Snyder 06f761d922 sandbox/language_id: Fixed a bug causing malformed wav.scp files
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3753 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-09 19:27:26 +00:00
David Snyder 9f4d83f00e sandbox/language_id: Including spk2gender in new dataprep scripts, without which the datasources cannot be combined.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3752 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-09 16:25:11 +00:00
David Snyder 0d3f077e8d sandbox/language_id: Bug fix in run.sh; sre08 data prep was incorrectly removed.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3748 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-09 07:39:12 +00:00
David Snyder 5bfe6a7a55 sandbox/language_id: Data prep script for lre03 corpora.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3747 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-09 07:28:33 +00:00
David Snyder 8c36e3f079 sandbox/language_id: Data prep for LRE05 (LDC2008S05).
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3745 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-09 05:09:54 +00:00
David Snyder 7ce1607b48 sandbox/language_id: Adding dataprep script make_callfriend.pl for callfriend corpus
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3744 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-09 02:42:29 +00:00
David Snyder 4ebba0a69f sandbox/language_id: Fixing typo in run_logistic_regression.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3742 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-08 22:39:25 +00:00
David Snyder 63368fd6c2 sandbox/language_id: Making data set processing more terse in run.sh.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3741 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-08 22:16:55 +00:00
David Snyder a1561bb1c4 sandbox/language_id: Adding local/make_callfriend.pl for handling the ldc96* datasets and a table for mapping callfriend datasets to languages. With lre07 for testing, run.sh now uses all available training data for training. Bug fix in make_lre07.pl.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3740 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-08 08:49:33 +00:00
David Snyder ac7dee24b0 sandbox/language_id: Updating run_logistic_regression.sh to use lre07 in test. Also adding a script to remove the dialect portion of the languages in an utt2lang file; this is necessary currently, since the training and testing data often have non-overlapping dialects. Also adding a script balance_priors_to_test.pl which rebalances the priors in the logistic regression model.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3733 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-07 23:14:18 +00:00
David Snyder b6d3f76339 sandbox/language_id: local/make_lre07.pl creates utt2lang.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3707 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-06 02:01:51 +00:00
Dan Povey 0c3f034214 sandbox/language_id: Adding script to make lre07 data; minor fix to lid/train_ivector_extractor.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3696 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-05 21:34:23 +00:00
Dan Povey e5189ebb81 sandbox/language_id: some changed defaults.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3687 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-05 04:28:16 +00:00
Dan Povey 8728d35d9a Various updates to iVector setup; added possibility to modify priors.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3686 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-05 03:46:34 +00:00
Dan Povey 3335ff8df4 sandbox/language_id: fixes to data preparation scripts
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3682 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-05 01:18:02 +00:00
Dan Povey 89d4e1c13b sandbox/language_id: partial fix to data-prep script
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3681 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-04 23:05:45 +00:00
David Snyder 1298918014 sandbox/language_id: Adding logistic-regression.conf config file, for setting max_steps and normalizer for the classifier.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3660 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-03 06:48:10 +00:00
Dan Povey c2397e1d3b sandbox/lid: Extending scripts to run on test data.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3641 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-03-01 00:19:06 +00:00
Dan Povey c437252251 sandbox/language_id: code and script updates for evaluating the model.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3639 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 23:33:06 +00:00
Dan Povey 45bb9d0168 sandbox/language_id: Various improvements to the scripts; more logging in the code; Makefile fix.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3638 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 22:43:17 +00:00
David Snyder 36c22072f6 sandbox/language_id: Adding util for converting utt2lang to a file in which the lang is an integer label. Also adding a script for running logistic regression binaries, which is currently incomplete.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3626 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 16:43:04 +00:00
David Snyder 5eb0118b72 sandbox/language_id: Removing erroneous changes to run.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3624 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 09:00:05 +00:00
David Snyder a87acef8ad sandbox/language_id: Adding util script for generating a map between language names and an id.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3623 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-28 08:55:47 +00:00
Dan Povey 5e4f0372ad sandbox/language_id: extension to subset_data.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3608 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-26 22:30:47 +00:00
Dan Povey cd23ee8ec5 sandbox/language_id: modified script to set up train and test subsets.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3599 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-25 23:29:25 +00:00
Dan Povey c62c4464a0 sandbox/language_id: script bug fix.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3598 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-25 22:34:00 +00:00
Dan Povey f686098e30 Changing training to use subsets initially.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3589 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-25 05:58:48 +00:00
David Snyder 90132a6b22 sandbox/language_id: Adding additional LDC training data.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3584 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 20:53:42 +00:00
Dan Povey d6ad2beb33 sandbox/language_id: fix data-prep script to account for last check-in.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3582 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 20:44:45 +00:00
Dan Povey 677a1d1677 sandbox/language_id: rationalize the way the utt2lang is treated.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3581 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 20:35:22 +00:00
Nagendra Goel 8cd6ba8928 sort order fix
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3577 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2014-02-24 04:47:22 +00:00