Dan Povey
|
6f598676cc
|
several nnet2-online changes: make it easier to get the feature extraction options right in cross-system training; add train_pnorm_simple.sh script (simplified learning-rate schedule and improved combination at the end, supersedes train_pnorm_fast.sh); modifying big-data online-nnet2 recipes to use 40-dimensional MFCC rather than 13 as input (will add results soon, but they are improved). Modified filter_scp.pl to have one-based, not zero-based, field index.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@4493 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-09-30 19:18:36 +00:00 |
Dan Povey
|
afb6eb63ec
|
trunk: adding --posterior-scale options in speaker-id and language-id setups... not yet tuned.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@4193 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-07-22 17:43:34 +00:00 |
Dan Povey
|
80299dfdba
|
trunk: merging various changes from sandbox/language_id.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@4180 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-07-20 05:23:18 +00:00 |
Dan Povey
|
5565595718
|
sandbox/language_id: some fixes to run.sh.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4176 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-07-19 23:15:31 +00:00 |
Dan Povey
|
183f124aa6
|
sandbox/language_id: changing VTLN estimation script to be per-utterance; various minor fixes relating to VTLN estimation.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4175 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-07-19 23:01:24 +00:00 |
Dan Povey
|
6a91edb723
|
sandbox/language_id: script changes for applying VTLN in language-id; not yet tested.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4174 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-07-19 21:15:24 +00:00 |
Dan Povey
|
4fd9c20c6a
|
sandbox/language_id: getting VTLN model estimation working given a UBM.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4173 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-07-19 19:01:45 +00:00 |
Dan Povey
|
83e0e5419e
|
trunk: merging recent changes from ^/sandbox/language_id; removing some svn:mime-type properties on scripts that were preventing merging.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@4033 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-06-01 18:05:22 +00:00 |
David Snyder
|
dbc7ee215c
|
sandbox/language_id: adding alternative diagonal UBM recipe in the run.sh script for lre
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4010 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-05-22 22:06:42 +00:00 |
David Snyder
|
55c24a43aa
|
sandbox/language_id: Adding scripts to produce the LRE07 General Closed-Set Language Recognition eval. Also fixing a minor bug in run_logistic_regression.sh when rebalancing priors.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@4005 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-05-19 23:11:46 +00:00 |
Dan Povey
|
d68f1be037
|
trunk: adding some scripts that were skipped while merging sandbox/language_id.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@3935 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-04-21 01:56:39 +00:00 |
David Snyder
|
eb8a8aabed
|
sandbox/language_id: In LID setup reversing the order of the CMVN and SDC computation improves WER slightly.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3853 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-04-07 05:43:42 +00:00 |
David Snyder
|
7954900219
|
sandbox/language_id: Adding vad-based utterance splitting scripts in lid setup
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3826 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-04-01 04:12:07 +00:00 |
David Snyder
|
4a2289a5f9
|
sandbox/language_id: Updating logistic regression conf files for mixture model. Also updating run_logistic_regression.sh for new results.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3816 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-29 01:56:01 +00:00 |
David Snyder
|
66f20466ab
|
sandbox/language_id: Adding iVector length normalization to run_logistic_regression.sh. Also updating logistic regression conf with a tuned normalizer value.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3791 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-20 21:14:00 +00:00 |
Dan Povey
|
5c5a4e2f5a
|
sandbox/lid: various script fixes and updates; improving speed of iVector-extractor model loading by parallelizing derived-variables computation.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3759 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-12 04:31:52 +00:00 |
Dan Povey
|
a1a368dc83
|
sandbox/lid: introducing the splitting of long utterances into smaller pieces; various utility script updates.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3757 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-10 04:55:22 +00:00 |
Dan Povey
|
18e1be0067
|
sandbox/lid: data-prep script and run.sh fixes.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3756 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-10 01:53:55 +00:00 |
Dan Povey
|
c52a713602
|
sandbox/language_id: reorganize data preparation scripts.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3754 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-09 22:26:30 +00:00 |
David Snyder
|
06f761d922
|
sandbox/language_id: Fixed a bug causing malformed wav.scp files
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3753 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-09 19:27:26 +00:00 |
David Snyder
|
9f4d83f00e
|
sandbox/language_id: Including spk2gender in new dataprep scripts, without which the datasources cannot be combined.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3752 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-09 16:25:11 +00:00 |
David Snyder
|
0d3f077e8d
|
sandbox/language_id: Bug fix in run.sh; sre08 data prep was incorrectly removed.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3748 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-09 07:39:12 +00:00 |
David Snyder
|
5bfe6a7a55
|
sandbox/language_id: Data prep script for lre03 corpora.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3747 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-09 07:28:33 +00:00 |
David Snyder
|
8c36e3f079
|
sandbox/language_id: Data prep for LRE05 (LDC2008S05).
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3745 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-09 05:09:54 +00:00 |
David Snyder
|
7ce1607b48
|
sandbox/language_id: Adding dataprep script make_callfriend.pl for callfriend corpus
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3744 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-09 02:42:29 +00:00 |
David Snyder
|
4ebba0a69f
|
sandbox/language_id: Fixing typo in run_logistic_regression.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3742 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-08 22:39:25 +00:00 |
David Snyder
|
63368fd6c2
|
sandbox/language_id: Making data set processing more terse in run.sh.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3741 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-08 22:16:55 +00:00 |
David Snyder
|
a1561bb1c4
|
sandbox/language_id: Adding local/make_callfriend.pl for handling the ldc96* datasets and a table for mapping callfriend datasets to languages. With lre07 for testing, run.sh now uses all available training data for training. Bug fix in make_lre07.pl.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3740 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-08 08:49:33 +00:00 |
David Snyder
|
ac7dee24b0
|
sandbox/language_id: Updating run_logistic_regression.sh to use lre07 in test. Also adding a script to remove the dialect portion of the languages in an utt2lang file; this is necessary currently, since the training and testing data often have non-overlapping dialects. Also adding a script balance_priors_to_test.pl which rebalances the priors in the logistic regression model.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3733 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-07 23:14:18 +00:00 |
David Snyder
|
b6d3f76339
|
sandbox/language_id: local/make_lre07.pl creates utt2lang.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3707 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-06 02:01:51 +00:00 |
Dan Povey
|
0c3f034214
|
sandbox/language_id: Adding script to make lre07 data; minor fix to lid/train_ivector_extractor.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3696 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-05 21:34:23 +00:00 |
Dan Povey
|
e5189ebb81
|
sandbox/language_id: some changed defaults.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3687 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-05 04:28:16 +00:00 |
Dan Povey
|
8728d35d9a
|
Various updates to iVector setup; added possibility to modify priors.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3686 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-05 03:46:34 +00:00 |
Dan Povey
|
3335ff8df4
|
sandbox/language_id: fixes to data preparation scripts
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3682 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-05 01:18:02 +00:00 |
Dan Povey
|
89d4e1c13b
|
sandbox/language_id: partial fix to data-prep script
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3681 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-04 23:05:45 +00:00 |
David Snyder
|
1298918014
|
sandbox/language_id: Adding logistic-regression.conf config file, for setting max_steps and normalizer for the classifier.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3660 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-03 06:48:10 +00:00 |
Dan Povey
|
c2397e1d3b
|
sandbox/lid: Extending scripts to run on test data.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3641 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-03-01 00:19:06 +00:00 |
Dan Povey
|
c437252251
|
sandbox/language_id: code and script updates for evaluating the model.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3639 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-28 23:33:06 +00:00 |
Dan Povey
|
45bb9d0168
|
sandbox/language_id: Various improvements to the scripts; more logging in the code; Makefile fix.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3638 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-28 22:43:17 +00:00 |
David Snyder
|
36c22072f6
|
sandbox/language_id: Adding util for converting utt2lang to a file in which the lang is an integer label. Also adding a script for running logistic regression binaries, which is currently incomplete.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3626 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-28 16:43:04 +00:00 |
David Snyder
|
5eb0118b72
|
sandbox/language_id: Removing erroneous changes to run.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3624 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-28 09:00:05 +00:00 |
David Snyder
|
a87acef8ad
|
sandbox/language_id: Adding util script for generating a map between language names and an id.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3623 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-28 08:55:47 +00:00 |
Dan Povey
|
5e4f0372ad
|
sandbox/language_id: extension to subset_data.sh
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3608 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-26 22:30:47 +00:00 |
Dan Povey
|
cd23ee8ec5
|
sandbox/language_id: modified script to set up train and test subsets.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3599 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-25 23:29:25 +00:00 |
Dan Povey
|
c62c4464a0
|
sandbox/language_id: script bug fix.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3598 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-25 22:34:00 +00:00 |
Dan Povey
|
f686098e30
|
Changing training to use subsets initially.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3589 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-25 05:58:48 +00:00 |
David Snyder
|
90132a6b22
|
sandbox/language_id: Adding additional LDC training data.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3584 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-24 20:53:42 +00:00 |
Dan Povey
|
d6ad2beb33
|
sandbox/language_id: fix data-prep script to account for last check-in.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3582 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-24 20:44:45 +00:00 |
Dan Povey
|
677a1d1677
|
sandbox/language_id: rationalize the way the utt2lang is treated.
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3581 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-24 20:35:22 +00:00 |
Nagendra Goel
|
8cd6ba8928
|
sort order fix
git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/language_id@3577 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
|
2014-02-24 04:47:22 +00:00 |