Tilman Kamp
b4a0e3bd85
Catalog tool
2020-02-17 18:27:41 +01:00
Tilman Kamp
06ab3a2d17
Meta data output independent of sample output format
2020-02-17 13:47:11 +01:00
Tilman Kamp
1383951191
Fix --no-progress option; better info logs
2020-02-17 12:34:56 +01:00
Tilman Kamp
aacbda1676
Ability to drop samples with unknown and/or multi-instance meta data
2020-02-17 11:41:55 +01:00
Tilman Kamp
3342067a0b
Meta data output on SDB export
2020-02-14 18:29:50 +01:00
Tilman Kamp
42556e1988
Late binding of opuslib, nicer progress bars on export
2020-02-13 12:21:25 +01:00
Tilman Kamp
dfc565190d
Progress indication during SDB finalization
2020-02-12 17:30:27 +01:00
Tilman Kamp
836b87f4a5
Removed debugging artifact
2020-02-12 14:46:13 +01:00
Tilman Kamp
bf9fcd15ad
Wave and dry-run support for SDB export
2020-02-12 13:55:49 +01:00
Tilman Kamp
bb51dbf193
Updated SDB support
2020-02-12 11:48:08 +01:00
Tilman Kamp
a8aa0ae2d5
Fix: Initializing Opus frame remainders with zeros
2020-02-10 11:43:32 +01:00
Tilman Kamp
c3e879a642
SDB support
2020-02-07 18:29:30 +01:00
Tilman Kamp
fba9c8971d
Light refactoring to integrate DeepSpeech audio helpers. Additional VAD parameters.
2020-02-06 17:06:44 +01:00
Ryan Hileman
d17489c29e
Fix KenLM model generation on smaller texts
Add the --discount_fallback argument to lmplz, which was necessary when generating a language model for a smaller transcription. This option enables a fallback and shouldn't affect anything except KenLM models that would otherwise have errored out.
2020-01-31 16:24:11 +01:00
Tilman Kamp
9016b2b5a9
Merge pull request #23 from lunixbochs/patch-3
Skip inputs with empty clean transcripts
2020-01-31 16:21:13 +01:00
Ryan Hileman
86779514ec
Skip inputs with empty clean transcripts
Fix #22
2020-01-31 07:12:16 -08:00
Tilman Kamp
a5f2808853
Merge pull request #17 from lunixbochs/patch-1
Fix align.sh example in README
2020-01-22 10:50:16 +01:00
Ryan Hileman
c53411ead5
Fix align.sh example in README
Looks like the existing example didn't work anymore; this modified command seems to be working for me.
2020-01-22 00:36:52 -08:00
Tilman Kamp
6dcfd4dd4a
Updating LM dependencies to DS 0.6.0
2019-12-20 17:33:16 +01:00
Tilman Kamp
699cacd286
Merge pull request #15 from BoneGoat/feature/deepspeech-0.6.0
Feature/deepspeech 0.6.0
2019-12-20 11:13:44 +01:00
Tobias
1963789ce8
Downloading alphabet.txt for DeepSpeech
2019-12-19 19:17:56 +01:00
Bias
420c68e58a
Set requirements for DeepSpeech 0.6.0
2019-12-19 16:58:12 +01:00
Bias
0cfee09964
Changes needed for DeepSpeech 0.6.0
2019-12-19 16:50:26 +01:00
Tilman Kamp
91f39a944a
Fix: Using file-size instead of sample number on list export
2019-12-02 11:26:33 +01:00
Tilman Kamp
ac9ca29507
Closing temp file system-handles; fewer conversion workers
2019-11-28 18:13:24 +01:00
Tilman Kamp
665c44cb18
Export using SoX
2019-11-28 15:08:19 +01:00
Tilman Kamp
08b14658fb
Reducing memory consumption during tar export
2019-11-28 12:07:44 +01:00
Tilman Kamp
1e0762feac
Tar file export
2019-11-27 17:40:09 +01:00
Tilman Kamp
7365e3fc2f
Alternative test catalog
2019-11-27 17:38:48 +01:00
Tilman Kamp
a7a69a09bd
Fix #4
2019-11-26 17:43:26 +01:00
Tilman Kamp
cc45c0d97a
Support for meta data in transcription logs
2019-11-26 17:39:17 +01:00
Tilman Kamp
b7e13b1896
Log to stdout by default
2019-11-25 15:43:54 +01:00
Tilman Kamp
213e954caa
Using imap_unordered for parallel alignment
2019-11-25 15:29:03 +01:00
Tilman Kamp
d95b90b2c2
Better logging without progress bar
2019-11-25 14:13:38 +01:00
Tilman Kamp
9ba45ca4e6
Lazy loading models
2019-11-25 13:35:03 +01:00
Tilman Kamp
f335419bc5
README update
2019-09-26 18:44:46 +02:00
Tilman Kamp
a138d55d9f
Always adding text fragments to .aligned file entries
2019-09-24 18:27:17 +02:00
Tilman Kamp
9070b867ad
Support for catalogs, multiple audio formats and parallel processing
2019-09-24 16:57:51 +02:00
Tilman Kamp
5b6057af61
CSV headers, assert on sample times
2019-09-23 15:25:07 +02:00
Tilman Kamp
db8e577e57
Print set durations
2019-09-18 15:52:33 +02:00
Tilman Kamp
bec42d6c14
Better command-line help
2019-09-18 15:10:02 +02:00
Tilman Kamp
ab3c66a6ba
Added statistics tool
2019-09-18 14:10:37 +02:00
Tilman Kamp
36d3672a19
Better set splitting
2019-09-17 15:38:36 +02:00
Tilman Kamp
72b8cc45f4
Fix for split field case
2019-09-17 10:30:19 +02:00
Tilman Kamp
eba1385523
Pretty printing JSON output
2019-09-16 18:30:27 +02:00
Tilman Kamp
cb695226fc
Exporter to DeepSpeech CSV files
2019-09-16 18:12:54 +02:00
Tilman Kamp
a08b1e85d9
Fixed re-definition of skip
2019-09-12 10:21:46 +02:00
Tilman Kamp
4efdc7b365
Dealing with empty matches
2019-09-11 18:12:10 +02:00
Tilman Kamp
ae584a901b
No phrase snapping by default
2019-09-11 17:43:40 +02:00
Tilman Kamp
d2530ed745
Shrinking support during fine-alignment
2019-09-11 16:21:42 +02:00