Bias
5cfa401ff1
Making sure files are closed
2020-07-03 16:21:11 +02:00
Bias
9c09da471a
Fixed unicode in output
2020-07-03 16:21:11 +02:00
Bias
05c96731c4
Removing temp lm files
2020-07-03 16:21:11 +02:00
Bias
f6cf144382
Initial support for DeepSpeech 0.7.1
2020-07-03 16:21:11 +02:00
Tilman Kamp
39a633a434
Updated documentation and minor tool fixes
2020-07-01 17:59:46 +02:00
Tilman Kamp
f6a16d92a0
Fix empty set-assignments
2020-05-04 18:18:52 +02:00
Tilman Kamp
f3e594f566
Better progress logging in SDB tool
2020-03-09 12:08:53 +01:00
Tilman Kamp
e06a702756
Fix missing utf-8 decoding on SDB meta data reading
2020-03-05 15:37:33 +01:00
Tilman Kamp
ebb2e9721f
Simplified main routine
2020-02-27 16:05:19 +01:00
Tilman Kamp
3dac0f8db3
Progress reporting on meta file writing; filenames in log messages
2020-02-27 16:03:59 +01:00
Tilman Kamp
149805990e
Closing of SDB reader at end of SortingSDBWriter finalization
2020-02-27 16:02:26 +01:00
Tilman Kamp
d41c0b1ff7
Exporter: Second chance conversion and ability to skip samples on audio errors
2020-02-26 11:50:52 +01:00
Tilman Kamp
f8cd176b8d
Post-refactoring fix of exporter's de-biasing
2020-02-25 18:53:40 +01:00
Tilman Kamp
0c2ea1b983
Export plan as a cache for export preparation steps
2020-02-25 16:54:53 +01:00
Tilman Kamp
e03e830685
Better split-field checking
2020-02-25 16:10:41 +01:00
Tilman Kamp
ecd2a74906
Better checking for existing target paths
2020-02-25 16:09:41 +01:00
Tilman Kamp
8897bb6cc7
Refactored exporter for better maintenance and lower memory footprint; --tmp-dir option; CSV meta files
2020-02-25 15:44:37 +01:00
Tilman Kamp
536dc6b006
Exporter argument parsing as own function
2020-02-24 11:19:59 +01:00
Tilman Kamp
23a4569ba5
Using heapq.merge for interleaving
2020-02-21 19:08:37 +01:00
Tilman Kamp
061f77bb62
Updated custom mime-types
2020-02-21 19:07:45 +01:00
Tilman Kamp
3a05b79285
Removed tqdm from stats.py
2020-02-21 13:10:28 +01:00
Tilman Kamp
e788cee6dc
Fix #25
2020-02-21 13:02:35 +01:00
Tilman Kamp
e8fb5895ca
Additional parameter for SDB finalization sample buffer size
2020-02-20 17:16:55 +01:00
Tilman Kamp
0d45123b29
Progress logging: Prevent division by 0
2020-02-20 17:16:03 +01:00
Tilman Kamp
842d50a950
Fix for meta fields that are single values instead of lists
2020-02-20 13:57:05 +01:00
Tilman Kamp
4e296d4011
Changed some debug log-messages to info ones
2020-02-20 11:44:09 +01:00
Tilman Kamp
7993f214fd
Progress logging to stderr
2020-02-19 17:55:43 +01:00
Tilman Kamp
82d40d82f4
Better progress logging; Output of fragment meta data on sample cutting problem
2020-02-19 17:29:10 +01:00
Tilman Kamp
02305706d0
Remove incompatible stty sane call
2020-02-18 17:28:11 +01:00
Tilman Kamp
f7e2f7f0ab
Refactored CollectionSample to LabeledSample
2020-02-18 15:53:28 +01:00
Tilman Kamp
fe8588565f
Better progress logging in catalog tool
2020-02-18 14:08:27 +01:00
Tilman Kamp
7fa6773bc4
Fixes for logging and argument parsing
2020-02-18 14:07:08 +01:00
Tilman Kamp
28806dfff6
Fix #16
2020-02-18 14:05:53 +01:00
Tilman Kamp
b4a0e3bd85
Catalog tool
2020-02-17 18:27:41 +01:00
Tilman Kamp
06ab3a2d17
Meta data output independent of sample output format
2020-02-17 13:47:11 +01:00
Tilman Kamp
1383951191
Fix --no-progress option; better info logs
2020-02-17 12:34:56 +01:00
Tilman Kamp
aacbda1676
Ability to drop samples with unknown and/or multi-instance meta data
2020-02-17 11:41:55 +01:00
Tilman Kamp
3342067a0b
Meta data output on SDB export
2020-02-14 18:29:50 +01:00
Tilman Kamp
42556e1988
Late binding of opuslib, nicer progress bars on export
2020-02-13 12:21:25 +01:00
Tilman Kamp
dfc565190d
Progress indication during SDB finalization
2020-02-12 17:30:27 +01:00
Tilman Kamp
836b87f4a5
Removed debugging artifact
2020-02-12 14:46:13 +01:00
Tilman Kamp
bf9fcd15ad
Wave and dry-run support for SDB export
2020-02-12 13:55:49 +01:00
Tilman Kamp
bb51dbf193
Updated SDB support
2020-02-12 11:48:08 +01:00
Tilman Kamp
a8aa0ae2d5
Fix: Initializing Opus frame remainders with zeros
2020-02-10 11:43:32 +01:00
Tilman Kamp
c3e879a642
SDB support
2020-02-07 18:29:30 +01:00
Tilman Kamp
fba9c8971d
Light refactoring to integrate DeepSpeech audio helpers. Additional VAD parameters.
2020-02-06 17:06:44 +01:00
Ryan Hileman
d17489c29e
Fix KenLM model generation on smaller texts
Add --discount-fallback argument to lmplz, which was necessary when generating a language model for a smaller transcription. This option enables a fallback for Kneser-Ney discount estimation and should not affect anything except KenLM models that would otherwise have errored out.
2020-01-31 16:24:11 +01:00
Tilman Kamp
9016b2b5a9
Merge pull request #23 from lunixbochs/patch-3
Skip inputs with empty clean transcripts
2020-01-31 16:21:13 +01:00
Ryan Hileman
86779514ec
Skip inputs with empty clean transcripts
Fix #22
2020-01-31 07:12:16 -08:00
Tilman Kamp
a5f2808853
Merge pull request #17 from lunixbochs/patch-1
Fix align.sh example in README
2020-01-22 10:50:16 +01:00