Tilman Kamp
c3e879a642
SDB support
2020-02-07 18:29:30 +01:00
Tilman Kamp
fba9c8971d
Light refactoring to integrate DeepSpeech audio helpers. Additional VAD parameters.
2020-02-06 17:06:44 +01:00
Ryan Hileman
d17489c29e
Fix KenLM model generation on smaller texts
Add --discount-fallback argument to lmplz, which was necessary when generating a language model for a smaller transcription. This option enables a fallback, and shouldn't affect anything except KenLM models that would've otherwise errored out.
2020-01-31 16:24:11 +01:00
Tilman Kamp
9016b2b5a9
Merge pull request #23 from lunixbochs/patch-3
Skip inputs with empty clean transcripts
2020-01-31 16:21:13 +01:00
Ryan Hileman
86779514ec
Skip inputs with empty clean transcripts
Fix #22
2020-01-31 07:12:16 -08:00
Tilman Kamp
a5f2808853
Merge pull request #17 from lunixbochs/patch-1
Fix align.sh example in README
2020-01-22 10:50:16 +01:00
Ryan Hileman
c53411ead5
Fix align.sh example in README
Looks like the existing example didn't work anymore; this modified command seems to be working for me.
2020-01-22 00:36:52 -08:00
Tilman Kamp
6dcfd4dd4a
Updating LM dependencies to DS 0.6.0
2019-12-20 17:33:16 +01:00
Tilman Kamp
699cacd286
Merge pull request #15 from BoneGoat/feature/deepspeech-0.6.0
Feature/deepspeech 0.6.0
2019-12-20 11:13:44 +01:00
Tobias
1963789ce8
Downloading alphabet.txt for DeepSpeech
2019-12-19 19:17:56 +01:00
Bias
420c68e58a
Set requirements for DeepSpeech 0.6.0
2019-12-19 16:58:12 +01:00
Bias
0cfee09964
Changes needed for DeepSpeech 0.6.0
2019-12-19 16:50:26 +01:00
Tilman Kamp
91f39a944a
Fix: Using file-size instead of sample number on list export
2019-12-02 11:26:33 +01:00
Tilman Kamp
ac9ca29507
Closing temp file system-handles; fewer conversion workers
2019-11-28 18:13:24 +01:00
Tilman Kamp
665c44cb18
Export using SoX
2019-11-28 15:08:19 +01:00
Tilman Kamp
08b14658fb
Reducing memory consumption during tar export
2019-11-28 12:07:44 +01:00
Tilman Kamp
1e0762feac
Tar file export
2019-11-27 17:40:09 +01:00
Tilman Kamp
7365e3fc2f
Alternative test catalog
2019-11-27 17:38:48 +01:00
Tilman Kamp
a7a69a09bd
Fix #4
2019-11-26 17:43:26 +01:00
Tilman Kamp
cc45c0d97a
Support for meta data in transcription logs
2019-11-26 17:39:17 +01:00
Tilman Kamp
b7e13b1896
Log to stdout by default
2019-11-25 15:43:54 +01:00
Tilman Kamp
213e954caa
Using imap_unordered for parallel alignment
2019-11-25 15:29:03 +01:00
Tilman Kamp
d95b90b2c2
Better logging without progress bar
2019-11-25 14:13:38 +01:00
Tilman Kamp
9ba45ca4e6
Lazy loading models
2019-11-25 13:35:03 +01:00
Tilman Kamp
f335419bc5
README update
2019-09-26 18:44:46 +02:00
Tilman Kamp
a138d55d9f
Always adding text fragments to .aligned file entries
2019-09-24 18:27:17 +02:00
Tilman Kamp
9070b867ad
Support for catalogs, multiple audio formats and parallel processing
2019-09-24 16:57:51 +02:00
Tilman Kamp
5b6057af61
CSV headers, assert on sample times
2019-09-23 15:25:07 +02:00
Tilman Kamp
db8e577e57
Print set durations
2019-09-18 15:52:33 +02:00
Tilman Kamp
bec42d6c14
Better command-line help
2019-09-18 15:10:02 +02:00
Tilman Kamp
ab3c66a6ba
Added statistics tool
2019-09-18 14:10:37 +02:00
Tilman Kamp
36d3672a19
Better set splitting
2019-09-17 15:38:36 +02:00
Tilman Kamp
72b8cc45f4
Fix for split field case
2019-09-17 10:30:19 +02:00
Tilman Kamp
eba1385523
Pretty printing JSON output
2019-09-16 18:30:27 +02:00
Tilman Kamp
cb695226fc
Exporter to DeepSpeech CSV files
2019-09-16 18:12:54 +02:00
Tilman Kamp
a08b1e85d9
Fixed re-definition of skip
2019-09-12 10:21:46 +02:00
Tilman Kamp
4efdc7b365
Dealing with empty matches
2019-09-11 18:12:10 +02:00
Tilman Kamp
ae584a901b
No phrase snapping by default
2019-09-11 17:43:40 +02:00
Tilman Kamp
d2530ed745
Shrinking support during fine-alignment
2019-09-11 16:21:42 +02:00
Tilman Kamp
92f23253bb
Support for different text similarity metrics
2019-09-10 18:39:03 +02:00
Tilman Kamp
11c61c7970
Default minimum of 1 for tlen and mlen
2019-09-06 17:52:29 +02:00
Tilman Kamp
0aa5bc7422
Meta data pass through
2019-09-05 16:27:24 +02:00
Tilman Kamp
fab9ec601f
Extended example data for new script support
2019-09-05 15:33:10 +02:00
Tilman Kamp
138026d875
Fix for multi-phrase support, direct loading of .tlog files
2019-09-04 18:45:34 +02:00
Tilman Kamp
718a430e59
Bug fixes; progress bars; multi-phrase text cleaning preserving assigned metadata
2019-09-04 15:45:58 +02:00
Tilman Kamp
b7c1e9ea7b
Removed debug leftover
2019-09-02 15:23:29 +02:00
Tilman Kamp
b912fef392
README updates
2019-09-02 13:42:04 +02:00
Tilman Kamp
03f5b39d37
Multi-process transcription, .tlog files for transcripts, some light refactoring and reduced logging
2019-09-02 12:26:10 +02:00
Tilman Kamp
3031f35561
Fix wrong index name
2019-08-20 10:25:07 +02:00
Tilman Kamp
cc6507ef97
Fix for calling removed function
2019-08-20 10:19:09 +02:00