Commit graph

87 commits

Author SHA1 Message Date
Francis Tyers e026bb773b
Update augmentations.py
Looks like `clock_to` got changed to `final_clock` but this one was missed.
2021-05-13 16:12:52 +01:00
Alexandre Lissy bde1ebc842 Fix #3608: Remove code refs to TaskCluster 2021-04-08 16:28:49 +02:00
CatalinVoss 6279b59875 Fix documentation for check_characters.py script 2021-04-03 17:45:45 -07:00
Kathy Reid b2bac3c5e6
Replace remove_remote() method with remove method
Partially resolves #3569
2021-03-24 13:22:11 +11:00
CatalinVoss b0db7b6f8f Handle mono conversion within `pcm_to_np()` 2021-03-19 13:01:50 -07:00
CatalinVoss f31ce5ca48 Don't throw on mono audio any more since everything should work? 2021-03-19 13:01:47 -07:00
NanoNabla 11edd92775 implement distributed training using horovod 2021-02-16 12:41:38 +01:00
CatalinVoss f27908e7e3 Fix copying remote AudioFile target to local 2021-01-26 10:02:59 +00:00
Reuben Morais 79a42b345d Read audio format from data before running augmentation passes instead of assuming default 2021-01-18 12:11:31 +00:00
Reuben Morais 8c0d46cb7f Normalize sample rate of train_files by default 2021-01-18 12:11:31 +00:00
Reuben Morais d4152f6e67 Add support for Ogg/Opus audio files for training 2021-01-18 12:11:31 +00:00
Catalin Voss 6640cf2341
Remote training I/O once more (#3437)
* Redo remote I/O changes once more; this time without messing with taskcluster

* Add bin changes

* Fix merge-induced issue?

* For the interleaved case with multiple collections, unpack audio on the fly

To reproduce the previous failure

rm data/smoke_test/ldc93s1.csv
rm data/smoke_test/ldc93s1.sdb
rm -rf /tmp/ldc93s1_cache_sdb_csv
rm -rf /tmp/ckpt_sdb_csv
rm -rf /tmp/train_sdb_csv

./bin/run-tc-ldc93s1_new_sdb_csv.sh 109 16000
python -u DeepSpeech.py --noshow_progressbar --noearly_stop --train_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --train_batch_size 1 --feature_cache /tmp/ldc93s1_cache_sdb_csv --dev_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --dev_batch_size 1 --test_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --test_batch_size 1 --n_hidden 100 --epochs 109 --max_to_keep 1 --checkpoint_dir /tmp/ckpt_sdb_csv --learning_rate 0.001 --dropout_rate 0.05 --export_dir /tmp/train_sdb_csv --scorer_path data/smoke_test/pruned_lm.scorer --audio_sample_rate 16000

* Attempt to preserve length information with a wrapper around `map()`… this gets pretty python-y

* Call the right `__next__()`

* Properly implement the rest of the map wrappers here……

* Fix trailing whitespace situation and other linter complaints

* Remove data accidentally checked in

* Fix overlay augmentations

* Wavs must be open in rb mode if we're passing in an external file pointer -- this confused me

* Lint whitespace

* Revert "Fix trailing whitespace situation and other linter complaints"

This reverts commit c3c45397a2.

* Fix linter issue but without such an aggressive diff

* Move unpack_maybe into sample_collections

* Use unpack_maybe in place of duplicate lambda

* Fix confusing comment

* Add clarifying comment for on-the-fly unpacking
2020-12-07 13:07:34 +01:00
Alexandre Lissy c822a6e875 Importer for dataset from Centre de Conférences Pierre Mendès-France
Released by Ministère de l'Economie, des Finances, et de la Relance
2020-11-24 09:49:39 +01:00
Reuben Morais 88f7297215 Revert "Merge pull request #3420 from CatalinVoss/remote-io"
This reverts commit 08d18d7328, reversing
changes made to 12badcce1f.
2020-11-19 16:58:21 +02:00
Reuben Morais f5cbda694a Revert "Merge pull request #3424 from mozilla/io-fixes"
This reverts commit ab1288ffde, reversing
changes made to 08d18d7328.
2020-11-19 16:58:01 +02:00
CatalinVoss 6cb638211e Only unpack when we need to, to make things work with SDBs 2020-11-17 16:55:49 -08:00
CatalinVoss 24e9e6777c Make sure we properly unpack samples when changing audio types 2020-11-17 14:44:26 -08:00
CatalinVoss ffe2155733 Undo remote edits for taskcluster as this is all local 2020-11-17 13:47:55 -08:00
CatalinVoss d0678cd1b7 Remove unused unordered imap from LimitPool 2020-11-16 13:47:21 -08:00
CatalinVoss 611633fcf6 Remove unnecessary uses of `open_remote()` where we know `__file__` will always be local 2020-11-16 13:47:06 -08:00
CatalinVoss b5b3b2546c Clean up remote I/O docs 2020-11-16 13:46:34 -08:00
CatalinVoss fb6d4ca361 Add disclaimers to CSV and Tar writers 2020-11-13 19:36:07 -08:00
CatalinVoss 8c1a183c67 Clean up print debugging statements 2020-11-13 19:24:09 -08:00
CatalinVoss 47020e4ecb Add an imap_unordered helper to LimitPool -- I might experiment with this 2020-11-13 19:20:02 -08:00
CatalinVoss 3d2b09b951 Linter seems unhappy with conditional imports. Make gfile a module-level import.
I usually do this as a conditional because tf takes a while to load and it's nice to skip it when you want to run a script that just preps data or something like that, but it doesn't seem like a big deal.
2020-11-13 10:47:06 -08:00
CatalinVoss 2332e7fb76 Linter fix: define self.tmp_src_file_path in init 2020-11-13 10:45:53 -08:00
CatalinVoss be39d3354d Perform data loading I/O within worker process rather than main process by wrapping Sample 2020-11-12 21:46:39 -08:00
CatalinVoss fc0b495643 TODO: CSVWriter still totally breaks with remote paths 2020-11-12 16:46:59 -08:00
CatalinVoss 86cba458c5 Fix remote path handling for CSV sample reading 2020-11-12 16:40:59 -08:00
CatalinVoss 8fe972eb6f Fix wave file reading helpers 2020-11-12 16:40:40 -08:00
CatalinVoss 783cdad8db Fix downloader and taskcluster directory mgmt with remote I/O 2020-11-12 16:30:11 -08:00
CatalinVoss 64d278560d Why do we need absolute paths everywhere here? 2020-11-12 16:29:43 -08:00
CatalinVoss 8f31072998 Fix startswith check 2020-11-12 15:09:42 -08:00
CatalinVoss 90e2e1f7d2 Respect buffering, encoding, newline, closefd, and opener if we're looking at a local file 2020-11-12 14:45:05 -08:00
CatalinVoss ad08830421 Work remote I/O into audio utils -- a bit more involved 2020-11-12 14:17:03 -08:00
CatalinVoss 3d503bd69e Add universal is_remote_path to I/O helper 2020-11-12 14:16:37 -08:00
CatalinVoss c3dc4c0d5c Fix bad I/O helper fn replace errors 2020-11-12 14:06:22 -08:00
CatalinVoss abe5dd2eb4 Remote I/O for taskcluster 2020-11-12 12:49:44 -08:00
CatalinVoss 296b74e01a Remote I/O for sample_collections 2020-11-12 10:54:44 -08:00
CatalinVoss 7de317cf59 Remote I/O for evaluate_tools 2020-11-12 10:49:33 -08:00
CatalinVoss 396ac7fe46 Remote I/O for downloader 2020-11-12 10:48:49 -08:00
CatalinVoss 933d96dc74 Fix relative imports 2020-11-12 10:47:26 -08:00
CatalinVoss 42170a57eb Remote I/O for config 2020-11-12 10:46:49 -08:00
CatalinVoss 83e5cf0416 Remote I/O fro check_characters 2020-11-12 10:46:15 -08:00
CatalinVoss 53e3f5374f Add I/O helpers for remote file access 2020-11-12 10:44:19 -08:00
Reuben Morais 83a36b7a34 Rename --utf8 flag to --bytes_output_mode to avoid confusion 2020-10-06 18:19:33 +02:00
josh meyer afee570f3c mono-channel error, not just an assertion
X-DeepSpeech: NOBUILD
2020-10-02 13:27:43 -07:00
Daniel 93a4de5489 Fix lr initialization on reload. 2020-08-27 15:08:32 +02:00
Reuben Morais ae0cf8db6a Revert "Merge branch 'rename-real'"
This reverts commit ae9fdb183e, reversing
changes made to 2eb75b6206.
2020-08-26 11:46:09 +02:00
Reuben Morais da55cfae86 Revert "Merge pull request #3237 from lissyx/rename-training-package"
This reverts commit 3dcb3743ac, reversing
changes made to 457198c88d.
2020-08-26 11:46:08 +02:00