Add batch processing example (#58)

2020-06-17 06:52:51 -04:00 · 2020-06-17 06:52:51 -04:00 · 25be62ab21
--- a/batch_processing/Readme.md
+++ b/batch_processing/Readme.md
@@ -0,0 +1,157 @@
+
+# Setup for Windows
+
+Setting this up on Windows was quite a challenge, mainly in getting the versions right.
+
+See `setup.ps1` for how the paths are loaded.
+
+```
+Windows 10 Pro
+Version: 2004
+Os Build : 19041.329
+```
+
+Here are the versions I have installed via Chocolatey:
+
+```
+Chocolatey v0.10.15
+7zip v19.0 
+7zip.install v19.0
+anaconda3 v2020.02
+audacity v2.4.1
+az.powershell v4.2.0
+azshell v0.2.2
+azure-cli v2.7.0
+azurepowershell v6.9.0
+bazel v3.2.0
+blender v2.83.0
+chocolatey v0.10.15
+chocolatey-core.extension v1.3.5.1
+chocolatey-dotnetfx.extension v1.0.1
+chocolatey-fastanswers.extension v0.0.2
+chocolatey-visualstudio.extension v1.8.1
+chocolatey-windowsupdate.extension v1.0.4
+docker-desktop v2.3.0.3 
+DotNet4.5.1 v4.5.1.20140606 
+DotNet4.5.2 v4.5.2.20140902 
+dotnetfx v4.8.0.20190930 
+ffmpeg v4.2.3 
+git v2.26.2
+git.install
+google-chrome-x64 v47.0.2526.81 
+GoogleChrome v83.0.4103.97 
+grep v2.1032 
+KB2919355 v1.0.20160915 
+KB2919442 v1.0.20160915 
+KB2999226 v1.0.20181019 
+KB3033929 v1.0.5 
+KB3035131 v1.0.3 
+KB3118401 v1.0.4 
+microsoft-windows-terminal v1.0.1401.0 
+msys2 v20200602.0.0 
+NTop.Portable v0.3.4 
+powershell-core v7.0.1
+procexp v16.32 
+python v3.8.3 
+python3 v3.8.3 
+sox.portable v14.4.1 
+vcredist140 v14.26.28720.3 
+vcredist2008 v9.0.30729.6163 
+vcredist2015 v14.0.24215.20170201 
+vcredist2017 v14.16.27033 
+visualstudio-installer v2.0.1 
+VisualStudio2013ExpressWeb v12.0.21005.20150920 
+visualstudio2019community v16.6.1.0 
+vscode v1.45.1
+vscode.install v1.45.1
+Wget v1.20.3.20190531 
+windows-sdk-10-version-2004-windbg v10.0.19041.0 
+wsl v1.0.1 
+Xming v6.9.0.31 
+
+```
+
+The versions of the NVIDIA software installed are:
+
+* CUDA v10.0
+from https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exelocal
+get the file https://developer.download.nvidia.com/compute/cuda/10.0/secure/Prod/local_installers/cuda_10.0.130_411.31_win10.exe
+
+* cudnn-10.0-windows10-x64-v7.5.1.10
+from https://developer.nvidia.com/rdp/cudnn-archive
+`Download cuDNN v7.5.1 (April 22, 2019), for CUDA 10.0`
+via https://developer.nvidia.com/rdp/cudnn-archive#a-collapse751-10
+get the file https://developer.download.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.0_20191031/cudnn-10.0-windows10-x64-v7.6.5.32.zip
+
+* TensorRT-5.1.5.0.Windows10.x86_64.cuda-10.0.cudnn7.5
+from https://developer.nvidia.com/nvidia-tensorrt-5x-download
+https://developer.nvidia.com/nvidia-tensorrt-5x-download#trt51ga
+via `Windows10 and CUDA 10.0 zip package`
+get the file https://developer.nvidia.com/compute/machine-learning/tensorrt/5.1/ga/zips/TensorRT-5.1.5.0.Windows10.x86_64.cuda-10.0.cudnn7.5.zip
+
+
+The exact Python package versions I am using are pinned in `requirements.txt`:
+`pip3 install -r requirements.txt`
+
+Here is the output of running `test.ps1`:
+```
+PS C:\Users\jmike\Documents\GitHub\DeepSpeech-examples\batch_processing> . .\test.ps1
+2020-06-14 11:05:01.015450: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
+Loading model from file C:\Users\jmike\Documents\GitHub\DeepSpeech\deepspeech-0.7.3-models.pbmm
+TensorFlow: v1.15.0-24-gceb46aae58
+DeepSpeech: v0.7.3-0-g88584941
+2020-06-14 11:05:01.237478: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
+2020-06-14 11:05:01.244057: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
+2020-06-14 11:05:01.466608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
+name: GeForce MX250 major: 6 minor: 1 memoryClockRate(GHz): 1.582
+pciBusID: 0000:01:00.0
+2020-06-14 11:05:01.466806: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
+2020-06-14 11:05:01.473468: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
+2020-06-14 11:05:01.476879: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
+2020-06-14 11:05:01.478672: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
+2020-06-14 11:05:01.482925: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
+2020-06-14 11:05:01.485963: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
+2020-06-14 11:05:01.498053: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
+2020-06-14 11:05:01.498710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
+2020-06-14 11:05:02.066853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
+2020-06-14 11:05:02.067030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
+2020-06-14 11:05:02.068133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
+2020-06-14 11:05:02.073298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1410 MB memory) -> physical GPU (device: 0, name: GeForce MX250, pci bus id: 0000:01:00.0, compute capability: 6.1)
+Loaded model in 0.941s.
+Loading scorer from files C:\Users\jmike\Documents\GitHub\DeepSpeech\deepspeech-0.7.3-models.scorer
+Loaded scorer in 0.0143s.
+Warning: original sample rate (44100) is different than 16000hz. Resampling might produce erratic speech recognition.
+Running inference.
+2020-06-14 11:05:02.382781: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
+```
+Running on the GPU takes roughly half the time of running on the CPU and gives good results.
+
+# Driver command line
+
+`./driver.py --model c:/Users/jmike/Documents/GitHub/DeepSpeech/deepspeech-0.7.3-models.pbmm  --scorer c:/Users/jmike/Documents/GitHub/DeepSpeech/deepspeech-0.7.3-models.scorer --dirname c:/Users/jmike/Downloads/podcast/`
+
+# Example
+
+The driver then runs an individual command for each file, like:
+
+`deepspeech --model C:\Users\jmike\Documents\GitHub\DeepSpeech\deepspeech-0.7.3-models.pbmm --scorer C:\Users\jmike\Documents\GitHub\DeepSpeech\deepspeech-0.7.3-models.scorer --audio 'C:\Users\jmike\Downloads\podcast\45374977-48000-2-24d9a365625bb.mp3.wav' --json`
+
+
+Websites referenced:
+
+* https://chocolatey.org/packages/cuda
+* https://deepspeech.readthedocs.io/en/v0.7.3/?badge=latest
+* https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10
+* https://discourse.mozilla.org/t/query-regarding-speed-of-training-and-issues-with-convergence/41874
+* https://discourse.mozilla.org/t/right-cuda-version-for-using-deepspeech-gpu/41927/12
+* https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#download-windows
+* https://github.com/MichalMazurek/python-poetry/blob/d3f6df6a6c2587d7a6034719716de257917c4b0f/dockerfiles.py
+* https://github.com/amitt001/delegator.py
+* https://github.com/tensorflow/tensorflow/issues/25807
+* https://github.com/tensorflow/tensorflow/issues/28223
+* https://github.com/tensorflow/tensorflow/issues/5968
+* https://hacks.mozilla.org/2019/12/deepspeech-0-6-mozillas-speech-to-text-engine/
+* https://palletsprojects.com/p/click/
+* https://www.howtoforge.com/tutorial/ffmpeg-audio-conversion/
+* https://www.joe0.com/2019/10/19/how-resolve-tensorflow-2-0-error-could-not-load-dynamic-library-cudart64_100-dll-dlerror-cudart64_100-dll-not-found/
+* https://www.programcreek.com/python/example/88033/click.Path
--- a/batch_processing/driver.py
+++ b/batch_processing/driver.py
@@ -0,0 +1,83 @@
+import glob
+import json
+import os
+from os.path import expanduser
+
+import click
+
+import delegator
+
+# First loop over the files
+# and convert them to WAV.
+
+# Record at 16000 Hz in the future, or you get this:
+# Warning: original sample rate (44100) is different than 16000hz. Resampling might produce erratic speech recognition.
+
+
+@click.command()
+@click.option("--dirname", required=True, type=click.Path(exists=True, resolve_path=True))
+@click.option("--ext", default=".mp3")
+@click.option(
+    "--model",
+    default="deepspeech-0.7.3-models.pbmm",
+    type=click.Path(exists=True, resolve_path=True),
+)
+@click.option(
+    "--scorer",
+    default="deepspeech-0.7.3-models.scorer",
+    type=click.Path(exists=True, resolve_path=True),
+)
+
+# manage my library of podcasts
+def main(dirname, ext, model, scorer):
+    print("main")
+    model = expanduser(model)
+    scorer = expanduser(scorer)
+    pattern = os.path.join(dirname, "*" + ext)
+    audiorate = "16000"
+
+    print(pattern)
+    for filename in glob.glob(pattern):
+        print(filename)
+
+        wavefile = filename + ".wav"
+
+        convert_command = " ".join(
+            [
+                "ffmpeg",
+                "-i",
+                "'{}'".format(filename),
+                "-ar",
+                audiorate,
+                "'{}'".format(wavefile),
+            ]
+        )
+        if not os.path.isfile(wavefile):
+            print(convert_command)
+            r = delegator.run(convert_command)
+            print(r.out)
+        else:
+            print("skipping wave conversion that exists")
+
+        command = " ".join(
+            [
+                "deepspeech",
+                "--model",
+                model,
+                "--scorer",
+                scorer,
+                "--audio",
+                "'{}'".format(wavefile),
+                #            "--extended",
+                "--json",
+            ]
+        )
+        print(command)
+        r = delegator.run(command)
+        with open(filename + ".json", "w") as fo:
+            print(r.out)
+            fo.write(r.out)
+
+
+if __name__ == "__main__":
+    main()
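The driver above joins each command into one string and wraps the file paths in single quotes. As a hedged alternative sketch, the same two commands can be built as argument lists and handed to the standard library's `subprocess.run`, which avoids shell quoting when a path contains spaces. This swaps `delegator` for `subprocess`; the helper names `build_convert_command`, `build_inference_command`, and `transcribe` are hypothetical and not part of this commit. The flags are the ones the driver already uses.

```python
import subprocess


def build_convert_command(filename, wavefile, audiorate="16000"):
    """Return the ffmpeg call as an argument list (no shell quoting needed)."""
    return ["ffmpeg", "-i", filename, "-ar", audiorate, wavefile]


def build_inference_command(model, scorer, wavefile):
    """Return the deepspeech call as an argument list."""
    return [
        "deepspeech",
        "--model", model,
        "--scorer", scorer,
        "--audio", wavefile,
        "--json",
    ]


def transcribe(model, scorer, wavefile):
    """Run deepspeech and return its stdout (assumes deepspeech is on PATH)."""
    result = subprocess.run(
        build_inference_command(model, scorer, wavefile),
        capture_output=True,  # requires Python 3.7 or newer
        text=True,
        check=True,
    )
    return result.stdout
```

Because each path is its own list element, a file name such as `my podcast.mp3` reaches `ffmpeg` as one argument without any extra quoting.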
--- a/batch_processing/requirements.txt
+++ b/batch_processing/requirements.txt
@@ -0,0 +1,62 @@
+absl-py==0.9.0
+addignore==1.2.7
+appdirs==1.4.4
+astor==0.8.1
+astunparse==1.6.3
+attrs==19.3.0
+black==19.10b0
+bokeh==1.4.0
+cachetools==4.1.0
+certifi==2020.4.5.2
+chardet==3.0.4
+click==7.1.2
+deepspeech==0.7.3
+deepspeech-gpu==0.7.3
+delegator.py @ git+https://github.com/amitt001/delegator.py.git@194aa92543fbdbfbae0bcc24ca217819a7805da2
+flask==1.1.2
+gast==0.2.2
+google-auth==1.16.1
+google-auth-oauthlib==0.4.1
+google-pasta==0.2.0
+grpcio==1.29.0
+h5py==2.10.0
+idna==2.9
+isort==4.3.21
+Jinja2==2.11.2
+Keras-Applications==1.0.8
+Keras-Preprocessing==1.1.2
+Markdown==3.2.2
+MarkupSafe==1.1.1
+numpy==1.17.3
+oauthlib==3.1.0
+opt-einsum==3.2.1
+packaging==20.4
+pathspec==0.8.0
+pexpect==4.8.0
+phonemizer==2.2
+protobuf==3.12.2
+ptyprocess==0.6.0
+pyasn1==0.4.8
+pyasn1-modules==0.2.8
+pyparsing==2.4.7
+PyYAML==5.3.1
+regex==2020.6.7
+requests==2.23.0
+requests-oauthlib==1.3.0
+rsa==4.0
+scipy==1.4.1
+six==1.15.0
+soundfile==0.10.3.post1
+tensorboard==2.1.1
+tensorboard-plugin-wit==1.6.0.post3
+tensorflow-estimator==2.1.0
+tensorflow-gpu==2.2.0
+tensorflow-gpu-estimator==2.2.0
+termcolor==1.1.0
+toml==0.10.1
+tqdm==4.46.1
+tts==0.0.2+f320992
+typed-ast==1.4.1
+urllib3==1.25.9
+Werkzeug==1.0.1
+wrapt==1.12.1
--- a/batch_processing/setup.ps1
+++ b/batch_processing/setup.ps1
@@ -0,0 +1,5 @@
+$env:Path += ";$env:userprofile\Downloads\cudnn-10.0-windows10-x64-v7.5.1.10\cuda\bin"
+$env:Path += ";$env:userprofile\Downloads\TensorRT-5.1.5.0.Windows10.x86_64.cuda-10.0.cudnn7.5\TensorRT-5.1.5.0\lib"
+$env:Path += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin"
+$env:Path += ";c:\tools\msys64\usr\bin\"
+$env:Path += ";C:\Program Files (x86)\Dr. Memory\bin\"
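After dot-sourcing `setup.ps1`, it can help to confirm that the directories it adds actually contain the libraries TensorFlow tries to open. The sketch below walks `PATH` looking for two of the DLL names that appear in the README log (`cudart64_100.dll` and `cudnn64_7.dll`). The helper name `find_on_path` is hypothetical, and the two names checked are only an illustration, not a complete list.

```python
import os


def find_on_path(name, path=None):
    """Return the first directory on PATH that contains `name`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')  # tolerate quoted PATH entries
        if directory and os.path.isfile(os.path.join(directory, name)):
            return directory
    return None


if __name__ == "__main__":
    # DLL names taken from the "Successfully opened dynamic library" log lines.
    for dll in ("cudart64_100.dll", "cudnn64_7.dll"):
        where = find_on_path(dll)
        print(dll, "->", where if where else "NOT FOUND")
```

Run it from the same PowerShell session that loaded `setup.ps1`, since the `$env:Path` changes only apply to that session.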
--- a/batch_processing/test.ps1
+++ b/batch_processing/test.ps1
@@ -0,0 +1 @@
+deepspeech --model C:\Users\jmike\Documents\GitHub\DeepSpeech\deepspeech-0.7.3-models.pbmm --scorer C:\Users\jmike\Documents\GitHub\DeepSpeech\deepspeech-0.7.3-models.scorer --audio C:\Users\jmike\Documents\Audacity\clip.wav --json
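The README log for this test run shows a warning that the clip's sample rate (44100) differs from 16000 Hz. A small sketch using the standard library `wave` module can check a file's sample rate before inference is run. `wav_sample_rate` and `needs_resampling` are hypothetical helper names, the 16000 target is taken from the warning text, and `wave` only reads uncompressed PCM WAV files.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resampling(path, target_rate=16000):
    """True when the file's sample rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate
```

A file that needs resampling could then be passed through the same `ffmpeg -i ... -ar 16000 ...` conversion that `driver.py` uses.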
--- a/batch_processing/test_tf.py
+++ b/batch_processing/test_tf.py
@@ -0,0 +1,10 @@
+import tensorflow as tf
+
+print("hello")
+av = tf.test.is_gpu_available()  # deprecated in TF 2.x, kept as a cross-check
+print(av)
+
+av2 = tf.config.list_physical_devices("GPU")
+print(av2)
+
+# e.g. [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]