DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Project DeepSpeech
==================


.. image:: https://readthedocs.org/projects/deepspeech/badge/?version=latest
   :target: http://deepspeech.readthedocs.io/?badge=latest
   :alt: Documentation


.. image:: https://community-tc.services.mozilla.com/api/github/v1/repository/mozilla/DeepSpeech/master/badge.svg
   :target: https://community-tc.services.mozilla.com/api/github/v1/repository/mozilla/DeepSpeech/master/latest
   :alt: Task Status


DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. Project DeepSpeech uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier.

To install and use ``deepspeech``, run the following:

.. code-block:: bash

   # Create and activate a virtualenv
   virtualenv -p python3 $HOME/tmp/deepspeech-venv/
   source $HOME/tmp/deepspeech-venv/bin/activate

   # Install DeepSpeech
   pip3 install deepspeech

   # Download pre-trained English model and extract
   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/deepspeech-0.5.1-models.tar.gz
   tar xvf deepspeech-0.5.1-models.tar.gz

   # Download example audio files
   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/audio-0.5.1.tar.gz
   tar xvf audio-0.5.1.tar.gz

   # Transcribe an audio file
   deepspeech --model deepspeech-0.5.1-models/output_graph.pbmm --lm deepspeech-0.5.1-models/lm.binary --trie deepspeech-0.5.1-models/trie --audio audio/2830-3980-0043.wav

A pre-trained English model is available and can be downloaded by following `these instructions <USING.rst#using-a-pre-trained-model>`_. Currently, the Python client supports only 16-bit, 16 kHz, mono-channel WAVE audio files. A package with some example audio files is available for download in our `release notes <https://github.com/mozilla/DeepSpeech/releases/latest>`_.
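
The audio format requirement above can be checked before running a transcription. The following is a minimal sketch using only Python's standard ``wave`` module; the function name ``wav_format_problems`` and the constants are illustrative and are not part of the ``deepspeech`` package.

.. code-block:: python

   import wave

   # Format stated in this README for the Python client.
   EXPECTED_SAMPLE_RATE = 16000  # Hz
   EXPECTED_CHANNELS = 1         # mono
   EXPECTED_SAMPLE_WIDTH = 2     # bytes per sample, i.e. 16-bit


   def wav_format_problems(path):
       """Return a list of format mismatches for a WAVE file (empty if none).

       Hypothetical helper for illustration only.
       """
       problems = []
       with wave.open(path, "rb") as wav:
           if wav.getframerate() != EXPECTED_SAMPLE_RATE:
               problems.append(
                   "sample rate is %d Hz, expected %d Hz"
                   % (wav.getframerate(), EXPECTED_SAMPLE_RATE)
               )
           if wav.getnchannels() != EXPECTED_CHANNELS:
               problems.append(
                   "%d channels, expected %d"
                   % (wav.getnchannels(), EXPECTED_CHANNELS)
               )
           if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
               problems.append(
                   "%d-bit samples, expected %d-bit"
                   % (wav.getsampwidth() * 8, EXPECTED_SAMPLE_WIDTH * 8)
               )
       return problems

Calling ``wav_format_problems("audio/2830-3980-0043.wav")`` on a suitable file should return an empty list; any entries in the list describe what would need to be converted first.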

Faster inference is possible with a supported NVIDIA GPU on Linux. See the `release notes <https://github.com/mozilla/DeepSpeech/releases/latest>`_ to find which GPUs are supported. To run ``deepspeech`` on a GPU, install the GPU-specific package:

.. code-block:: bash

   # Create and activate a virtualenv
   virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
   source $HOME/tmp/deepspeech-gpu-venv/bin/activate

   # Install the CUDA-enabled DeepSpeech package
   pip3 install deepspeech-gpu

   # Transcribe an audio file (uses the model and audio files downloaded in the previous example)
   deepspeech --model deepspeech-0.5.1-models/output_graph.pbmm --lm deepspeech-0.5.1-models/lm.binary --trie deepspeech-0.5.1-models/trie --audio audio/2830-3980-0043.wav

Please ensure you have the required `CUDA dependencies <USING.rst#cuda-dependency>`_.

See the output of ``deepspeech -h`` for more information on the use of ``deepspeech``. If you experience problems running ``deepspeech``, please check the `required runtime dependencies <native_client/README.rst#required-dependencies>`_.
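
To transcribe several files with the command line shown above, the invocation can be assembled programmatically. This is a rough sketch that only reuses the ``--model``, ``--lm``, ``--trie`` and ``--audio`` flags from the examples in this README; the helper names ``build_deepspeech_command`` and ``commands_for_directory`` are hypothetical, and the model file names assume the layout of the ``deepspeech-0.5.1-models`` archive used above.

.. code-block:: python

   from pathlib import Path


   def build_deepspeech_command(models_dir, audio_path, executable="deepspeech"):
       """Build the argument list for one transcription (illustrative helper)."""
       models = Path(models_dir)
       return [
           executable,
           "--model", str(models / "output_graph.pbmm"),
           "--lm", str(models / "lm.binary"),
           "--trie", str(models / "trie"),
           "--audio", str(audio_path),
       ]


   def commands_for_directory(models_dir, audio_dir):
       """Build one command per ``.wav`` file in ``audio_dir``, sorted by name."""
       wav_files = sorted(Path(audio_dir).glob("*.wav"))
       return [build_deepspeech_command(models_dir, wav) for wav in wav_files]

Each resulting list could then be passed to ``subprocess.run(cmd, check=True)`` from an environment where the ``deepspeech`` executable is installed.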

----

**Table of Contents**
  
* `Using a Pre-trained Model <USING.rst#using-a-pre-trained-model>`_

  * `CUDA dependency <USING.rst#cuda-dependency>`_
  * `Getting the pre-trained model <USING.rst#getting-the-pre-trained-model>`_
  * `Model compatibility <USING.rst#model-compatibility>`_
  * `Using the Python package <USING.rst#using-the-python-package>`_
  * `Using the Node.JS package <USING.rst#using-the-nodejs-package>`_
  * `Using the Command Line client <USING.rst#using-the-command-line-client>`_
  * `Installing bindings from source <USING.rst#installing-bindings-from-source>`_
  * `Third party bindings <USING.rst#third-party-bindings>`_


* `Trying out DeepSpeech with examples <examples/README.rst>`_

  * `Microphone VAD streaming <examples/mic_vad_streaming/README.rst>`_
  * `FFMPEG VAD streaming <examples/ffmpeg_vad_streaming/README.rst>`_
  * `Net framework <examples/net_framework/README.rst>`_
  * `Nodejs wav <examples/nodejs_wav/README.rst>`_
  * `VAD transcriber <examples/vad_transcriber/README.rst>`_
  
* `Training your own Model <TRAINING.rst#training-your-own-model>`_

  * `Prerequisites for training a model <TRAINING.rst#prerequisites-for-training-a-model>`_
  * `Getting the training code <TRAINING.rst#getting-the-training-code>`_
  * `Installing Python dependencies <TRAINING.rst#installing-python-dependencies>`_
  * `Recommendations <TRAINING.rst#recommendations>`_
  * `Common Voice training data <TRAINING.rst#common-voice-training-data>`_
  * `Training a model <TRAINING.rst#training-a-model>`_
  * `Checkpointing <TRAINING.rst#checkpointing>`_
  * `Exporting a model for inference <TRAINING.rst#exporting-a-model-for-inference>`_
  * `Exporting a model for TFLite <TRAINING.rst#exporting-a-model-for-tflite>`_
  * `Making a mmap-able model for inference <TRAINING.rst#making-a-mmap-able-model-for-inference>`_
  * `Continuing training from a release model <TRAINING.rst#continuing-training-from-a-release-model>`_
  * `Training with Augmentation <TRAINING.rst#training-with-augmentation>`_

* `Contribution guidelines <CONTRIBUTING.rst>`_
* `Contact/Getting Help <SUPPORT.rst>`_