Add instruction to install `sox` for CV data, add section on custom docker image
Parent
6d322ed85e
Commit
c8f7cf8521
|
@@ -43,18 +43,26 @@ If you are using data from Common Voice for training a model, you will need to p
|
|||
|
||||
In this example we will prepare the Indonesian dataset for training, but you can use any language from Common Voice that you prefer. We've chosen Indonesian as it has the same [orthographic alphabet](ALPHABET.md) as English, which means we don't have to use a different `alphabet.txt` file for training; we can use the default.
|
||||
|
||||
This example assumes you have already [set up a Docker environment with an attached volume for training](TRAINING.md).
|
||||
---
|
||||
This example assumes you have already set up a Docker [environment](ENVIRONMENT.md) for [training](TRAINING.md). If you have not yet set up your Docker environment, we suggest you pause here and do this first.
|
||||
---
|
||||
|
||||
First, [download the dataset from Common Voice](https://commonvoice.mozilla.org/en/datasets). Extract the archive into your `deepspeech-data` Docker volume. Make sure that you place the files into the `_data` directory, otherwise they will not be available from within your Docker container.
|
||||
First, [download the dataset from Common Voice](https://commonvoice.mozilla.org/en/datasets) and extract the archive into your `deepspeech-data` directory. Then start your DeepSpeech Docker container with the `deepspeech-data` directory attached as a _bind mount_ (this is covered in the [environment](ENVIRONMENT.md) section), which makes the extracted files available inside the container.
|
||||
|
||||
Start your DeepSpeech Docker container with the `deepspeech-data` volume attached. Your CV corpus data should be available from within the Docker container.
|
||||
Your Common Voice (CV) corpus data should now be available from within the Docker container.
|
||||
|
||||
```
|
||||
root@3de3afbe5d6f:/DeepSpeech# ls persistent-data/cv-corpus-6.1-2020-12-11/id/
|
||||
root@3de3afbe5d6f:/DeepSpeech# ls deepspeech-data/cv-corpus-6.1-2020-12-11/id/
|
||||
clips invalidated.tsv reported.tsv train.tsv
|
||||
dev.tsv other.tsv test.tsv validated.tsv
|
||||
```
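Before moving on, it can help to confirm that the files in the listing above are actually visible inside the container. The following is a minimal sketch, not part of DeepSpeech: `check_cv_corpus` is a hypothetical helper name, and the list of expected entries is an assumption taken from the listing above.

```shell
#!/bin/sh
# Sketch: check that a Common Voice language directory contains the entries
# shown in the listing above. check_cv_corpus is a hypothetical helper name.
check_cv_corpus() {
    corpus_dir="$1"
    missing=0
    # clips is the directory of audio files; the others are TSV metadata files.
    for name in clips train.tsv dev.tsv test.tsv validated.tsv; do
        if [ ! -e "$corpus_dir/$name" ]; then
            echo "missing: $corpus_dir/$name" >&2
            missing=1
        fi
    done
    # Returns 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

For the Indonesian example, this could be called as `check_cv_corpus deepspeech-data/cv-corpus-6.1-2020-12-11/id` from the `/DeepSpeech` directory inside the container.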
|
||||
|
||||
The `mozilla/deepspeech-train:v0.9.3` Docker image _does not_ come with `sox`, a package used for processing Common Voice data, so we need to install `sox` first.
|
||||
|
||||
```
|
||||
root@4b39be3b0ffc:/DeepSpeech# apt-get -y update && apt-get install -y sox
|
||||
```
|
||||
|
||||
Next, we will run the Common Voice importer that ships with DeepSpeech.
|
||||
|
||||
```
|
||||
|
|
|
@@ -15,7 +15,8 @@
|
|||
* [Pulling down a pre-built DeepSpeech Docker image](#pulling-down-a-pre-built-deepspeech-docker-image)
|
||||
+ [Testing the image by creating a container and running a script](#testing-the-image-by-creating-a-container-and-running-a-script)
|
||||
* [Setting up a bind mount to store persistent data](#setting-up-a-bind-mount-to-store-persistent-data)
|
||||
|
||||
* [Extending the base `deepspeech-training` Docker image for your needs](#extending-the-base--deepspeech-training--docker-image-for-your-needs)
|
||||
|
||||
This section of the Playbook assumes you are comfortable installing DeepSpeech and using it with a pre-trained model, and that you are comfortable setting up a Python _virtual environment_.
|
||||
|
||||
Here, we provide information on setting up a Docker environment for training your own speech recognition model using DeepSpeech. We also cover dependencies Docker has for NVIDIA GPUs, so that you can use your GPU(s) for training a model.
|
||||
|
@@ -298,7 +299,27 @@ root@e964b1e5a60c:/DeepSpeech# ls | grep deepspeech-data
|
|||
deepspeech-data
|
||||
```
|
||||
|
||||
You are now ready to begin training your model.
|
||||
You are now ready to begin [training](TRAINING.md) your model.
|
||||
|
||||
## Extending the base `deepspeech-training` Docker image for your needs
|
||||
|
||||
As you become more comfortable training speech recognition models with DeepSpeech, you may wish to extend the base Docker image. You can do this using the `FROM` instruction in a `Dockerfile`, for example:
|
||||
|
||||
```
|
||||
# Custom Dockerfile for training models using DeepSpeech
|
||||
|
||||
# Start from the DeepSpeech v0.9.3 training image
|
||||
FROM mozilla/deepspeech-train:v0.9.3
|
||||
|
||||
# Install nano editor
|
||||
RUN apt-get -y update && apt-get install -y nano
|
||||
|
||||
# Install sox for inference and for processing Common Voice data
|
||||
RUN apt-get -y update && apt-get install -y sox
|
||||
|
||||
```
|
||||
|
||||
You can then use `docker build` with this `Dockerfile` to build your own custom Docker image.
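As a rough illustration of that build step, the sketch below wraps `docker build` in a small shell function. The image tag `deepspeech-train-custom:v0.9.3` and the names `IMAGE_TAG`, `CONTEXT_DIR` and `build_custom_image` are assumptions made for this example; choose whatever names suit your setup.

```shell
#!/bin/sh
# Sketch only: IMAGE_TAG and CONTEXT_DIR are hypothetical names for this example.
IMAGE_TAG="${IMAGE_TAG:-deepspeech-train-custom:v0.9.3}"
# Directory that contains the custom Dockerfile (the build context).
CONTEXT_DIR="${CONTEXT_DIR:-.}"

build_custom_image() {
    # -t tags the resulting image; the final argument is the build context,
    # where docker looks for a file named Dockerfile by default.
    docker build -t "$IMAGE_TAG" "$CONTEXT_DIR"
}
```

Calling `build_custom_image` from the directory that holds the `Dockerfile` should produce an image that you can start in place of the base image, with `nano` and `sox` already installed.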
|
||||
|
||||
---
|
||||
|
||||
|
|