Merge pull request #17 from mozilla/feature/dropout-rate

Added information on `--dropout_rate` hyperparameter, resolves #16
Kathy Reid 2021-03-05 14:44:56 +11:00 committed by GitHub
Parents 7e01d8153c 70b192a6b8
Commit fb55586137
1 changed file with 30 additions and 0 deletions


@@ -18,6 +18,7 @@
+ [`n_hidden` parameter](#-n-hidden--parameter)
+ [Reduce learning rate on plateau (RLROP)](#reduce-learning-rate-on-plateau--rlrop-)
+ [Early stopping](#early-stopping)
+ [Dropout rate](#dropout-rate)
* [Steps and epochs](#steps-and-epochs)
* [Advanced training options](#advanced-training-options)
* [Monitoring GPU use with `nvtop`](#monitoring-gpu-use-with--nvtop-)
@@ -250,6 +251,35 @@ python3 DeepSpeech.py \
--es_min_delta 0.06
```
### Dropout rate
In machine learning, one of the risks during training is that of [_overfitting_](https://en.wikipedia.org/wiki/Overfitting). _Overfitting_ occurs when training creates a model that does not _generalize_ well; that is, it _fits_ only the set of data on which it was trained, and during inference it does not recognize new data accurately.
_Dropout_ is a technique for reducing _overfitting_. During training, nodes are randomly dropped from the neural network. This simulates the effect of training on more diverse data, and is a computationally cheap way of reducing _overfitting_ and improving the _generalizability_ of the model.
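To make the mechanism concrete, here is a minimal, hypothetical sketch of _inverted dropout_ (an illustration of the general technique, not DeepSpeech's own implementation): at each training step a random fraction of a layer's activations is zeroed, and the surviving activations are rescaled so the expected layer output stays the same.

```
import numpy as np

def apply_dropout(activations, dropout_rate, rng):
    """Inverted dropout: zero a random fraction of activations and rescale
    the survivors so the expected layer output is unchanged."""
    keep_prob = 1.0 - dropout_rate
    # Each node is kept with probability keep_prob; dropped nodes output 0.
    mask = rng.random(activations.shape) < keep_prob
    return np.where(mask, activations / keep_prob, 0.0)

rng = np.random.default_rng(0)
# Hypothetical activations of one feedforward layer for a single training step.
layer_output = np.ones((1, 8))
print(apply_dropout(layer_output, dropout_rate=0.3, rng=rng))
```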
_Dropout_ can be set for any layer of a neural network. The parameter that has the most effect for DeepSpeech training is `--dropout_rate`, which controls the feedforward layers of the neural network. To see the full set of _dropout parameters_, consult the DeepSpeech documentation.
* The `--dropout_rate` parameter specifies the proportion of nodes that are randomly dropped from the neural network during training. The default value is `0.05`. However, if you are training with less than a few thousand hours of voice data, you will find that a value of `0.3` to `0.4` works better to prevent overfitting.
An example of training with this parameter would be:
```
python3 DeepSpeech.py \
--train_files deepspeech-data/cv-corpus-6.1-2020-12-11/id/clips/train.csv \
--dev_files deepspeech-data/cv-corpus-6.1-2020-12-11/id/clips/dev.csv \
--test_files deepspeech-data/cv-corpus-6.1-2020-12-11/id/clips/test.csv \
--checkpoint_dir deepspeech-data/checkpoints \
--export_dir deepspeech-data/exported-model \
--n_hidden 64 \
--reduce_lr_on_plateau true \
--plateau_epochs 8 \
--plateau_reduction 0.08 \
--early_stop true \
--es_epochs 10 \
--es_min_delta 0.06 \
--dropout_rate 0.3
```
## Steps and epochs
In training, a _step_ is one update of the [gradient](https://en.wikipedia.org/wiki/Gradient_descent); that is, one attempt to find the lowest, or minimal _loss_. The amount of processing done in one _step_ depends on the _batch size_. By default, `DeepSpeech.py` has a _batch size_ of `1`. That is, it processes one audio file in each _step_.
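As a rough worked example (the sample count below is hypothetical), the number of steps in one epoch is simply the number of training samples divided by the batch size:

```
# Hypothetical figures: 10,000 training audio files, default batch size of 1.
num_train_samples = 10_000
train_batch_size = 1

# One epoch is a full pass over the training set, so with a batch size of 1
# each audio file is one step and an epoch takes 10,000 steps.
steps_per_epoch = num_train_samples // train_batch_size
print(steps_per_epoch)  # 10000
```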