erogol 2020-07-15 11:08:35 +02:00
Parent 664f42df33
Commit c9e2df1451
1 changed file: 14 additions and 24 deletions


@@ -1,7 +1,10 @@
<p align="center"><img src="https://user-images.githubusercontent.com/1402048/52643646-c2102980-2edd-11e9-8c37-b72f3c89a640.png" alt="TTS banner" width="320" height="95" /></p>
<center>
<img src="https://travis-ci.org/mozilla/TTS.svg?branch=dev"/>
[![Discourse](https://img.shields.io/badge/discourse-online-green.svg)](https://discourse.mozilla.org/c/tts)
</center>
This project is a part of [Mozilla Common Voice](https://voice.mozilla.org/en). TTS aims to be a deep learning based Text2Speech engine, low in cost and high in quality.
@@ -38,25 +41,26 @@ Vocoders:
You can also help us implement more models. Some TTS related work can be found [here](https://github.com/erogol/TTS-papers).
## Features
- High performance Deep Learning models for Text2Speech related tasks.
- Text2Speech models (Tacotron, Tacotron2).
- High performance Deep Learning models for Text2Speech tasks.
- Text2Spec models (Tacotron, Tacotron2).
- Speaker Encoder to compute speaker embeddings efficiently.
- Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS)
- Support for multi-speaker TTS training.
- Support for Multi-GPUs training.
- Ability to convert Torch models to Tensorflow 2.0 for inference.
- Released pre-trained models.
- Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN)
- Fast and efficient model training.
- Detailed training logs on console and Tensorboard.
- Support for multi-speaker TTS.
- Efficient Multi-GPUs training.
- Ability to convert PyTorch models to Tensorflow 2.0 and TFLite for inference.
- Released models in PyTorch, Tensorflow and TFLite.
- Tools to curate Text2Speech datasets under ```dataset_analysis```.
- Demo server for model testing.
- Notebooks for extensive model benchmarking.
- Modular (but not too much) code base enabling easy testing for new ideas.
## Requirements and Installation
## Main Requirements and Installation
We highly recommend using [miniconda](https://conda.io/miniconda.html) for easier installation.
* python>=3.6
* pytorch>=0.4.1
* pytorch>=1.4.1
* tensorflow>=2.2
* librosa
* tensorboard
* tensorboardX
@@ -107,21 +111,7 @@ Audio examples: [soundcloud](https://soundcloud.com/user-565970875/pocket-articl
<img src="images/example_model_output.png?raw=true" alt="example_output" width="400"/>
## Runtime
The most time-consuming part is the vocoder algorithm (Griffin-Lim), which runs on CPU. Lowering its number of iterations gives faster execution with a small loss of quality. Some experimental values are below.
Sentence: "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent."
Audio length is approximately 6 secs.
| Time (secs) | System | # GL iters | Model |
| ---- |:-------|:-----------| ---- |
|2.00|GTX1080Ti|30|Tacotron|
|3.01|GTX1080Ti|60|Tacotron|
|3.57|CPU|60|Tacotron|
|5.27|GTX1080Ti|60|Tacotron2|
|6.50|CPU|60|Tacotron2|
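
One way to read the table is as a real-time factor (synthesis time divided by audio length). The snippet below is only an illustrative sketch: the numbers are copied from the table, the 6-second audio length is the approximate value stated above, and `real_time_factor` is a hypothetical helper, not part of TTS.

```python
# Approximate audio length stated above; treated here as exact (assumption).
AUDIO_SECONDS = 6.0

# (time in secs, system, Griffin-Lim iterations, model), copied from the table.
measurements = [
    (2.00, "GTX1080Ti", 30, "Tacotron"),
    (3.01, "GTX1080Ti", 60, "Tacotron"),
    (3.57, "CPU", 60, "Tacotron"),
    (5.27, "GTX1080Ti", 60, "Tacotron2"),
    (6.50, "CPU", 60, "Tacotron2"),
]


def real_time_factor(synthesis_seconds, audio_seconds=AUDIO_SECONDS):
    """Hypothetical helper: synthesis time divided by audio length."""
    return synthesis_seconds / audio_seconds


# One row per measurement: (system, iterations, model, real-time factor).
rtf_rows = [
    (system, iters, model, real_time_factor(secs))
    for secs, system, iters, model in measurements
]

for system, iters, model, rtf in rtf_rows:
    print(f"{model:10s} {system:10s} GL iters={iters:3d} RTF={rtf:.2f}")
```

A factor below 1 means synthesis is faster than playback. With these figures, only Tacotron2 on CPU goes above 1.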
## [Mozilla TTS Tutorials and Notebooks](https://github.com/mozilla/TTS/wiki/TTS-Notebooks-and-Tutorials)
## Datasets and Data-Loading
TTS provides a generic dataloader that is easy to use with new datasets. To integrate your own dataset, you need to write a preprocessor function. Check ```datasets/preprocess.py``` to see some examples. After writing the function, set the ```dataset``` field in ```config.json```. Do not forget the other data-related fields.
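
As a rough sketch of what such a preprocessor function might look like: everything below is an assumption for illustration, including the name `my_dataset`, the `wav_id|transcript` metadata layout, the `wavs/` folder, and the `[text, wav_path, speaker_name]` item shape. Check ```datasets/preprocess.py``` for the signature and return format the dataloader actually expects.

```python
import os


def my_dataset(root_path, meta_file):
    """Hypothetical preprocessor for a metadata file of `wav_id|transcript` lines.

    Returns a list of [text, wav_path, speaker_name] items. This item shape is
    an assumption; compare with the examples in datasets/preprocess.py.
    """
    speaker_name = "my_speaker"  # placeholder for a single-speaker dataset
    items = []
    with open(os.path.join(root_path, meta_file), "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split only on the first pipe so transcripts may contain '|'.
            wav_id, text = line.split("|", 1)
            wav_path = os.path.join(root_path, "wavs", wav_id + ".wav")
            items.append([text, wav_path, speaker_name])
    return items
```

Under this assumption, the ```dataset``` field in ```config.json``` would then name the function (here `my_dataset`), alongside the other data-related fields.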