DSAlign
DeepSpeech based forced alignment tool
Installation
It is recommended to use this tool from within a virtual environment.
After cloning the repository and changing to its root directory,
you can run a script that creates one, with all requirements installed, in the git-ignored directory venv:
$ bin/createenv.sh
$ ls venv
bin include lib lib64 pyvenv.cfg share
bin/align.sh will automatically use this environment.
Internally, DSAlign uses the DeepSpeech STT engine. To function, the engine requires a few files that are specific to the language of the speech data you want to align. For English, a helper script downloads and prepares all required data:
$ bin/getmodel.sh
[...]
$ ls models/en/
alphabet.txt lm.binary output_graph.pb output_graph.pbmm output_graph.tflite trie
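Before running the aligner, it can be useful to confirm that the model download completed. The following is a minimal sketch, not part of DSAlign: `check_model_dir` is a hypothetical helper, and the file names are simply those from the listing above. Whether the aligner actually needs every one of these files is an assumption.

```shell
# Hypothetical helper (not shipped with DSAlign): report which of the files
# from the `ls models/en/` listing are missing in a given directory.
check_model_dir() {
    dir="$1"
    missing=0
    for f in alphabet.txt lm.binary output_graph.pb output_graph.pbmm output_graph.tflite trie; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    # Return 0 if every file was found, 1 otherwise.
    return "$missing"
}
```

A possible use is `check_model_dir models/en || bin/getmodel.sh`, which would only trigger the download when something is absent.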
Overview and documentation
A typical application of the aligner has three phases:
- Preparing the data. Although most of this has to be done case by case, there are some tools for data preparation, statistics and maintenance. All involved file formats are described here.
- Aligning the data using the alignment tool and its algorithm.
- Exporting aligned data using the data-set exporter.
Quickstart example
Example data
There is a script for downloading and preparing some public domain speech and transcript data.
It requires ffmpeg for some sample conversion.
$ bin/gettestdata.sh
$ ls data
test1 test2
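Since the test data script depends on ffmpeg, a quick check that it is on the PATH can save a failed run. This is a small sketch under the assumption that the tool is invoked as `ffmpeg`; `require_tool` is a hypothetical helper name, not something DSAlign provides.

```shell
# Hypothetical helper: succeed only if the named command can be found on PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "required tool not found: $1" >&2
        return 1
    fi
}
```

It could be combined with the download step as `require_tool ffmpeg && bin/gettestdata.sh`.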
Alignment using example data
Now the aligner can be called either "manually" (specifying all involved files directly):
$ bin/align.sh --audio data/test1/audio.wav --script data/test1/transcript.txt --aligned data/test1/aligned.json --tlog data/test1/transcript.log
Or "automatically" by specifying a so-called catalog file that bundles all involved paths:
$ bin/align.sh --catalog data/test1.catalog
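To apply the "manual" form to several directories, the command line can be generated per directory. The sketch below only prints the commands; `align_cmd` is a hypothetical helper, and it assumes that data/test2 uses the same file names as data/test1, which is not confirmed here.

```shell
# Hypothetical helper: print the "manual" aligner command for one data directory.
# Assumes the directory follows the data/test1 layout shown above.
align_cmd() {
    dir="$1"
    printf 'bin/align.sh --audio %s/audio.wav --script %s/transcript.txt --aligned %s/aligned.json --tlog %s/transcript.log\n' \
        "$dir" "$dir" "$dir" "$dir"
}

# Print one command per example directory; nothing is executed here.
for d in data/test1 data/test2; do
    align_cmd "$d"
done
```

Printing instead of executing keeps the loop safe to try; once the output looks right, it can be piped to `sh` from the project root.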