mirror of https://github.com/mozilla/kaldi.git
Online decoding related READMEs and Makefiles by Matthias Paulik
git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@1286 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
This commit is contained in:
Parent
0e2efaf615
Commit
7f5ca446a5
@@ -8,6 +8,12 @@ BINFILES =
OBJFILES = online-audio-source.o online-feat-input.o online-decodable.o online-faster-decoder.o onlinebin-util.o

UNAME=$(shell uname)
ifeq ($(UNAME), Darwin)
OBJFILES += ../../tools/portaudio/src/common/pa_ringbuffer.o
endif


LIBFILE = kaldi-online.a

all: $(LIBFILE) $(BINFILES)
@@ -1,10 +1,12 @@
all:

EXTRA_CXXFLAGS = -Wno-sign-compare -I ../../tools/portaudio/install/include
EXTRA_LDLIBS = ../../tools/portaudio/install/lib/libportaudio.a

UNAME=$(shell uname)
ifeq ($(UNAME), Linux)
EXTRA_LDLIBS += -lasound
EXTRA_LDLIBS = ../../tools/portaudio/install/lib/libportaudio.a -lasound
else
EXTRA_LDLIBS = -L ../../tools/portaudio/install/lib/ -lportaudio
endif

include ../kaldi.mk
@@ -0,0 +1,82 @@
This directory hosts example binaries that implement online decoding.

The online decoding code depends on libportaudio and it is assumed that you
have already compiled and installed portaudio in the ../../tools/ folder with the help of
the install_portaudio.sh script that is provided there. Unfortunately, portaudio
is more OS-dependent than the other Kaldi code and you may face some issues,
depending on your OS.
Please also refer to the "Install instructions for libportaudio" section in tools/INSTALL.

!!CAUTION!! IF YOU ARE TRYING TO RUN THESE BINARIES ON MAC OS, YOU MAY
NEED TO TELL THE BINARIES WHERE THEY CAN FIND LIBPORTAUDIO, SINCE IT IS
DYNAMICALLY LINKED. TO AVOID THIS, YOU CAN SIMPLY COPY libportaudio.dylib
TO /usr/local/lib. YOU CAN FIND THE DYLIB FILES IN tools/portaudio/install/lib
AFTER HAVING RUN THE INSTALLATION SCRIPT FOR PORTAUDIO IN THE tools FOLDER.

Here is a short overview of the provided binaries.

#########################

online-gmm-decode-faster:

Demonstrates one possible implementation of online decoding. The binary
records audio from your sound card, computes the features on the fly, performs
decoding and finally displays the recognition output on stdout. Recognition
output is displayed with very low latency (even before the end of an utterance
is reached) by doing a partial trace-back whenever possible.

A very simple heuristic to detect the end of an utterance is employed:
whenever x frames of silence (any consecutive sequence of "silence" phones)
are detected at the beginning of the trace-back, we display the full trace-back
and reset the language model context. For easy readability, we insert two line
breaks at the end of an utterance. The value of x (50 frames initially) is
lowered automatically whenever the current utterance becomes too long (longer than
max-utt-length frames). The endpointing behavior can therefore be influenced via
the optional parameter max-utt-length (the lower the value, the shorter the average
utterance length) and by defining which phones constitute silence (sil, noise,
laughter, etc.).
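
As an illustration only, the endpointing rule described above could be sketched
in Python as follows. This is not the code used by online-gmm-decode-faster. The
function names, the halving schedule for x and the floor of 5 frames are
assumptions made for this sketch; only the initial value of 50 frames and the
role of max-utt-length come from the description above. "Beginning of the
trace-back" is read here as the most recent frames.

```python
def silence_threshold(utt_len, max_utt_length, initial_x=50, min_x=5):
    """Return x, the number of silence frames required to end the utterance."""
    if utt_len <= max_utt_length:
        return initial_x
    # ASSUMPTION: halve x once per full max_utt_length frames decoded so far.
    # The README only says that x "is lowered", not by how much.
    overruns = utt_len // max_utt_length
    return max(min_x, initial_x >> overruns)


def trailing_silence(frame_phones, silence_phones):
    """Count consecutive silence frames at the most recent end of the trace-back."""
    count = 0
    for phone in reversed(frame_phones):
        if phone not in silence_phones:
            break
        count += 1
    return count


def is_end_of_utterance(frame_phones, silence_phones, max_utt_length):
    """True if enough trailing silence has been seen to close the utterance."""
    x = silence_threshold(len(frame_phones), max_utt_length)
    return trailing_silence(frame_phones, silence_phones) >= x
```

With this schedule, a lower max_utt_length makes x shrink sooner, so utterances
are cut at shorter pauses, which matches the behavior described above.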

Decoding has to happen "fast enough" (real-time factor, RTF < 1) so that the decoder
can keep up with the live recorded audio. If the decoder is too slow (because of a
slow machine, too wide a beam, or models that are too big), you will observe buffer overflow
error messages (provided that you have set the "--verbose=1" option), which means
that audio samples were lost.

To avoid dropped audio samples, an (imperfect) heuristic is used that tries to
keep the RTF between the two values rt-min and rt-max by adjusting the decoding
beam in the following manner: The effective beam width is updated every
"update-interval"-th frame. The beam is scaled up or down by a fraction of its
current size given by "beam-update" times a factor which depends on how much the
current decoding time is off from the set target. The fraction by which the beam
is updated, however, cannot exceed "max-beam-update" and the beam can never
become wider than the value configured with the "--beam" option.
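
The beam adjustment described above might look roughly like the following Python
sketch. It is a hedged illustration, not the decoder's actual code: the function
names are made up, and the "factor which depends on how much the current decoding
time is off" is assumed here to be the relative distance of the measured RTF from
the nearest bound (rt-max or rt-min). The real factor may be computed differently.

```python
def real_time_factor(decode_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the audio that was decoded."""
    return decode_seconds / audio_seconds


def update_beam(cur_beam, max_beam, rtf, rt_min, rt_max,
                beam_update, max_beam_update):
    """Return the effective beam after one update step.

    Meant to be called once every "update-interval" frames.
    Assumes 0 < rt_min <= rt_max and max_beam_update < 1.
    """
    if rt_min <= rtf <= rt_max:
        return cur_beam  # RTF is inside the target band: leave the beam alone

    if rtf > rt_max:
        # Too slow: narrow the beam.
        # ASSUMPTION: "how far off" is the relative overshoot of rt_max.
        off = (rtf - rt_max) / rt_max
        fraction = min(beam_update * off, max_beam_update)
        return cur_beam * (1.0 - fraction)

    # Faster than needed: widen the beam, but never beyond the --beam value.
    # ASSUMPTION: "how far off" is the relative undershoot of rt_min.
    off = (rt_min - rtf) / rt_min
    fraction = min(beam_update * off, max_beam_update)
    return min(cur_beam * (1.0 + fraction), max_beam)
```

For example, with rt_min=0.7, rt_max=0.9, beam_update=0.05 and
max_beam_update=0.1, a measured RTF of 1.8 shrinks a beam of 10 by 5 percent,
while an RTF of 9.0 is capped at a 10 percent reduction.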

!!CAUTION!! AS MENTIONED, THE HEURISTIC DOES NOT ALWAYS WORK. IT IS THEREFORE
IMPORTANT TO USE A REASONABLE MAX BEAM AND A REASONABLE NUMBER FOR MAX ACTIVE
STATES. USE THE VERBOSE OPTION TO LOOK FOR BUFFER OVERFLOW MESSAGES!!

Another option that can influence RTF and also latency is "batch-size".
Feature computation and decoding happen in batches of batch-size frames.
The default value is 27 frames. The higher the value, the higher the latency,
but smaller values may increase RTF.
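
As a rough worked example of the latency side of this trade-off, the sketch below
estimates how much audio must be captured before one batch can be processed. The
10 ms frame shift is an assumption (the value actually in effect depends on the
feature configuration), and the function name is hypothetical. Decoding time
comes on top of this figure.

```python
def batch_buffering_ms(batch_size, frame_shift_ms=10.0):
    """Audio duration (ms) that must be captured to fill one batch of frames.

    ASSUMPTION: frames are spaced frame_shift_ms apart; 10 ms is a common
    choice but is not stated in this README.
    """
    return batch_size * frame_shift_ms


default_delay = batch_buffering_ms(27)   # 270.0 ms for the default batch-size
smaller_delay = batch_buffering_ms(9)    # 90.0 ms, at the cost of more batches
```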

#########################

online-wav-gmm-decode-faster:

Simulates online decoding on wav files. This is useful to measure word error rate
and to tune parameters such as the batch size.

#########################

online-net-client & online-server-gmm-decode-faster:

Enables online decoding in a client-server fashion, that is, doing audio recording
and feature computation on the client and sending the raw features to the recognition
server to do the decoding. The recognition server sends the recognition result back
to the client for display. UDP is used for communication. This is very useful when
one wants to run big models that require lots of CPU power to keep the RTF low.
We chose UDP because we want to decode in real time and retransmissions would lead
to delays and lost samples at the PortAudio level. No guarantees are made about
re-ordered packets, but we haven't faced any such issues in our testing.
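
To make the idea of shipping feature batches over UDP more concrete, here is a
small Python sketch. The datagram layout (a sequence number, the frame count and
the feature dimension, followed by 32-bit floats) is purely hypothetical and is
not the wire format used by online-net-client. The sequence number is likewise an
addition of this sketch: it shows one way a receiver could notice lost or
re-ordered datagrams, which the binaries described above do not guarantee.

```python
import socket
import struct

# Hypothetical header: sequence number, number of frames, feature dimension.
HEADER = struct.Struct("!IHH")


def pack_features(seq, frames):
    """Serialize a batch of feature frames (lists of floats) into one datagram."""
    num_frames = len(frames)
    dim = len(frames[0]) if frames else 0
    flat = [value for frame in frames for value in frame]
    return HEADER.pack(seq, num_frames, dim) + struct.pack("!%df" % len(flat), *flat)


def unpack_features(datagram):
    """Inverse of pack_features: return (sequence number, list of frames)."""
    seq, num_frames, dim = HEADER.unpack_from(datagram)
    flat = struct.unpack_from("!%df" % (num_frames * dim), datagram, HEADER.size)
    frames = [list(flat[i * dim:(i + 1) * dim]) for i in range(num_frames)]
    return seq, frames


def classify_arrival(seq, last_seq):
    """Compare a datagram's sequence number with the last one seen."""
    if last_seq is None or seq == last_seq + 1:
        return "in_order"
    return "gap" if seq > last_seq + 1 else "late"


def send_batch(sock, address, seq, frames):
    """Fire-and-forget send: no acknowledgement and no retransmission."""
    sock.sendto(pack_features(seq, frames), address)


def make_udp_socket():
    """Create a plain UDP (datagram) socket."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
```

Because nothing is retransmitted, a "gap" simply means that a batch of features
never reaches the decoder; the sender is never blocked waiting for the network.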
@@ -27,6 +27,7 @@ See below install instructions for:
native Windows compilation without Intel MKL you have to compile this)
(7) CLAPACK headers (required if you have the library available but
no headers in a directory accessed by default; this is the case on Cygwin)
(8) libportaudio (needed for the online recognition binaries)


####
@@ -183,3 +184,33 @@ for x in clapack.h f2c.h cblas.h; do
wget http://www.netlib.org/clapack/$x;
done

####

(8) Install instructions for libportaudio.

libportaudio is only needed for the online recognition binaries. It enables audio
capture from the sound card. Unfortunately, the installer for libportaudio may not
work on all versions of Linux or Mac OS. However, for most people it will be
sufficient to simply run the script

./install_portaudio.sh

We tested this installation script on various versions of SUSE Linux and Red Hat as
well as on Mac OS (Darwin). Please note that the installation script patches the
default Makefile of portaudio when installing on Mac (for details, please have a look
at the installation script itself).

!!IMPORTANT!! UNDER MAC OS, YOU MAY NEED TO COPY libportaudio.dylib TO
/usr/local/lib SO THAT THE BINARIES CAN FIND THE LIBRARY. AFTER COMPILING
LIBPORTAUDIO, YOU CAN FIND THE DYLIB FILE(S) IN portaudio/install/lib.

The best bet for support on compiling libportaudio, should you face any problems,
would be the portaudio documentation:

http://portaudio.com/docs/v19-doxydocs/tutorial_start.html

We know of at least one instance where libportaudio compiled fine, but only
produced garbage audio samples. This happened on a Linux system that only had
Open Sound System (OSS) installed. We therefore recommend installing ALSA. This
is also reflected in our Makefile for the binaries in /src/onlinebin where we
include -lasound.