Online decoding related READMEs and Makefiles by Matthias Paulik

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@1286 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
Vassil Panayotov 2012-08-24 08:16:13 +00:00
Parent 0e2efaf615
Commit 7f5ca446a5
4 changed files: 123 additions and 2 deletions


@@ -8,6 +8,12 @@ BINFILES =
OBJFILES = online-audio-source.o online-feat-input.o online-decodable.o online-faster-decoder.o onlinebin-util.o
UNAME=$(shell uname)
ifeq ($(UNAME), Darwin)
OBJFILES += ../../tools/portaudio/src/common/pa_ringbuffer.o
endif
LIBFILE = kaldi-online.a
all: $(LIBFILE) $(BINFILES)


@@ -1,10 +1,12 @@
all:
EXTRA_CXXFLAGS = -Wno-sign-compare -I ../../tools/portaudio/install/include
EXTRA_LDLIBS = ../../tools/portaudio/install/lib/libportaudio.a
UNAME=$(shell uname)
ifeq ($(UNAME), Linux)
EXTRA_LDLIBS += -lasound
EXTRA_LDLIBS = ../../tools/portaudio/install/lib/libportaudio.a -lasound
else
EXTRA_LDLIBS = -L ../../tools/portaudio/install/lib/ -lportaudio
endif
include ../kaldi.mk

src/onlinebin/README.txt Normal file

@@ -0,0 +1,82 @@
This directory hosts example binaries that implement online decoding.
The online decoding code depends on libportaudio and it is assumed that you
already compiled and installed portaudio in the ../../tools/ folder with the help of
the install_portaudio.sh script that is provided there. Unfortunately, portaudio
is more OS-dependent than the rest of the Kaldi code, and you may face some issues,
depending on your OS.
Please also refer to the "Install instructions for PortAudio" section in tools/INSTALL.
!!CAUTION!! IF YOU ARE TRYING TO RUN THESE BINARIES ON MAC OS, YOU MAY
NEED TO TELL THE BINARIES WHERE THEY CAN FIND LIBPORTAUDIO, SINCE IT IS
DYNAMICALLY LINKED. TO AVOID THIS, YOU CAN SIMPLY COPY libportaudio.dylib
TO /usr/local/lib. YOU CAN FIND THE DYLIB FILES IN tools/portaudio/install/lib
AFTER HAVING RUN THE INSTALLATION SCRIPT FOR PORTAUDIO IN THE tools FOLDER.
Here is a short overview of the provided binaries.
#########################
online-gmm-decode-faster: 
Demonstrates one possible implementation for online decoding. The binary
records audio from your sound card, computes the features on-the-fly, performs
decoding and finally displays the recognition output on stdout. Recognition
output is displayed at a very low latency (and even before the end of an utterance
is reached) by doing a partial trace-back whenever possible.
A very simple heuristic to detect the end of an utterance is employed:
whenever x frames of silence (any consecutive sequence of "silence" phones)
are detected at the beginning of the trace-back, we display the full trace-back
and reset the language model context. For easy readability, we insert two line
breaks at the end of an utterance. The value of x (50 frames initially) is
lowered automatically whenever the current utterance becomes too long (longer than
max-utt-length frames). The end-pointing behavior can therefore be influenced via
the optional parameter max-utt-length (the lower the value, the shorter the average
utterance length) and by defining which phones constitute silence (sil, noise,
laughter, etc.).
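The end-pointing rule described above can be sketched in a few lines of Python.
This is only an illustration, not the code used by the binary: the function
names, the max_utt_length value of 1500 and, in particular, the formula that
shrinks x are assumptions, since the text does not say how x is lowered. The
sketch also assumes one phone label per frame, oldest frame first.

```python
def silence_threshold(utt_frames, inter_utt_sil=50, max_utt_length=1500):
    """Number of silence frames (x) required to end the utterance.

    Assumed shrink rule: x is divided by 2, 3, ... each time the utterance
    grows by another max_utt_length frames. It never drops below 1 frame.
    """
    return max(1, inter_utt_sil // (1 + utt_frames // max_utt_length))


def is_end_of_utterance(traceback_phones, silence_phones, utt_frames,
                        inter_utt_sil=50, max_utt_length=1500):
    """True if the newest x frames of the trace-back are all silence phones."""
    x = silence_threshold(utt_frames, inter_utt_sil, max_utt_length)
    newest = traceback_phones[-x:]
    # Require a full window of x frames, all labeled with a silence phone.
    return len(newest) == x and all(p in silence_phones for p in newest)
```

With these assumptions, an utterance shorter than 1500 frames ends after 50
silence frames, while one between 1500 and 2999 frames ends after 25.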
Decoding has to happen "fast enough" (real time factor, RTF < 1) so that the decoder
can keep up with the live recorded audio. If the decoder is too slow (because of a
slow machine, a beam that is too wide, or models that are too big), you will observe
buffer overflow error messages (provided that you have set the "--verbose=1" option),
which means that audio samples were lost.
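For reference, the real time factor is simply the time spent decoding divided
by the duration of the audio that was decoded. A minimal Python sketch (the
function names are ours, chosen for illustration):

```python
def real_time_factor(decode_seconds, audio_seconds):
    """RTF: processing time divided by the duration of the processed audio."""
    return decode_seconds / audio_seconds


def keeps_up(decode_seconds, audio_seconds):
    """The decoder keeps up with live audio only while RTF stays below 1."""
    return real_time_factor(decode_seconds, audio_seconds) < 1.0
```

For example, spending 3 seconds to decode 4 seconds of audio gives an RTF of
0.75, which is fast enough; spending 5 seconds on the same audio is not.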
To avoid dropped audio samples, an (imperfect) heuristic is used that tries to
keep the RTF between the two values rt-min and rt-max by adjusting the decoding
beam in the following manner: The effective beam width is updated every
"update-interval"-th frame. The beam is scaled up or down by a fraction of its
current size given by "beam-update" times a factor which depends on how much the
current decoding time is off from the set target. The fraction by which the beam
is updated, however, cannot exceed "max-beam-update", and the beam can never
become wider than the value configured with the "--beam" option.
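The following Python sketch shows one way such a beam update could look. It is
not the decoder's actual code: the function name, the default values and the
"off-target" factor (here the relative distance of the RTF from the violated
bound) are assumptions made for illustration. The caller is expected to invoke
it only on every "update-interval"-th frame.

```python
def update_beam(effective_beam, rtf, *, max_beam,
                rt_min=0.7, rt_max=0.75,
                beam_update=0.01, max_beam_update=0.05):
    """Return the new effective beam for the measured real time factor."""
    if rtf > rt_max:
        # Decoding is too slow: narrow the beam.
        deviation = (rtf - rt_max) / rt_max
        direction = -1.0
    elif rtf < rt_min:
        # Decoding is faster than needed: widen the beam.
        deviation = (rt_min - rtf) / rt_min
        direction = 1.0
    else:
        # RTF is inside [rt_min, rt_max]: leave the beam unchanged.
        return effective_beam
    # Fraction of the current beam, capped by max_beam_update.
    fraction = min(beam_update * deviation, max_beam_update)
    new_beam = effective_beam * (1.0 + direction * fraction)
    # The beam can never exceed the value given with --beam.
    return min(new_beam, max_beam)
```

A decoding loop would call this as, for instance,
"if frame % update_interval == 0: beam = update_beam(beam, rtf, max_beam=13.0)",
where 13.0 stands in for whatever was passed with "--beam".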
!!CAUTION!! AS MENTIONED, THE HEURISTIC DOES NOT ALWAYS WORK. IT IS THEREFORE
IMPORTANT TO USE A REASONABLE MAX BEAM AND A REASONABLE NUMBER FOR MAX ACTIVE
STATE. USE THE VERBOSE OPTION TO LOOK FOR BUFFER OVERFLOW MESSAGES!!
Another option that can influence RTF and also latency is "batch-size".
Feature computation and decoding happen in batches of batch-size frames.
The default value is 27 frames. The higher the value, the higher the latency,
but smaller values may increase RTF.
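To get a feeling for the latency contributed by batching, the amount of audio
covered by one batch can be computed as below. The 10 ms frame shift is an
assumption (a common choice for speech features), not something stated in this
README, and the function name is ours.

```python
def batch_audio_ms(batch_size=27, frame_shift_ms=10.0):
    """Milliseconds of audio that must be collected to fill one batch."""
    return batch_size * frame_shift_ms
```

Under that assumption the default batch of 27 frames covers 270 ms of audio,
so the decoder cannot start working on a frame until up to that much later.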
#########################
online-wav-gmm-decode-faster:
Simulates online decoding on wav files. This is useful to measure word error rate
and to tune parameters such as the batch size.
#########################
online-net-client & online-server-gmm-decode-faster:
Enables online decoding in a client-server fashion, that is, doing audio recording
and feature computation on the client and sending the raw features to the recognition
server to do the decoding. The recognition server sends the recognition result back
to the client for display. UDP is used for communication. This is very useful when
one wants to run big models that require lots of CPU power to keep the RTF low.
We chose UDP because we want to decode in real time, and retransmissions would lead
to delays and lost samples at the PortAudio level. No guarantees are made about
re-ordered packets, but we haven't faced any such issues in our testing.
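To illustrate the idea of shipping features over UDP, here is a small Python
sketch. The packet layout (a sequence number and a float count, followed by
32-bit floats in network byte order) and all function names are hypothetical;
this is NOT the wire format used by online-net-client. The sequence number is
included only to show how a receiver could notice re-ordered packets.

```python
import socket
import struct

# Hypothetical header: sequence number, number of float values that follow.
HEADER = struct.Struct("!II")


def pack_features(seq, features):
    """Serialize one batch of feature values into a single datagram payload."""
    body = struct.pack(f"!{len(features)}f", *features)
    return HEADER.pack(seq, len(features)) + body


def unpack_features(datagram):
    """Inverse of pack_features: return (sequence number, list of floats)."""
    seq, count = HEADER.unpack_from(datagram)
    values = struct.unpack_from(f"!{count}f", datagram, HEADER.size)
    return seq, list(values)


def send_features(sock, address, seq, features):
    """Send one datagram; UDP gives no delivery or ordering guarantee."""
    sock.sendto(pack_features(seq, features), address)
```

A client would create the socket with
socket.socket(socket.AF_INET, socket.SOCK_DGRAM) and call send_features once
per batch; a server could compare each received sequence number with the
previous one to detect packets that arrived out of order.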


@@ -27,6 +27,7 @@ See below install instructions for:
native Windows compilation without Intel MKL you have to compile this)
(7) CLAPACK headers (required if you have the library available but
no headers in a directory accessed by default; this is the case on Cygwin)
(8) libportaudio (needed for the online recognition binaries)
####
@@ -183,3 +184,33 @@ for x in clapack.h f2c.h cblas.h; do
wget http://www.netlib.org/clapack/$x;
done
####
(8) Install instructions for libportaudio.
libportaudio is only needed for the online recognition binaries. It enables audio
capture from the sound card. Unfortunately, the installer for libportaudio may not
work on all versions of Linux or Mac OS. However, for most people it will be
sufficient to simply run the script
./install_portaudio.sh
We tested this installation script on various versions of Suse Linux and Red Hat as
well as on Mac OS (Darwin). Please note that the installation script patches up the
default Makefile of portaudio when installing on Mac (for details, please have a look
at the installation script itself).
!!IMPORTANT!! UNDER MAC OS, YOU MAY NEED TO COPY THE libportaudio.dylib TO
/usr/local/lib SO THAT THE BINARIES CAN FIND THE LIBRARY. AFTER COMPILING
LIBPORTAUDIO, YOU CAN FIND THE DYLIB FILE(S) IN portaudio/install/lib
The best bet for support on compiling libportaudio, should you face any problems,
would be the portaudio documentation:
http://portaudio.com/docs/v19-doxydocs/tutorial_start.html
We know of at least one instance where libportaudio compiled fine, but only
produced garbage audio samples. This happened on a Linux system that only had
Open Sound System (OSS) installed. We therefore recommend installing ALSA. This
is also reflected in our Makefile for the binaries in /src/onlinebin, where we
include -lasound.