Online decoding related READMEs and Makefiles by Matthias Paulik

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@1286 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
2012-08-24 08:16:13 +00:00 · 2012-08-24 08:16:13 +00:00 · 7f5ca446a5
--- a/src/online/Makefile
+++ b/src/online/Makefile
@@ -8,6 +8,12 @@ BINFILES =

 OBJFILES = online-audio-source.o online-feat-input.o online-decodable.o online-faster-decoder.o onlinebin-util.o

+UNAME=$(shell uname)
+ifeq ($(UNAME), Darwin)
+    OBJFILES += ../../tools/portaudio/src/common/pa_ringbuffer.o
+endif
+
+
 LIBFILE = kaldi-online.a

 all:  $(LIBFILE) $(BINFILES)
--- a/src/onlinebin/Makefile
+++ b/src/onlinebin/Makefile
@@ -1,10 +1,12 @@
 all:

 EXTRA_CXXFLAGS = -Wno-sign-compare -I ../../tools/portaudio/install/include
-EXTRA_LDLIBS = ../../tools/portaudio/install/lib/libportaudio.a
+
 UNAME=$(shell uname)
 ifeq ($(UNAME), Linux)
-    EXTRA_LDLIBS += -lasound
+    EXTRA_LDLIBS = ../../tools/portaudio/install/lib/libportaudio.a -lasound
+else
+    EXTRA_LDLIBS = -L ../../tools/portaudio/install/lib/ -lportaudio
 endif

 include ../kaldi.mk
--- a/src/onlinebin/README.txt
+++ b/src/onlinebin/README.txt
@@ -0,0 +1,82 @@
+
+This directory hosts example binaries that implement online decoding.
+
+The online decoding code depends on libportaudio, and it is assumed that you have
+already compiled and installed portaudio in the ../../tools/ folder with the help of
+the install_portaudio.sh script that is provided there. Unfortunately, portaudio
+is more OS-dependent than the rest of the Kaldi code, and you may face some issues
+depending on your OS.
+Please also refer to the "Install instructions for PortAudio" section in tools/INSTALL.
+
+!!CAUTION!! IF YOU ARE TRYING TO RUN THESE BINARIES ON MAC OS, YOU MAY
+NEED TO TELL THE BINARIES WHERE THEY CAN FIND LIBPORTAUDIO, SINCE IT IS
+DYNAMICALLY LINKED. TO AVOID THIS, YOU CAN SIMPLY COPY libportaudio.dylib
+TO /usr/local/lib. YOU CAN FIND THE DYLIB FILES IN tools/portaudio/install/lib
+AFTER HAVING RUN THE INSTALLATION SCRIPT FOR PORTAUDIO IN THE tools FOLDER.
+
+Here is a short overview of the provided binaries.
+
+#########################
+
+online-gmm-decode-faster: 
+
+Demonstrates one possible implementation for online decoding. The binary
+records audio from your sound card, computes the features on-the-fly, performs 
+decoding and finally displays the recognition output on stdout. Recognition 
+output is displayed at a very low latency (and even before the end of an utterance
+is reached) by doing a partial trace-back whenever possible.
+
+A very simple heuristic is employed to detect the end of an utterance:
+whenever x frames of silence (any consecutive sequence of "silence" phones)
+are detected at the beginning of the trace-back, we display the full trace-back
+and reset the language model context. For easy readability, we insert two line
+breaks at the end of an utterance. The value of x (50 frames initially) is
+lowered automatically whenever the current utterance becomes too long (longer than
+max-utt-length frames). The end-pointing behavior can therefore be influenced via
+the optional parameter max-utt-length (the lower the value, the shorter the average
+utterance length) and by defining which phones constitute silence (sil, noise,
+laughter, etc.).
+
+Decoding has to happen "fast enough" (real time factor, RTF < 1) so that the decoder
+can keep up with the live recorded audio. If the decoder is too slow (because of a
+slow machine, a beam that is too wide, or models that are too big), you will observe
+buffer overflow error messages (provided that you have set the "--verbose=1" option),
+which mean that audio samples were lost.
+
+To avoid dropped audio samples, an (imperfect) heuristic is used that tries to
+keep the RTF between the two values rt-min and rt-max by adjusting the decoding
+beam in the following manner: the effective beam width is updated every
+"update-interval"-th frame. The beam is scaled up or down by a fraction of its
+current size, given by "beam-update" times a factor that depends on how far the
+current decoding time is from the set target. However, the fraction by which the
+beam is updated cannot exceed "max-beam-update", and the beam can never
+become wider than the value configured with the "--beam" option.
+
+!!CAUTION!! AS MENTIONED, THE HEURISTIC DOES NOT ALWAYS WORK. IT IS THEREFORE
+IMPORTANT TO USE A REASONABLE MAX BEAM AND A REASONABLE NUMBER OF MAX ACTIVE
+STATES. USE THE VERBOSE OPTION TO LOOK FOR BUFFER OVERFLOW MESSAGES!!
+
+Another option that can influence RTF and also latency is "batch-size".
+Feature computation and decoding happen in batches of batch-size frames.
+The default value is 27 frames. The higher the value, the higher the latency,
+but smaller values may increase RTF.
+
+#########################
+
+online-wav-gmm-decode-faster:
+
+Simulates online decoding on wav files. This is useful for measuring word error rate
+and for tuning parameters such as the batch size.
+
+#########################
+
+online-net-client & online-server-gmm-decode-faster:
+
+Enables online decoding in a client-server fashion, that is, doing audio recording
+and feature computation on the client and sending the raw features to the recognition
+server to do the decoding. The recognition server sends the recognition result back
+to the client for display. UDP is used for communication. This is very useful when
+one wants to run big models that require lots of CPU power to keep the RTF low.
+We chose UDP because we want to decode in real time, and retransmissions would lead
+to delays and lost samples at the PortAudio level. No guarantees are made about
+re-ordered packets, but we haven't faced any such issues in our testing.
--- a/tools/INSTALL
+++ b/tools/INSTALL
@@ -27,6 +27,7 @@ See below install instructions for:
  native Windows compilation without Intel MKL you have to compile this)
 (7) CLAPACK headers (required if you have the library available but
 no headers in a directory accessed by default; this is the case on Cygwin)
+(8) libportaudio (needed for the online recognition binaries)


 ####
@@ -183,3 +184,33 @@ for x in  clapack.h	f2c.h cblas.h; do
  wget http://www.netlib.org/clapack/$x; 
 done

+####
+
+(8) Install instructions for libportaudio.
+
+libportaudio is only needed for the online recognition binaries. It enables audio
+capture from the sound card. Unfortunately, the installer for libportaudio may not
+work on all versions of Linux or Mac OS. However, for most people it will be
+sufficient to simply run the script
+
+./install_portaudio.sh
+
+We tested this installation script on various versions of Suse Linux and Red Hat as
+well as on Mac OS (Darwin). Please note that the installation script patches up the 
+default Makefile of portaudio when installing on Mac (for details, please have a look
+at the installation script itself). 
+
+!!IMPORTANT!! UNDER MAC OS, YOU MAY NEED TO COPY libportaudio.dylib TO
+/usr/local/lib SO THAT THE BINARIES CAN FIND THE LIBRARY. AFTER COMPILING
+LIBPORTAUDIO, YOU CAN FIND THE DYLIB FILE(S) IN portaudio/install/lib
+
+The best bet for support on compiling libportaudio, should you face any problems, 
+would be the portaudio documentation:
+
+http://portaudio.com/docs/v19-doxydocs/tutorial_start.html
+
+We know of at least one instance where libportaudio compiled fine, but only
+produced garbage audio samples. This happened on a Linux system that only had
+Open Sound System (OSS) installed. We therefore recommend installing ALSA. This
+is also reflected in our Makefile for the binaries in /src/onlinebin, where we
+link against -lasound.