mirror of https://github.com/mozilla/DeepSpeech.git
Updating Geometry
Parent: 7d96540d66
Commit: f75b9cc926
@@ -7,7 +7,7 @@ n_steps
 -------
 The network views each speech sample as a sequence of time-slices :math:`x^{(i)}_t` of
 length :math:`T^{(i)}`. As the speech samples vary in length, we know that :math:`T^{(i)}`
-need not equal :math:`T^{(j)}` for :math:`i \ne j`. For each batch, BRNN in TensorFlow needs
+need not equal :math:`T^{(j)}` for :math:`i \ne j`. For each batch, RNN in TensorFlow needs
 to know ``n_steps`` which is the maximum :math:`T^{(i)}` for the batch.
 
 n_input
@@ -17,14 +17,14 @@ time-slice of the speech sample. We will make the number of MFCC features
 dependent upon the sample rate of the data set. Generically, if the sample rate
 is 8kHz we use 13 features. If the sample rate is 16kHz we use 26 features...
 We capture the dimension of these vectors, equivalently the number of MFCC
-features, in the variable ``n_input``.
+features, in the variable ``n_input``. By default ``n_input`` is 26.
 
 n_context
 ---------
-As previously mentioned, the BRNN is not simply fed the MFCC features of a given
-time-slice. It is fed, in addition, a context of :math:`C \in \{5, 7, 9\}` frames on
+As previously mentioned, the RNN is not simply fed the MFCC features of a given
+time-slice. It is fed, in addition, a context of :math:`C` frames on
 either side of the frame in question. The number of frames in this context is
-captured in the variable ``n_context``.
+captured in the variable ``n_context``. By default ``n_context`` is 9.
 
 Next we will introduce constants that specify the geometry of some of the
 non-recurrent layers of the network. We do this by simply specifying the number
@@ -36,20 +36,13 @@ n_hidden_1, n_hidden_2, n_hidden_5
 of units in the second, and ``n_hidden_5`` the number in the fifth. We haven't
 forgotten about the third or sixth layer. We will define their unit count below.
 
-A LSTM BRNN consists of a pair of LSTM RNN's.
-One LSTM RNN that works "forward in time":
+The RNN consists of an LSTM RNN that works "forward in time":
 
 .. image:: ../images/LSTM3-chain.png
     :alt: Image shows a diagram of a recurrent neural network with LSTM cells, with arrows depicting the flow of data from earlier time steps to later timesteps within the RNN.
 
-and a second LSTM RNN that works "backwards in time":
-
-.. image:: ../images/LSTM3-chain-backwards.png
-    :alt: Image shows a diagram of a recurrent neural network with LSTM cells, this time with data flowing from later time steps to earlier timesteps within the RNN.
-
 The dimension of the cell state, the upper line connecting subsequent LSTM units,
-is independent of the input dimension and the same for both the forward and
-backward LSTM RNN.
+is independent of the input dimension.
 
 n_cell_dim
 ----------
@@ -63,11 +56,11 @@ determined by ``n_cell_dim`` as follows
 
 .. code:: python
 
-    n_hidden_3 = 2 * n_cell_dim
+    n_hidden_3 = n_cell_dim
 
-n_character
+n_hidden_6
 -----------
-The variable ``n_character`` will hold the number of characters in the target
+The variable ``n_hidden_6`` will hold the number of characters in the target
 language plus one, for the :math:`blank`.
 For English it is the cardinality of the set
 
@@ -75,12 +68,3 @@ For English it is the cardinality of the set
     \{a,b,c, . . . , z, space, apostrophe, blank\}
 
 we referred to earlier.
-
-n_hidden_6
-----------
-The number of units in the sixth layer is determined by ``n_character`` as follows:
-
-.. code:: python
-
-    n_hidden_6 = n_character
-
Binary data
images/LSTM3-chain-backwards.png
Binary file not shown.
Before: Size: 221 KiB
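For readers checking the numbers, here is a minimal sketch of the geometry this commit appears to leave in place. The defaults ``n_input = 26`` and ``n_context = 9`` come from the added lines in the diff. The value ``n_cell_dim = 2048`` and the names ``input_width``, ``alphabet`` and ``n_steps_for_batch`` are assumptions introduced here for illustration and do not appear in the commit.

```python
import string

# Defaults stated in the added lines of the diff.
n_input = 26     # MFCC features per time-slice
n_context = 9    # context frames on either side of the current frame

# Each time-slice is fed together with n_context frames on each side,
# so one step sees (2 * n_context + 1) frames of n_input features each.
# "input_width" is a hypothetical name used only in this sketch.
input_width = n_input * (2 * n_context + 1)

# Hypothetical cell-state size; the commit does not state a value.
n_cell_dim = 2048

# After this commit the third layer matches the cell dimension directly
# (it was 2 * n_cell_dim while forward and backward outputs were combined).
n_hidden_3 = n_cell_dim

# English target characters: a-z, space, apostrophe; the blank adds one more.
alphabet = list(string.ascii_lowercase) + [" ", "'"]
n_hidden_6 = len(alphabet) + 1


def n_steps_for_batch(lengths):
    """Return n_steps: the largest number of time-slices T^(i) in the batch."""
    return max(lengths)
```

With these assumptions, ``input_width`` is 494 and ``n_hidden_6`` is 29 for English, and a batch whose samples have 120, 98 and 143 time-slices gives ``n_steps_for_batch([120, 98, 143]) == 143``.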