working w/sequences docs, tests, etc.

Parent: 3d6e4a54ea
Commit: f15e860bf1
@@ -1,5 +1,5 @@
Working with Sequences
-=======
+======================

CNTK Concepts
~~~~~~~~~~~~~
@@ -110,8 +110,8 @@ dealing with sequences. This includes everything from text to music to video; an
where the current state is dependent on the previous state. While RNNs are indeed
powerful, the "vanilla" RNN is extremely hard to learn via gradient-based methods.
Because the gradient needs to flow back through the network to learn, the contribution
-from an early element (for example a word at the start of a sentence) on a much later
-elements (like the last word) can essentially vanish.
+from an early element (for example a word at the start of a sentence) on much later
+elements (like the classification of the last word) can essentially vanish.

Dealing with the above problem is an active area of research. An architecture that
seems to be successful in practice is the Long Short Term Memory (LSTM) network.
@@ -134,9 +134,10 @@ can let a neural network sort out these details by forcing each word to be repre
by a short learned vector. Then in order for the network to do well on its task it
has to learn to map the words to these vectors effectively. For example, the vector
representing the word "cat" may somehow be close, in some sense, to the vector for "dog".
-In our task, we will use a pre-computed word embedding model
-using `GloVe <http://nlp.stanford.edu/projects/glove/>`_ and each of the words in the
-sequences will be replaced by their respective GloVe vector.
+In our task we will learn these word embeddings from scratch. However, it is also
+possible to initialize our embedding with a pre-computed word embedding such as
+`GloVe <http://nlp.stanford.edu/projects/glove/>`_ which has been trained on
+corpora containing billions of words.
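
As a purely illustrative aside (plain NumPy, not part of the CNTK example; the toy
vocabulary and dimensions below are invented), an embedding is just a learned matrix
with one row per word in the vocabulary, and looking up a word amounts to selecting
its row::

    import numpy as np

    vocab = {"the": 0, "cat": 1, "sat": 2, "dog": 3}
    embedding_dim = 5
    # In a real model this matrix is a parameter that is learned during training.
    E = np.random.randn(len(vocab), embedding_dim)

    cat_vector = E[vocab["cat"]]  # the vector representing "cat"
    dog_vector = E[vocab["dog"]]  # after training, ideally close to cat_vector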

Now that we've decided on our word representation and the type of recurrent neural
network we want to use, let's define the network that we'll use to do
@@ -146,70 +147,69 @@ sequence classification. We can think of the network as adding a series of layer
2. LSTM layer (allow each word to depend on previous words)
3. Softmax layer (an additional set of parameters and output probabilities per class)

-This network is defined as part of the example at ``Examples/SequenceClassification/SimpleExample/Python/SequenceClassification.py``. Let's go through some
-key parts of the code::
+A very similar network is also located at ``Examples/SequenceClassification/SimpleExample/Python/SequenceClassification.py``.
+Let's see how easy it is to work with sequences in CNTK:

-    # model
-    input_dim = 2000
-    cell_dim = 25
-    hidden_dim = 25
-    embedding_dim = 50
-    num_output_classes = 5
+.. literalinclude:: simplernn.py

-    # Input variables denoting the features and label data
-    features = input_variable(shape=input_dim, is_sparse=True)
-    label = input_variable(num_output_classes, dynamic_axes = [Axis.default_batch_axis()])
+Running this script should generate this output::

-    # Instantiate the sequence classification model
-    classifier_output = LSTM_sequence_classifer_net(features, num_output_classes, embedding_dim, hidden_dim, cell_dim)
-
-    ce = cross_entropy_with_softmax(classifier_output, label)
-    pe = classification_error(classifier_output, label)
-
-    rel_path = r"../../../../Tests/EndToEndTests/Text/SequenceClassification/Data/Train.ctf"
-    path = os.path.join(os.path.dirname(os.path.abspath(__file__)), rel_path)
-
-    mb_source = text_format_minibatch_source(path, [
-        StreamConfiguration( 'features', input_dim, True, 'x' ),
-        StreamConfiguration( 'labels', num_output_classes, False, 'y')], 0)
-
-    features_si = mb_source.stream_info(features)
-    labels_si = mb_source.stream_info(label)
-
-    # Instantiate the trainer object to drive the model training
-    trainer = Trainer(classifier_output, ce, pe, [sgd_learner(classifier_output.parameters(), lr=0.0005)])
-
-    # Get minibatches of sequences to train with and perform model training
-    minibatch_size = 200
-    training_progress_output_freq = 10
-    i = 0
-    while True:
-        mb = mb_source.get_next_minibatch(minibatch_size)
-        if len(mb) == 0:
-            break
-
-        # Specify the mapping of input variables in the model to actual minibatch data to be trained with
-        arguments = {features : mb[features_si].m_data, label : mb[labels_si].m_data}
-        trainer.train_minibatch(arguments)
-
-        print_training_progress(trainer, i, training_progress_output_freq)
-        i += 1
+     average      since    average      since      examples
+        loss       last     metric       last
+     ------------------------------------------------------
+         1.61       1.61      0.886      0.886         44
+         1.61        1.6      0.714      0.629        133
+          1.6       1.59       0.56      0.448        316
+         1.57       1.55      0.479       0.41        682
+         1.53        1.5      0.464      0.449       1379
+         1.46        1.4      0.453      0.441       2813
+         1.37       1.28       0.45      0.447       5679
+          1.3       1.23      0.448      0.447      11365
+    error: 0.333333

Let's go through some of the intricacies of the network definition above. As usual, we first set the parameters of our model. In this case we
-have a vocab (input dimension) of 2000, LSTM hidden and cell dimensions of 25, an embedding layer with dimension 50, and we have 5 possible
+have a vocabulary (input dimension) of 2000, LSTM hidden and cell dimensions of 25, an embedding layer with dimension 50, and we have 5 possible
classes for our sequences. As before, we define two input variables: one for the features and one for the labels. We then instantiate our model. The
``LSTM_sequence_classifier_net`` is a simple function which looks up our input in an embedding matrix and returns the embedded representation, puts
that input through an LSTM recurrent neural network layer, and returns a fixed-size output from the LSTM by selecting the last hidden state of the
LSTM::

-    embedding_function = embedding(input, embedding_dim)
-    LSTM_function = LSTMP_component_with_self_stabilization(embedding_function.output(), LSTM_dim, cell_dim)[0]
-    thought_vector = select_last(LSTM_function)
+    embedded_inputs = embedding(input, embedding_dim)
+    lstm_outputs = simple_lstm(embedded_inputs, LSTM_dim, cell_dim)[0]
+    thought_vector = sequence.last(lstm_outputs)
    return linear_layer(thought_vector, num_output_classes)

That is the entire network definition. We now simply set up our criterion nodes and then our training loop. In the above example we use a minibatch
size of 200 and use basic SGD with the default parameters and a small learning rate of 0.0005. This results in a powerful state-of-the-art model for
sequence classification that can scale with huge amounts of training data. Note that as your training data size grows, you should give more capacity to
-your LSTM by increasing the number of hidden dimensions. Further, you can get an even more complex network by stacking layers of LSTMs. This is also easy
-using the LSTM layer function [coming soon].
+your LSTM by increasing the number of hidden dimensions. Further, you can get an even more complex network by stacking layers of LSTMs.
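
As a rough sketch only (not code from the example; it simply reuses the ``embedding``,
``simple_lstm``, ``sequence`` and ``linear_layer`` helpers shown above), stacking two
LSTM layers means feeding the output sequence of the first layer into the second::

    embedded_inputs = embedding(input, embedding_dim)
    lstm_layer_1 = simple_lstm(embedded_inputs, LSTM_dim, cell_dim)[0]
    lstm_layer_2 = simple_lstm(lstm_layer_1, LSTM_dim, cell_dim)[0]
    thought_vector = sequence.last(lstm_layer_2)
    return linear_layer(thought_vector, num_output_classes)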

Feeding Sequences with NumPy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While CNTK has very efficient built-in readers that take care of many details for you
(randomization, prefetching, reduced memory usage, etc.) sometimes your data is already
in numpy arrays. Therefore it is important to know how to specify a sequence of inputs
and how to specify a minibatch of sequences.

Each sequence must be its own NumPy array. Therefore if you have an input variable
that represents a small color image like this::

    x = input_variable((3,32,32))

and you want to feed a sequence of 4 images, `img1` to `img4`, to CNTK, then
you need to create a tensor containing all 4 images. For example::

    img_seq = np.stack([img1, img2, img3, img4])
    output = network.eval({x:[img_seq]})

The stack function in NumPy stacks the inputs along a new axis (placed at the beginning by default),
so the shape of `img_seq` is :math:`4 \times 3 \times 32 \times 32`. You
might have noticed that before binding `img_seq` to `x` we wrap it in a list.
This list denotes a minibatch of 1 and **minibatches
are specified as lists**. The reason for this is that different elements of
the minibatch can have different lengths. If all the elements in the
minibatch are sequences of the same length then it is acceptable to provide
the minibatch as one big tensor of dimension :math:`b \times s \times d_1 \times \ldots \times d_k`
where `b` is the batch size, `s` is the sequence length and :math:`d_i`
is the dimension of the i-th static axis of the input variable.
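
For instance, continuing the hypothetical image example above (the shapes and the
``network`` function are placeholders, not part of the example), a minibatch holding
two sequences of different lengths would be passed as a list of two arrays::

    seq_a = np.ones((4, 3, 32, 32), dtype=np.float32)  # a sequence of 4 images
    seq_b = np.ones((7, 3, 32, 32), dtype=np.float32)  # a sequence of 7 images
    output = network.eval({x: [seq_a, seq_b]})          # a minibatch of 2 sequences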
@@ -0,0 +1,86 @@
import sys
import os
from cntk import Trainer, Axis
from cntk.io import MinibatchSource, CTFDeserializer, StreamDef, StreamDefs,\
    INFINITELY_REPEAT, FULL_DATA_SWEEP
from cntk.learner import sgd, learning_rate_schedule, UnitType
from cntk.ops import input_variable, cross_entropy_with_softmax, \
    classification_error, sequence
from cntk.utils import ProgressPrinter

abs_path = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(abs_path, "..", "..", "..", "Examples", "common"))
from nn import LSTMP_component_with_self_stabilization as simple_lstm
from nn import embedding, linear_layer


# Creates the reader
def create_reader(path, is_training, input_dim, label_dim):
    return MinibatchSource(CTFDeserializer(path, StreamDefs(
        features=StreamDef(field='x', shape=input_dim, is_sparse=True),
        labels=StreamDef(field='y', shape=label_dim, is_sparse=False)
    )), randomize=is_training,
        epoch_size=INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP)


# Defines the LSTM model for classifying sequences
def LSTM_sequence_classifer_net(input, num_output_classes, embedding_dim,
                                LSTM_dim, cell_dim):
    embedded_inputs = embedding(input, embedding_dim)
    lstm_outputs = simple_lstm(embedded_inputs, LSTM_dim, cell_dim)[0]
    thought_vector = sequence.last(lstm_outputs)
    return linear_layer(thought_vector, num_output_classes)


# Creates and trains an LSTM sequence classification model
def train_sequence_classifier(debug_output=False):
    input_dim = 2000
    cell_dim = 25
    hidden_dim = 25
    embedding_dim = 50
    num_output_classes = 5

    # Input variables denoting the features and label data
    features = input_variable(shape=input_dim, is_sparse=True)
    label = input_variable(num_output_classes, dynamic_axes=[
        Axis.default_batch_axis()])

    # Instantiate the sequence classification model
    classifier_output = LSTM_sequence_classifer_net(
        features, num_output_classes, embedding_dim, hidden_dim, cell_dim)

    ce = cross_entropy_with_softmax(classifier_output, label)
    pe = classification_error(classifier_output, label)

    rel_path = ("../../../Tests/EndToEndTests/Text/" +
                "SequenceClassification/Data/Train.ctf")
    path = os.path.join(os.path.dirname(os.path.abspath(__file__)), rel_path)

    reader = create_reader(path, True, input_dim, num_output_classes)

    input_map = {
        features: reader.streams.features,
        label: reader.streams.labels
    }

    lr_per_sample = learning_rate_schedule(0.0005, UnitType.sample)
    # Instantiate the trainer object to drive the model training
    trainer = Trainer(classifier_output, ce, pe,
                      sgd(classifier_output.parameters, lr=lr_per_sample))

    # Get minibatches of sequences to train with and perform model training
    minibatch_size = 200

    pp = ProgressPrinter(0)
    for i in range(255):
        mb = reader.next_minibatch(minibatch_size, input_map=input_map)
        trainer.train_minibatch(mb)
        pp.update_with_trainer(trainer, True)

    evaluation_average = float(trainer.previous_minibatch_evaluation_average)
    loss_average = float(trainer.previous_minibatch_loss_average)
    return evaluation_average, loss_average

if __name__ == '__main__':
    error, _ = train_sequence_classifier()
    print(" error: %f" % error)

@@ -11,7 +11,7 @@ abs_path = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(abs_path, ".."))
from simplenet import ffnet

-TOLERANCE_ABSOLUTE = 1E-1
+TOLERANCE_ABSOLUTE = 5E-2


def test_ffnet_error(device_id):
    np.random.seed(98052)

@@ -0,0 +1,25 @@
# Copyright (c) Microsoft. All rights reserved.

# Licensed under the MIT license. See LICENSE.md file in the project root
# for full license information.
# ==============================================================================

import os
import sys
import numpy as np

abs_path = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(abs_path, ".."))
from simplernn import train_sequence_classifier

TOLERANCE_ABSOLUTE = 5E-2


def test_rnn_error(device_id):
    error, loss = train_sequence_classifier()

    expected_error = 0.333333
    expected_loss = 1.060453

    assert np.allclose(error, expected_error, atol=TOLERANCE_ABSOLUTE)
    assert np.allclose(loss, expected_loss, atol=TOLERANCE_ABSOLUTE)