modified gettingstarted.rst
Parent: aa383f486b
Commit: fd7bced277
@@ -284,3 +284,34 @@ that the minibatch layout for the labels and the data with dynamic axes is compa
For the full explanation of how ``lstm_layer()`` is defined, please see the full example in the
Examples section.

How to pass Python data as train/test data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Python CNTK API lets you pass training and testing data to CNTK either by specifying external input files or by passing Python data directly.
The second alternative, using in-memory Python data, is especially useful if you want to experiment quickly with small synthetic data sets.
The following paragraphs describe the structure in which this data has to be provided.

Let us start with a scenario taken from one of our code examples (`logreg_numpy.py <https://github.com/Microsoft/CNTK/tree/master/contrib/Python/cntk/examples/LogReg/logreg_numpy.py>`_).
In this example we want to classify a 250-dimensional feature vector into one of two classes. In this case we have two *inputs*:

- The feature values for each training item. In the example these are 500 vectors, each of dimension 250.
- The expected class. In this example the class is encoded as a two-dimensional vector in which the element for the expected class is set to 1 and the other to 0.

For each of these inputs we have to provide one data structure containing all training instances.

You might notice that this is conceptually different from the case where we provide the data from external files using the CNTKTextReader.
In the input file for CNTKTextReader we provide the data for the different *inputs* of one instance on the same line, so the data from different inputs are much more intertwined.

In Python the feature data are represented by a NumPy array of dimension ``number_of_instances X dimension_of_feature_space``, so in our example it is a NumPy array of dimension ``500 X 250``.
Likewise, the expected output is represented by another NumPy array of dimension ``500 X 2``.

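As a rough illustration of these shapes, the following sketch builds synthetic arrays with plain NumPy. The random values, the ``float32`` dtype and all variable names are assumptions made for this illustration; how the arrays are then handed to CNTK is not shown here and should be taken from ``logreg_numpy.py``.

```python
import numpy as np

# Hypothetical synthetic data; sizes follow the example in the text.
rng = np.random.default_rng(0)
num_instances = 500
feature_dim = 250
num_classes = 2

# One row per training instance, one column per feature: shape (500, 250).
features = rng.random((num_instances, feature_dim)).astype(np.float32)

# Draw a random class id (0 or 1) for every instance.
class_ids = rng.integers(0, num_classes, size=num_instances)

# One-hot encode the class ids by picking rows of the identity matrix:
# the element for the expected class is 1, the other is 0. Shape (500, 2).
labels = np.eye(num_classes, dtype=np.float32)[class_ids]

print(features.shape, labels.shape)
```

Each input gets its own array, and row ``i`` of ``features`` belongs to row ``i`` of ``labels``.
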
Passing sequence data from Python
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
CNTK can handle sequences of arbitrary length. This feature is called a *dynamic axis*.
To represent an input with a dynamic axis in Python, you have to provide each sequence as a NumPy array whose first axis has a length equal to the sequence length.
The complete dataset is then just a normal one-dimensional NumPy array of these sequences.

Take as an artificial example a sentence classification problem. Each sentence has a different number of words, i.e. it is a *sequence* of words. The individual words might each be represented by some latent vector.
So each sentence is represented by a NumPy array of dimension ``sequence_length X embedding_dimension``. The whole set of instances (sentences) is then represented by putting them into a one-dimensional array whose size equals the number of instances.

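A minimal sketch of such a data set, using only NumPy, could look as follows. The embedding dimension, the sentence lengths and the random word vectors are made up for illustration, and storing the sequences in an array with ``dtype=object`` is one possible way to build a one-dimensional array of differently sized arrays, not necessarily the one used in the CNTK examples.

```python
import numpy as np

# Hypothetical sizes: three sentences with 3, 7 and 5 words,
# each word represented by a 10-dimensional vector.
rng = np.random.default_rng(0)
embedding_dim = 10
sequence_lengths = [3, 7, 5]

# Each sentence is an array of shape (sequence_length, embedding_dim).
sentences = [
    rng.standard_normal((length, embedding_dim)).astype(np.float32)
    for length in sequence_lengths
]

# The sequences differ in length, so they cannot be stacked into one
# rectangular array. Collect them in a one-dimensional object array,
# filling it element by element so NumPy does not try to broadcast.
dataset = np.empty(len(sentences), dtype=object)
for i, sentence in enumerate(sentences):
    dataset[i] = sentence

print(dataset.shape)
print([seq.shape for seq in dataset])
```

The outer array has one entry per instance, while the first axis of each entry varies with the sentence length.
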