Enabled conversion of jupyter notebook to html using nbsphinx, integrated the conversion with the current doc build system, updated the jupyter notebooks in Tutorial and Manual folders to adhere to nbshpinx guidelines
This commit is contained in:
Родитель
1cca6e2c17
Коммит
f247772614
|
@ -2,11 +2,14 @@
|
|||
|
||||
## Documentation
|
||||
|
||||
### Add HTML version of tutorials and manuals so that they can be search-able
|
||||
### Add HTML version of tutorials and manuals so that they can be searchable
|
||||
We have added HTML versions of the tutorials and manuals with the Python documentation. This makes the [tutorial notebooks](https://www.cntk.ai/pythondocs/tutorials.html) and manuals searchable as well.
|
||||
|
||||
### Add missing evaluation documents
|
||||
|
||||
## System
|
||||
|
||||
### 16bit support for training on Volta GPU (limited functionality)
|
||||
### Update learner interface to simplify parameter setting and adding new learners (**Potential breaking change**)
|
||||
### A C#/.NET API that enables people to build and train networks.
|
||||
##### Basic training support is added to C#/.NET API. New training examples include:
|
||||
|
@ -36,12 +39,13 @@
|
|||
### Gradient as an operator (stretch goal)
|
||||
### Reduced rank for convolution in C++ to enable convolution on 1D data
|
||||
### Dilated convolution
|
||||
Add support to dilation convolution on the GPU, exposed by BrainScript, C++ and Python API. Dilation convolution effectively increase the kernel size, without actually requiring a big kernel. To use dilation convolution you need at least cuDNN 6.0.
|
||||
Add support to dilation convolution on the GPU, exposed by BrainScript, C++ and Python API. Dilation convolution effectively increase the kernel size, without actually requiring a big kernel. To use dilation convoluton you need at least cuDNN 6.0.
|
||||
### Deterministic Pooling
|
||||
Now call `cntk.debug.force_deterministic()` will make max and average pooling deterministic, this behavior depend on cuDNN version 6 or later.
|
||||
Now call `cntk.debug.force_deterministic()` will make max and average pooling determistic, this behavior depend on cuDNN version 6 or later.
|
||||
|
||||
## Performance
|
||||
### Asynchronous evaluation API (Python and C#)
|
||||
### Intel MKL update to improve inference speed on CPU by around 2x on AlexNet
|
||||
|
||||
## Keras and Tensorboard
|
||||
### Example on Keras and SKLearn multi-GPU support on CNTK
|
||||
|
|
|
@ -4,11 +4,11 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Manual: How to create user minibatch sources\n",
|
||||
"# Create user minibatch sources\n",
|
||||
"\n",
|
||||
"In order to make use of CNTK’s (distributed) training functionality, one has to provide input data as an instance of [MinibatchSource](https://cntk.ai/pythondocs/cntk.io.html#cntk.io.MinibatchSource). In CNTK, there are a variety of means to provide minibatch sources:\n",
|
||||
"\n",
|
||||
"- (**best**) convert data to the formats of built-in data readers - they support rich functionality of randomization/packing with high performance (see [Manual: How to feed data](https://github.com/Microsoft/CNTK/blob/master/Manual/Manual_How_to_feed_data.ipynb) and [cntk.io](https://cntk.ai/pythondocs/cntk.io.html))\n",
|
||||
"- (**best**) convert data to the formats of built-in data readers - they support rich functionality of randomization/packing with high performance (see [How to feed data](https://github.com/Microsoft/CNTK/blob/master/Manual/Manual_How_to_feed_data.ipynb) and [cntk.io](https://cntk.ai/pythondocs/cntk.io.html))\n",
|
||||
"- (**preferred**) if it is hard to convert the data and the data can fit in memory, please use [MinibatchSourceFromData](https://cntk.ai/pythondocs/cntk.io.html?highlight=minibatchsourcefromdata#cntk.io.MinibatchSourceFromData), \n",
|
||||
"- if the data does not fit in memory and you want a fine grained control over how minibatch is created, then implementing the abstract [UserMinibatchSource](https://cntk.ai/pythondocs/cntk.io.html#cntk.io.UserMinibatchSource) interface is the option. \n",
|
||||
"\n",
|
||||
|
@ -18,11 +18,13 @@
|
|||
"## User minibatch sources\n",
|
||||
"\n",
|
||||
"A minibatch source is responsible for providing:\n",
|
||||
"\n",
|
||||
"1. meta-information regarding the data, such as *storage format*, *data type*, *shape of elements*,\n",
|
||||
"2. batches of data, and\n",
|
||||
"3. auxiliary information for advanced features, such as checkpoint state of the current data access position so that interrupted learning processes can be restored from the data position where the processes were interrupted.\n",
|
||||
"\n",
|
||||
"Correspondingly, a minibatch source API needs to implement the following $4$ methods (see [UserMinibatchSource](https://cntk.ai/pythondocs/cntk.io.html?highlight=userminibatch#cntk.io.UserMinibatchSource) for details):\n",
|
||||
"\n",
|
||||
"1. **stream_infos()**: Returns a list of StreamInformation instances. Each piece of stream information contains the meta information regarding a stream of the data: e.g. storage format, data type, shape of elements (see [StreamInformation](https://cntk.ai/pythondocs/cntk.io.html#cntk.io.StreamInformation) for details) \n",
|
||||
"2. **next_minibatch(num_samples, number_of_workers, worker_rank, device=None)**: Returns next minibatch of data of the specified nature as specified by given parameters:\n",
|
||||
" * num_samples: the number of samples that are being requested \n",
|
||||
|
@ -113,27 +115,38 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Inherit *UserMinibatchSource* to create your user minibath class: \n",
|
||||
"### Inherit *UserMinibatchSource* to create your user minibatch class: \n",
|
||||
"\n",
|
||||
"To implement our example user minibatch source, we first prepare the data access and its meta information: \n",
|
||||
"\n",
|
||||
"1. Parse the text formatted data into an intermediate representation so that we can access the data by their sequence indices: \n",
|
||||
"```\n",
|
||||
" features = self.data[seq_idx]['features']\n",
|
||||
" labels = self.data[seq_idx]['labels']\n",
|
||||
"```\n",
|
||||
" This is done by create a private method *_prepare_data()* in the example below. We ommit the implementation detail of text format parsing here as the detail is irrelevant to the understanding of the UserMinibatchSource interface. However, the parsing mechanims should be able to keep track of where the current data access point is so that the data feeding process can be restored at any point. In the example, we are tracking the sequence index. \n",
|
||||
" This is done by create a private method ``_prepare_data()`` in the example below. We ommit the implementation detail of text format parsing here as the detail is irrelevant to the understanding of the UserMinibatchSource interface. However, the parsing mechanims should be able to keep track of where the current data access point is so that the data feeding process can be restored at any point. In the example, we are tracking the sequence index. \n",
|
||||
" \n",
|
||||
"2. Define the meta information of the data: e.g.\n",
|
||||
"```\n",
|
||||
" self.fsi = StreamInformation(\"features\", 0, 'sparse', np.float32, (self.f_dim,))\n",
|
||||
" self.lsi = StreamInformation(\"labels\", 1, 'dense', np.float32, (self.l_dim,))\n",
|
||||
"```\n",
|
||||
" The self.fsi and self.lsi define the meta information (see [StreamInformation](https://cntk.ai/pythondocs/cntk.io.html#cntk.io.StreamInformation) for definition ) regarding the features and labels respectively. For example, StreamInformation(\"features\", 0, 'sparse', np.float32, (self.f_dim,)) specifies that 1) the \"feature\" data stream is indentified by ID $0$ (it is required that every data stream is identified by a unique ID), 2) it is sparse, 3) its data type is np.float32, and 4) its dimension is (self.f_dim, ).\n",
|
||||
" The self.fsi and self.lsi define the meta information (see [StreamInformation](https://cntk.ai/pythondocs/cntk.io.html#cntk.io.StreamInformation) for definition ) regarding the features and labels respectively. For example, ``StreamInformation(\"features\", 0, 'sparse', np.float32, (self.f_dim,))`` specifies that :\n",
|
||||
" \n",
|
||||
">a) the \"feature\" data stream is indentified by ID $0$ (it is required that every data stream is identified by a unique ID), \n",
|
||||
"\n",
|
||||
">b) it is sparse, \n",
|
||||
"\n",
|
||||
">c) its data type is ``np.float32``, and \n",
|
||||
"\n",
|
||||
">d) its dimension is ``(self.f_dim, )``.\n",
|
||||
"\n",
|
||||
"3. Set the initial states of the data source. For example, set the next sequence index to the beginning:\n",
|
||||
"```\n",
|
||||
" self.next_seq_idx = 0\n",
|
||||
"```\n",
|
||||
"4. Finally, create your minibatch class based on **UserMinibatchSource** and put the above data access preparation steps in its constructor: \n",
|
||||
"\n",
|
||||
"4. Finally, create your minibatch class based on **UserMinibatchSource** and put the above data access preparation steps in its constructor: \n",
|
||||
"```python\n",
|
||||
"class MyMultiWorkerDataSource(UserMinibatchSource):\n",
|
||||
" def __init__(self, f_dim, l_dim):\n",
|
||||
|
@ -146,7 +159,7 @@
|
|||
" self.next_seq_idx = 0 \n",
|
||||
" super(MyMultiWorkerDataSource, self).__init__()\n",
|
||||
"```\n",
|
||||
"Do not forget to call the super class' constructor: **super(MyMultiWorkerDataSource, self).__init__()**."
|
||||
"Do not forget to call the super class' constructor: ``super(MyMultiWorkerDataSource, self)`` **init()** function."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -459,7 +472,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Debugging CNTK programs\n",
|
||||
"# Debug CNTK programs\n",
|
||||
"\n",
|
||||
"> \"Help! I just got this recipe from the web, I don't understand what it does, why it fails, and how to modify it for my purposes\". --- Anonymous\n",
|
||||
"\n",
|
||||
|
@ -37,7 +37,7 @@
|
|||
"- You have an NVidia GPU\n",
|
||||
"- It is listed when running nvidia-smi\n",
|
||||
"\n",
|
||||
"Then make sure CNTK sees your GPU: `all_devices()` returns all the available devices. If your GPU is not listed here, your installation is somehow broken. If CNTK lists a GPU, make sure no other CNTK process is using it (check nvidia-smi, under \"C:\\Program Files\\NVIDIA Corporation\\NVSMI\\nvidia-smi.exe\" on Windows and /usr/bin/nvidia-smi on Linux). If you have a zombie process using it you can try this \n",
|
||||
"Then make sure CNTK sees your GPU: `all_devices()` returns all the available devices. If your GPU is not listed here, your installation is somehow broken. If CNTK lists a GPU, make sure no other CNTK process is using it (check nvidia-smi, under ``C:\\Program Files\\NVIDIA Corporation\\NVSMI\\nvidia-smi.exe`` on Windows and ``/usr/bin/nvidia-smi`` on Linux). If you have a zombie process using it you can try this \n",
|
||||
"\n",
|
||||
"- on Linux\n",
|
||||
" ```bash\n",
|
||||
|
@ -55,9 +55,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -77,9 +75,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -97,9 +93,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -129,9 +123,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -178,9 +170,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -241,9 +231,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -287,9 +275,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -327,15 +313,13 @@
|
|||
"\n",
|
||||
"### Model bugs\n",
|
||||
"\n",
|
||||
"We are not done with the network above. So far we have only used printing of types to guide us. But this is not always enough to debug all issues. We can get more information from a function by plotting the underlying graph. That can be done with `logging.graph.plot` and it requires to have [graphviz](www.graphviz.org) installed, and have the binaries in your PATH environment variable. Inside a notebook we can display the network inline (use the scrollbar on the bottom and/or the right to see the whole network). Notice that none of the parameters are shared between the question and the answer. A typical solution might want to share the embedding, or both the embedding and the GRU if data is limited.\n"
|
||||
"We are not done with the network above. So far we have only used printing of types to guide us. But this is not always enough to debug all issues. We can get more information from a function by plotting the underlying graph. That can be done with `logging.graph.plot` and it requires to have [graphviz](http://www.graphviz.org) installed, and have the binaries in your PATH environment variable. Inside a notebook we can display the network inline (use the scrollbar on the bottom and/or the right to see the whole network). Notice that none of the parameters are shared between the question and the answer. A typical solution might want to share the embedding, or both the embedding and the GRU if data is limited.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -847,9 +831,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1396,9 +1378,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1432,9 +1412,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1479,6 +1457,7 @@
|
|||
"$$\n",
|
||||
"\\textrm{softmax}(z) = \\left(\\begin{array}{c} \\frac{\\exp(z_1)}{\\sum_j \\exp(z_j)}\\\\ \\frac{\\exp(z_2)}{\\sum_j \\exp(z_j)}\\\\ \\vdots \\\\ \\frac{\\exp(z_n)}{\\sum_j \\exp(z_j)} \\end{array}\\right)\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"and we only have one output! So the softmax will compute the exponential of that output and then **divide it by itself** giving us 1. One solution here is to have two outputs, one for each class. This is different from how binary classification is typically done where there's a single output representing the probability of the positive class. This latter approach can be implemented by using a sigmoid non-linearity. Therefore either of the following will work:\n",
|
||||
"```python\n",
|
||||
"cl.Dense(1, activation=C.sigmoid)\n",
|
||||
|
@ -1503,9 +1482,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1540,9 +1517,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1573,9 +1548,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1596,9 +1569,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1910,7 +1881,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1937,7 +1908,7 @@
|
|||
"metadata": {
|
||||
"anaconda-cloud": {},
|
||||
"kernelspec": {
|
||||
"display_name": "Python [default]",
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Manual: How to feed data\n",
|
||||
"# Read and feed data to CNTK Trainer\n",
|
||||
"\n",
|
||||
"Feeding data is an integral part of training a deep neural network. While expressiveness and succinct model representation is one of the key aspects of CNTK, efficient and flexible data reading is also made available to the users. Key to deep learning is the ability to provide randomly sampled training data to the CNTK model trainers. These small sampled data sets are called mini-batches. In this manual, we show how minibatch samples can be read from data sources and passed on to trainer objects. The built-in readers follow [convention over configuration principle](https://en.wikipedia.org/wiki/Convention_over_configuration) and greatly simplify training procedure.\n",
|
||||
"\n",
|
||||
|
@ -734,7 +734,7 @@
|
|||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"# Composite reader\n",
|
||||
"## Composite reader\n",
|
||||
"\n",
|
||||
"Finally, there are scenarios where one may need to combine readers to make composite reader. One such example is in detection of regions in an image. In this case there are images and region-of-interest (ROI's) in the image which are provided in a text format. In such a situation one need to combine a ImageDeserializer with a CTFDeserializer. Here is some sample pseudo code for reference. Details of such an example can be found in the [A2_RunWithPyModel.py](https://github.com/Microsoft/CNTK/blob/master/Examples/Image/Detection/FastRCNN/A2_RunWithPyModel.py).\n",
|
||||
"\n",
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Manual: How to train model using declarative and imperative API\n",
|
||||
"# Train model using declarative and imperative API\n",
|
||||
" \n",
|
||||
"CNTK gives the user several ways how her model can be trained:\n",
|
||||
"* High level declarative style API using [Function.train](https://www.cntk.ai/pythondocs/cntk.ops.functions.html#cntk.ops.functions.Function.train) method (or training_session). Given a criterion function, the user can simply call the train method, providing configuration parameters for different aspects of the training, such as data sources, checkpointing, cross validation and progress printing. The corresponding [test](https://www.cntk.ai/pythondocs/cntk.ops.functions.html#cntk.ops.functions.Function.test) method can be used for evaluation. This API simplifies implementation of routine training tasks and eliminates boilerplate code.\n",
|
||||
|
@ -67,9 +67,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -188,9 +186,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -285,9 +281,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -391,7 +385,6 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
|
@ -502,7 +495,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.4.4"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# How to use learners\n",
|
||||
"# Use CNTK learners\n",
|
||||
"\n",
|
||||
"In CNTK, learners are implementations of gradient-based optimization algorithms. CNTK automatically computes the gradient of your criterion/loss with respect to each learnable parameter but how this gradient is combined with the current parameter value to provide a new parameter value is left to the learner. \n",
|
||||
"\n",
|
||||
|
@ -20,7 +20,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -58,9 +58,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -108,9 +106,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -156,7 +152,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -209,9 +205,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -247,9 +241,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -294,9 +286,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -356,9 +346,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -405,9 +393,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -500,7 +486,7 @@
|
|||
"metadata": {
|
||||
"anaconda-cloud": {},
|
||||
"kernelspec": {
|
||||
"display_name": "Python [default]",
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Manual: How to write a custom deserializer\n",
|
||||
"# Write custom deserializer\n",
|
||||
" \n",
|
||||
"In this manual we describe how to load data using CNTK custom deserializers. CNTK also provides other means for loading data (i.e. built-in deserializers, user defined minibatch sources or feeding NumPy data explicitly), for more details please have a look at the [How to feed data](Manual_How_to_feed_data.ipynb) manual.\n",
|
||||
"\n",
|
||||
|
@ -431,21 +431,21 @@
|
|||
"metadata": {
|
||||
"anaconda-cloud": {},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 2",
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python2"
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.11"
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -13,6 +13,7 @@ dependencies:
|
|||
- numpy=1.11.2=py27_0
|
||||
- pandas=0.19.1=np111py27_0
|
||||
- pandas-datareader=0.2.1=py27_0
|
||||
- pandoc=1.15.0.6
|
||||
- pillow=3.4.2=py27_0
|
||||
- pip=8.1.2=py27_0
|
||||
- python=2.7.11=5
|
||||
|
@ -32,5 +33,6 @@ dependencies:
|
|||
- pytest==3.0.3
|
||||
- sphinx-rtd-theme==0.2.4
|
||||
- sphinx==1.5.4
|
||||
- nbsphinx==0.2.14
|
||||
- twine==1.8.1
|
||||
- protobuf==3.2.0
|
||||
|
|
|
@ -12,6 +12,7 @@ dependencies:
|
|||
- opencv=3.1.0=np111py34_1
|
||||
- pandas=0.19.1=np111py34_0
|
||||
- pandas-datareader=0.2.1=py34_0
|
||||
- pandoc=1.15.0.6
|
||||
- pillow=3.4.2=py34_0
|
||||
- pip=8.1.2=py34_0
|
||||
- python=3.4.4=5
|
||||
|
@ -34,6 +35,7 @@ dependencies:
|
|||
- pytest==3.0.3
|
||||
- sphinx-rtd-theme==0.2.4
|
||||
- sphinx==1.5.4
|
||||
- nbsphinx==0.2.14
|
||||
- twine==1.8.1
|
||||
- protobuf==3.2.0
|
||||
|
|
@ -11,6 +11,7 @@ dependencies:
|
|||
- numpy=1.11.2=py35_0
|
||||
- pandas=0.19.1=np111py35_0
|
||||
- pandas-datareader=0.2.1=py35_0
|
||||
- pandoc=1.15.0.6
|
||||
- pillow=3.4.2=py35_0
|
||||
- pip=8.1.2=py35_0
|
||||
- python=3.5.2=0
|
||||
|
@ -30,5 +31,6 @@ dependencies:
|
|||
- pytest==3.0.3
|
||||
- sphinx-rtd-theme==0.2.4
|
||||
- sphinx==1.5.4
|
||||
- nbsphinx==0.2.14
|
||||
- twine==1.8.1
|
||||
- protobuf==3.2.0
|
||||
|
|
|
@ -11,6 +11,7 @@ dependencies:
|
|||
- numpy=1.11.2=py36_0
|
||||
- pandas=0.19.1=np111py36_0
|
||||
- pandas-datareader=0.2.1=py36_0
|
||||
- pandoc=1.15.0.6
|
||||
- pillow=3.4.2=py36_0
|
||||
- pip=9.0.1=py36_1
|
||||
- python=3.6.0=0
|
||||
|
@ -30,5 +31,6 @@ dependencies:
|
|||
- pytest==3.0.3
|
||||
- sphinx-rtd-theme==0.2.4
|
||||
- sphinx==1.5.4
|
||||
- nbsphinx==0.2.14
|
||||
- twine==1.8.1
|
||||
- protobuf==3.2.0
|
||||
|
|
|
@ -13,6 +13,7 @@ dependencies:
|
|||
- numpy=1.11.2=py27_0
|
||||
- pandas=0.19.1=np111py27_0
|
||||
- pandas-datareader=0.2.1=py27_0
|
||||
- pandoc=1.19.2.1
|
||||
- pillow=3.4.2=py27_0
|
||||
- pip=8.1.2=py27_0
|
||||
- python=2.7.11=5
|
||||
|
@ -32,5 +33,6 @@ dependencies:
|
|||
- pytest==3.0.3
|
||||
- sphinx-rtd-theme==0.2.4
|
||||
- sphinx==1.5.4
|
||||
- nbsphinx==0.2.14
|
||||
- twine==1.8.1
|
||||
- protobuf==3.2.0
|
||||
|
|
|
@ -11,6 +11,7 @@ dependencies:
|
|||
- numpy=1.11.2=py34_0
|
||||
- pandas=0.19.1=np111py34_0
|
||||
- pandas-datareader=0.2.1=py34_0
|
||||
- pandoc=1.19.2.1
|
||||
- pillow=3.4.2=py34_0
|
||||
- pip=8.1.2=py34_0
|
||||
- python=3.4.4=5
|
||||
|
@ -30,5 +31,6 @@ dependencies:
|
|||
- pytest==3.0.3
|
||||
- sphinx-rtd-theme==0.2.4
|
||||
- sphinx==1.5.4
|
||||
- nbsphinx==0.2.14
|
||||
- twine==1.8.1
|
||||
- protobuf==3.2.0
|
||||
|
|
|
@ -11,6 +11,7 @@ dependencies:
|
|||
- numpy=1.11.2=py35_0
|
||||
- pandas=0.19.1=np111py35_0
|
||||
- pandas-datareader=0.2.1=py35_0
|
||||
- pandoc=1.19.2.1
|
||||
- pillow=3.2.0=py35_1
|
||||
- pip=8.1.2=py35_0
|
||||
- python=3.5.2=0
|
||||
|
@ -32,6 +33,7 @@ dependencies:
|
|||
- pytest==3.0.3
|
||||
- sphinx-rtd-theme==0.2.4
|
||||
- sphinx==1.5.4
|
||||
- nbsphinx==0.2.14
|
||||
- twine==1.8.1
|
||||
- protobuf==3.2.0
|
||||
- gym==0.5.2
|
||||
|
|
|
@ -11,6 +11,7 @@ dependencies:
|
|||
- numpy=1.11.2=py36_0
|
||||
- pandas=0.19.1=np111py36_0
|
||||
- pandas-datareader=0.2.1=py36_0
|
||||
- pandoc=1.19.2.1
|
||||
- pillow=3.4.2=py36_0
|
||||
- pip=9.0.1=py36_1
|
||||
- python=3.6.0=0
|
||||
|
@ -30,5 +31,6 @@ dependencies:
|
|||
- pytest==3.0.3
|
||||
- sphinx-rtd-theme==0.2.4
|
||||
- sphinx==1.5.4
|
||||
- nbsphinx==0.2.14
|
||||
- twine==1.8.1
|
||||
- protobuf==3.2.0
|
||||
|
|
|
@ -14,6 +14,11 @@ MODULE_DIR="$(python -c "import cntk, os, sys; sys.stdout.write(os.path.dirname(
|
|||
|
||||
cd ../../../../bindings/python/doc
|
||||
|
||||
# Copy the notebooks to the local directory
|
||||
cp ../../../Manual/Manual_* .
|
||||
cp ../../../Tutorials/CNTK_* .
|
||||
rm CNTK_5*.ipynb
|
||||
|
||||
if [ "$OS" == "Windows_NT" ]; then
|
||||
NORM_PATH="cygpath -aw --file -"
|
||||
# In our test env, DOS find.exe may come first in path, make sure we don't use it:
|
||||
|
@ -28,6 +33,10 @@ sphinx-apidoc "$MODULE_DIR" --module-first --separate --no-toc --output-dir=. --
|
|||
|
||||
sphinx-build -b html -d _build/doctrees -W -j $(nproc) . _build/html
|
||||
|
||||
#Remove the ipynb's
|
||||
rm Manual_*.ipynb
|
||||
rm CNTK_*.ipynb
|
||||
|
||||
# Note: there's currently no way to disable the below checks on a per-instance basis.
|
||||
# If they yield false positives, they need to be refined, or disabled if this
|
||||
# is not possible.
|
||||
|
@ -36,16 +45,18 @@ ERROR=0
|
|||
|
||||
# (Using "grep -a" below because of Windows test environment setup.)
|
||||
|
||||
if grep -aE " at 0x[0-9a-fA-F]{8}" _build/html/*.html; then
|
||||
echo Object addresses leaked into documentation, please fix \(e.g., by
|
||||
echo repeating the signature at the beginning of the docstring\).
|
||||
((ERROR++))
|
||||
fi
|
||||
# TODO: Re-enable the test, after fixing Matplotlib addresses in ipynb htmls
|
||||
#if grep -aE " at 0x[0-9a-fA-F]{8}" _build/html/*.html; then
|
||||
# echo Object addresses leaked into documentation, please fix \(e.g., by
|
||||
# echo repeating the signature at the beginning of the docstring\).
|
||||
# ((ERROR++))
|
||||
#fi
|
||||
|
||||
if grep -a '``' _build/html/*.html; then
|
||||
echo Double back-ticks leaked into the documentation. Please check.
|
||||
((ERROR++))
|
||||
fi
|
||||
# TODO: Re-enable the test, after fixing backticks in the ipynb files
|
||||
#if grep -a '``' _build/html/*.html; then
|
||||
# echo Double back-ticks leaked into the documentation. Please check.
|
||||
# ((ERROR++))
|
||||
#fi
|
||||
|
||||
if grep -a ':[a-z][a-z]*:' _build/html/*.html; then
|
||||
echo Unrendered content leaked into the documentation. Please check.
|
||||
|
|
|
@ -4,9 +4,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -16,9 +14,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"# CNTK 101: Logistic Regression and ML Primer\n",
|
||||
|
@ -36,11 +32,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -63,10 +55,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Goal**:\n",
|
||||
"Our goal is to learn a classifier that can automatically label any patient into either the benign or malignant categories given two features (age and tumor size). In this tutorial, we will create a linear classifier, a fundamental building-block in deep networks."
|
||||
|
@ -75,11 +64,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -102,10 +87,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the figure above, the green line represents the model learned from the data and separates the blue dots from the red dots. In this tutorial, we will walk you through the steps to learn the green line. Note: this classifier does make mistakes, where a couple of blue dots are on the wrong side of the green line. However, there are ways to fix this and we will look into some of the techniques in later tutorials. \n",
|
||||
"\n",
|
||||
|
@ -127,11 +109,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -154,10 +132,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the above figure, contributions from different input features are linearly weighted and aggregated. The resulting sum is mapped to a (0, 1) range via a [sigmoid]( https://en.wikipedia.org/wiki/Sigmoid_function) function. For classifiers with more than two output labels, one can use a [softmax](https://en.wikipedia.org/wiki/Softmax_function) function."
|
||||
]
|
||||
|
@ -166,9 +141,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -186,10 +159,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data Generation\n",
|
||||
"Let us generate some synthetic data emulating the cancer example using the `numpy` library. We have two input features (represented in two-dimensions) and two output classes (benign/blue or malignant/red). \n",
|
||||
|
@ -201,9 +171,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -214,10 +182,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Input and Labels\n",
|
||||
"\n",
|
||||
|
@ -228,9 +193,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -260,9 +223,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -275,10 +236,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us visualize the input data.\n",
|
||||
"\n",
|
||||
|
@ -288,11 +246,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -321,12 +275,9 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Model Creation\n",
|
||||
"## Model Creation\n",
|
||||
"\n",
|
||||
"A logistic regression (a.k.a. LR) network is a simple building block, but has powered many ML \n",
|
||||
"applications in the past decade. LR is a simple linear model that takes as input a vector of numbers describing the properties of what we are classifying (also known as a feature vector, $\\bf{x}$, the blue nodes in the figure below) and emits the *evidence* ($z$) (output of the green node, also known as \"activation\"). Each feature in the input layer is connected to an output node by a corresponding weight $w$ (indicated by the black lines of varying thickness). "
|
||||
|
@ -335,11 +286,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -362,10 +309,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The first step is to compute the evidence for an observation. \n",
|
||||
"\n",
|
||||
|
@ -384,9 +328,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -395,10 +337,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Network setup\n",
|
||||
"\n",
|
||||
|
@ -413,9 +352,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -435,10 +372,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`z` will be used to represent the output of the network."
|
||||
]
|
||||
|
@ -447,9 +381,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -459,10 +391,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Learning model parameters\n",
|
||||
"\n",
|
||||
|
@ -475,10 +404,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training\n",
|
||||
"The output of the `softmax` is the probabilities of an observation belonging each of the respective classes. For training the classifier, we need to determine what behavior the model needs to mimic. In other words, we want the generated probabilities to be as close as possible to the observed labels. We can accomplish this by minimizing the difference between our output and the ground-truth labels. This difference is calculated by the *cost* or *loss* function.\n",
|
||||
|
@ -494,9 +420,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -506,10 +430,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Evaluation\n",
|
||||
"\n",
|
||||
|
@ -520,9 +441,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -531,10 +450,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Configure training\n",
|
||||
"\n",
|
||||
|
@ -552,9 +468,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -567,10 +481,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, let us create some helper functions that will be needed to visualize different functions associated with training. Note: these convenience functions are for understanding what goes on under the hood."
|
||||
]
|
||||
|
@ -579,9 +490,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -608,10 +517,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run the trainer\n",
|
||||
"\n",
|
||||
|
@ -626,9 +532,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -641,11 +545,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -695,11 +595,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -749,10 +645,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run evaluation / Testing \n",
|
||||
"\n",
|
||||
|
@ -766,11 +659,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -793,10 +682,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Checking prediction / evaluation \n",
|
||||
"For evaluation, we softmax the output of the network into a probability distribution over the two classes, the probability of each observation being malignant or benign. "
|
||||
|
@ -806,9 +692,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -818,10 +702,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us compare the ground-truth label with the predictions. They should be in agreement.\n",
|
||||
"\n",
|
||||
|
@ -832,11 +713,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -854,10 +731,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Visualization\n",
|
||||
"It is desirable to visualize the results. In this example, the data can be conveniently plotted using two spatial dimensions for the input (patient age on the x-axis and tumor size on the y-axis), and a color dimension for the output (red for malignant and blue for benign). For data with higher dimensions, visualization can be challenging. There are advanced dimensionality reduction techniques, such as [t-sne](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) that allow for such visualizations."
|
||||
|
@ -866,11 +740,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -913,9 +783,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"**Exploration Suggestions** \n",
|
||||
|
@ -928,9 +796,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"### Appendix\n",
|
||||
|
@ -962,7 +828,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,9 +4,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -16,8 +14,6 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"nbpresent": {
|
||||
"id": "29b9bd1d-766f-4422-ad96-de0accc1ce58"
|
||||
}
|
||||
|
@ -37,11 +33,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -64,10 +56,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Goal**:\n",
|
||||
"Our goal is to learn a classifier that classifies any patient into either benign or malignant category given two features (age, tumor size). \n",
|
||||
|
@ -88,11 +77,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -115,10 +100,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"A feedforward neural network is an artificial neural network where connections between the units **do not** form a cycle.\n",
|
||||
"The feedforward neural network was the first and simplest type of artificial neural network devised. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network\n",
|
||||
|
@ -131,8 +113,6 @@
|
|||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"nbpresent": {
|
||||
"id": "138d1a78-02e2-4bd6-a20e-07b83f303563"
|
||||
}
|
||||
|
@ -156,10 +136,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data Generation\n",
|
||||
"This section can be *skipped* (next section titled <a href='#Model Creation'>Model Creation</a>) if you have gone through CNTK 101. \n",
|
||||
|
@ -173,9 +150,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -189,10 +164,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Input and Labels\n",
|
||||
"\n",
|
||||
|
@ -203,9 +175,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -228,9 +198,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -243,10 +211,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us visualize the input data. \n",
|
||||
"\n",
|
||||
|
@ -256,11 +221,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -289,10 +250,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model Creation\n",
|
||||
"\n",
|
||||
|
@ -302,11 +260,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -329,10 +283,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The number of green nodes (refer to picture above) in each hidden layer is set to 50 in the example and the number of hidden layers (refer to the number of layers of green nodes) is 2. Fill in the following values:\n",
|
||||
"- num_hidden_layers\n",
|
||||
|
@ -345,9 +296,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -357,10 +306,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Network input and output: \n",
|
||||
"- **input** variable (a key CNTK concept): \n",
|
||||
|
@ -374,9 +320,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -391,10 +335,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Feed forward network setup\n",
|
||||
"Let us define the feedforward network one step at a time. The first layer takes an input feature vector ($\\bf{x}$) with dimensions `input_dim`, say $m$, and emits the output a.k.a. *evidence* (first hidden layer $\\bf{z_1}$ with dimension `hidden_layer_dim`, say $n$). Each feature in the input layer is connected with a node in the output layer by the weight which is represented by a matrix $\\bf{W}$ with dimensions ($m \\times n$). The first step is to compute the evidence for the entire feature set. Note: we use **bold** notations to denote matrix / vectors: \n",
|
||||
|
@ -412,9 +353,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -429,10 +368,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The next step is to convert the *evidence* (the output of the linear layer) through a non-linear function a.k.a. *activation functions* of your choice that would squash the evidence to activations using a choice of functions ([found here](https://cntk.ai/pythondocs/cntk.layers.layers.html#cntk.layers.layers.Activation)). **Sigmoid** or **Tanh** are historically popular. We will use **sigmoid** function in this tutorial. The output of the sigmoid function often is the input to the next layer or the output of the final layer. \n",
|
||||
"\n",
|
||||
|
@ -443,9 +379,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -457,10 +391,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that we have created one hidden layer, we need to iterate through the layers to create a fully connected classifier. Output of the first layer $\\bf{h_1}$ becomes the input to the next layer.\n",
|
||||
"\n",
|
||||
|
@ -481,9 +412,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -500,10 +429,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The network output `z` will be used to represent the output of a network across."
|
||||
]
|
||||
|
@ -512,9 +438,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -525,10 +449,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"While the aforementioned network helps us better understand how to implement a network using CNTK primitives, it is much more convenient and faster to use the [layers library](https://www.cntk.ai/pythondocs/layerref.html). It provides predefined commonly used “layers” (lego like blocks), which simplifies the design of networks that consist of standard layers layered on top of each other. For instance, ``dense_layer`` is already easily accessible through the [Dense](https://www.cntk.ai/pythondocs/layerref.html#dense) layer function to compose our deep model. We can pass the input variable (`input`) to this model to get the network output. \n",
|
||||
"\n",
|
||||
|
@ -539,9 +460,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -559,10 +478,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Learning model parameters\n",
|
||||
"\n",
|
||||
|
@ -575,10 +491,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training\n",
|
||||
"\n",
|
||||
|
@ -596,9 +509,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -607,10 +518,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Evaluation\n",
|
||||
"\n",
|
||||
|
@ -621,9 +529,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -632,10 +538,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Configure training\n",
|
||||
"\n",
|
||||
|
@ -653,9 +556,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -668,10 +569,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First lets create some helper functions that will be needed to visualize different functions associated with training."
|
||||
]
|
||||
|
@ -680,9 +578,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -710,10 +606,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run the trainer\n",
|
||||
"\n",
|
||||
|
@ -728,9 +621,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -744,9 +635,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -771,10 +660,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us plot the errors over the different training minibatches. Note that as we iterate the training loss decreases though we do see some intermediate bumps. The bumps indicate that during that iteration the model came across observations that it predicted incorrectly. This can happen with observations that are novel during model training.\n",
|
||||
"\n",
|
||||
|
@ -786,11 +672,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -840,10 +722,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run evaluation / testing \n",
|
||||
"\n",
|
||||
|
@ -853,11 +732,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -880,20 +755,14 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, this error is very comparable to our training error indicating that our model has good \"out of sample\" error a.k.a. generalization error. This implies that our model can very effectively deal with previously unseen observations (during the training process). This is key to avoid the phenomenon of overfitting."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We have so far been dealing with aggregate measures of error. Lets now get the probabilities associated with individual data points. For each observation, the `eval` function returns the probability distribution across all the classes. If you used the default parameters in this tutorial, then it would be a vector of 2 elements per observation. First let us route the network output through a softmax function.\n",
|
||||
"\n",
|
||||
|
@ -905,11 +774,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -934,9 +799,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -945,10 +808,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us test on previously unseen data."
|
||||
]
|
||||
|
@ -957,9 +817,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -969,11 +827,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -992,9 +846,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"**Exploration Suggestion**\n",
|
||||
|
@ -1007,26 +859,13 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"**Code link**\n",
|
||||
"\n",
|
||||
"If you want to try running the tutorial from python command prompt. Please run the [FeedForwardNet.py](https://github.com/Microsoft/CNTK/blob/release/2.1/Tutorials/NumpyInterop/FeedForwardNet.py) example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -1046,7 +885,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -3,12 +3,10 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"# CNTK 103 Part A: MNIST Data Loader\n",
|
||||
"# CNTK 103: Part A - MNIST Data Loader\n",
|
||||
"\n",
|
||||
"This tutorial is targeted to individuals who are new to CNTK and to machine learning. We assume you have completed or are familiar with CNTK 101 and 102. In this tutorial, we will download and pre-process the MNIST digit images to be used for building different models to recognize handwritten digits. We will extend CNTK 101 and 102 to be applied to this data set. Additionally, we will introduce a convolutional network to achieve superior performance. This is the first example, where we will train and evaluate a neural network based model on real world data. \n",
|
||||
"\n",
|
||||
|
@ -21,9 +19,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -49,10 +45,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data download\n",
|
||||
"\n",
|
||||
|
@ -63,9 +56,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -127,10 +118,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Download the data\n",
|
||||
"\n",
|
||||
|
@ -140,11 +128,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -183,10 +167,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize the data"
|
||||
]
|
||||
|
@ -194,11 +175,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -228,10 +205,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Save the images\n",
|
||||
"\n",
|
||||
|
@ -248,9 +222,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -277,11 +249,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -312,10 +280,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Suggested Explorations**\n",
|
||||
"\n",
|
||||
|
@ -346,7 +311,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,9 +4,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -16,8 +14,6 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"nbpresent": {
|
||||
"id": "29b9bd1d-766f-4422-ad96-de0accc1ce58"
|
||||
}
|
||||
|
@ -38,11 +34,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -65,10 +57,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Goal**:\n",
|
||||
"Our goal is to train a classifier that will identify the digits in the MNIST dataset. Additionally, we aspire to achieve lower error rate with Multi-layer perceptron compared to Multi-class logistic regression. \n",
|
||||
|
@ -86,8 +75,6 @@
|
|||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"nbpresent": {
|
||||
"id": "138d1a78-02e2-4bd6-a20e-07b83f303563"
|
||||
}
|
||||
|
@ -111,10 +98,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data reading\n",
|
||||
"\n",
|
||||
|
@ -125,9 +109,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -138,10 +120,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this tutorial we are using the MNIST data you have downloaded using CNTK_103A_MNIST_DataLoader notebook. The dataset has 60,000 training images and 10,000 test images with each image being 28 x 28 pixels. Thus the number of features is equal to 784 (= 28 x 28 pixels), 1 per pixel. The variable `num_output_classes` is set to 10 corresponding to the number of digits (0-9) in the dataset.\n",
|
||||
"\n",
|
||||
|
@ -157,9 +136,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -174,11 +151,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -206,10 +179,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model Creation\n",
|
||||
"\n",
|
||||
|
@ -220,10 +190,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you are not familiar with the terms *hidden layer* and *number of hidden layers*, please refer back to CNTK 102 tutorial.\n",
|
||||
"\n",
|
||||
|
@ -244,9 +211,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -256,10 +221,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Network input and output: \n",
|
||||
"- **input** variable (a key CNTK concept): \n",
|
||||
|
@ -273,9 +235,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -285,10 +245,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Multi-layer Perceptron setup\n",
|
||||
"\n",
|
||||
|
@ -299,9 +256,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -318,10 +273,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`z` will be used to represent the output of a network.\n",
|
||||
"\n",
|
||||
|
@ -339,9 +291,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -351,10 +301,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Learning model parameters\n",
|
||||
"\n",
|
||||
|
@ -363,10 +310,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training\n",
|
||||
"\n",
|
||||
|
@ -377,9 +321,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -388,10 +330,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Evaluation\n",
|
||||
"\n",
|
||||
|
@ -402,9 +341,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -413,10 +350,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Configure training\n",
|
||||
"\n",
|
||||
|
@ -434,9 +368,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -449,10 +381,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First let us create some helper functions that will be needed to visualize different functions associated with training."
|
||||
]
|
||||
|
@ -461,9 +390,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -491,10 +418,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run the trainer\n",
|
||||
"\n",
|
||||
|
@ -507,9 +431,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -523,11 +445,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -586,10 +504,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us plot the errors over the different training minibatches. Note that as we iterate the training loss decreases though we do see some intermediate bumps. \n",
|
||||
"\n",
|
||||
|
@ -599,11 +514,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -653,10 +564,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run evaluation / testing \n",
|
||||
"\n",
|
||||
|
@ -666,11 +574,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -713,10 +617,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, this error is very comparable to our training error indicating that our model has good \"out of sample\" error a.k.a. generalization error. This implies that our model can very effectively deal with previously unseen observations (during the training process). This is key to avoid the phenomenon of overfitting.\n",
|
||||
"\n",
|
||||
|
@ -725,10 +626,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We have so far been dealing with aggregate measures of error. Let us now get the probabilities associated with individual data points. For each observation, the `eval` function returns the probability distribution across all the classes. The classifier is trained to recognize digits, hence has 10 classes. First let us route the network output through a `softmax` function. This maps the aggregated activations across the network to probabilities across the 10 classes."
|
||||
]
|
||||
|
@ -737,9 +635,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -748,10 +644,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us a small minibatch sample from the test data."
|
||||
]
|
||||
|
@ -760,9 +653,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -783,9 +674,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -797,11 +686,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -819,10 +704,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us visualize some of the results"
|
||||
]
|
||||
|
@ -830,11 +712,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -867,9 +745,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"**Exploration Suggestion**\n",
|
||||
|
@ -881,26 +757,13 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"**Code link**\n",
|
||||
"\n",
|
||||
"If you want to try running the tutorial from Python command prompt please run the [SimpleMNIST.py](https://github.com/Microsoft/CNTK/tree/release/2.1/Examples/Image/Classification/MLP/Python) example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -920,7 +783,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,9 +4,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -16,8 +14,6 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"nbpresent": {
|
||||
"id": "29b9bd1d-766f-4422-ad96-de0accc1ce58"
|
||||
}
|
||||
|
@ -46,11 +42,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -73,10 +65,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Goal**:\n",
|
||||
"Our goal is to train a classifier that will identify the digits in the MNIST dataset. \n",
|
||||
|
@ -95,8 +84,6 @@
|
|||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"nbpresent": {
|
||||
"id": "138d1a78-02e2-4bd6-a20e-07b83f303563"
|
||||
}
|
||||
|
@ -121,10 +108,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data reading\n",
|
||||
"In this section, we will read the data generated in CNTK 103 Part A (MNIST Data Loader).\n",
|
||||
|
@ -148,9 +132,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -162,10 +144,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Data Format** The data is stored on our local machine in the CNTK CTF format. The CTF format is a simple text format that contains a set of samples with each sample containing a set of named fields and their data. For our MNIST data, each sample contains 2 fields: labels and feature, formatted as:\n",
|
||||
"\n",
|
||||
|
@ -183,9 +162,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -203,11 +180,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -240,10 +213,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## CNN Model Creation\n",
|
||||
"\n",
|
||||
|
@ -265,11 +235,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -291,10 +257,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Convolution layers incorporate following key features:\n",
|
||||
"\n",
|
||||
|
@ -314,10 +277,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Strides and Pad parameters\n",
|
||||
"\n",
|
||||
|
@ -330,11 +290,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -387,10 +343,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Building our CNN models\n",
|
||||
"\n",
|
||||
|
@ -401,9 +354,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -413,10 +364,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The first model we build is a simple convolution only network. Here we have two convolutional layers. Since, our task is to detect the 10 digits in the MNIST database, the output of the network should be a vector of length 10, 1 element corresponding to each digit. This is achieved by projecting the output of the last convolutional layer using a dense layer with the output being `num_output_classes`. We have seen this before with Logistic Regression and MLP where features were mapped to the number of classes in the final layer. Also, note that since we will be using the `softmax` operation that is combined with the `cross entropy` loss function during training (see a few cells below), the final dense layer has no activation function associated with it.\n",
|
||||
"\n",
|
||||
|
@ -429,9 +377,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -454,10 +400,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us create an instance of the model and inspect the different components of the model. `z` will be used to represent the output of a network. In this model, we use the `relu` activation function. Note: using the `C.layers.default_options` is an elegant way to write concise models. This is key to minimizing modeling errors, saving precious debugging time."
|
||||
]
|
||||
|
@ -465,11 +408,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -491,10 +430,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Understanding number of model parameters to be estimated is key to deep learning since there is a direct dependency on the amount of data one needs to have. You need more data for a model that has larger number of parameters to prevent overfitting. In other words, with a fixed amount of data, one has to constrain the number of parameters. There is no golden rule between the amount of data one needs for a model. However, there are ways one can boost performance of model training with [data augmentation](https://deeplearningmania.quora.com/The-Power-of-Data-Augmentation-2). "
|
||||
]
|
||||
|
@ -502,11 +438,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -523,10 +455,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Understanding Parameters**:\n",
|
||||
"\n",
|
||||
|
@ -547,10 +476,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Knowledge check**: Does the dense layer shape align with the task (MNIST digit classification)?\n",
|
||||
"\n",
|
||||
|
@ -564,10 +490,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Learning model parameters\n",
|
||||
"\n",
|
||||
|
@ -576,10 +499,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training\n",
|
||||
"\n",
|
||||
|
@ -590,9 +510,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -604,10 +522,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next we will need a helper function to perform the model training. First let us create additional helper functions that will be needed to visualize different functions associated with training."
|
||||
]
|
||||
|
@ -616,9 +531,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -646,10 +559,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Configure training\n",
|
||||
"\n",
|
||||
|
@ -660,9 +570,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -736,10 +644,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run the trainer and test model\n",
|
||||
"\n",
|
||||
|
@ -749,11 +654,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -796,10 +697,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, the average test error is very comparable to our training error indicating that our model has good \"out of sample\" error a.k.a. [generalization error](https://en.wikipedia.org/wiki/Generalization_error). This implies that our model can very effectively deal with previously unseen observations (during the training process). This is key to avoid [overfitting](https://en.wikipedia.org/wiki/Overfitting).\n",
|
||||
"\n",
|
||||
|
@ -809,11 +707,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -830,10 +724,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run evaluation / prediction\n",
|
||||
"We have so far been dealing with aggregate measures of error. Let us now get the probabilities associated with individual data points. For each observation, the `eval` function returns the probability distribution across all the classes. The classifier is trained to recognize digits, hence has 10 classes. First let us route the network output through a `softmax` function. This maps the aggregated activations across the network to probabilities across the 10 classes."
|
||||
|
@ -843,9 +734,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -854,10 +743,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us a small minibatch sample from the test data."
|
||||
]
|
||||
|
@ -866,9 +752,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -893,9 +777,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -907,11 +789,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -929,10 +807,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us visualize some of the results"
|
||||
]
|
||||
|
@ -940,11 +815,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -976,10 +847,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pooling Layer\n",
|
||||
"\n",
|
||||
|
@ -993,10 +861,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Another alternative is average pooling, which emits that average value instead of the maximum value. The two different pooling opearations are summarized in the animation below."
|
||||
]
|
||||
|
@ -1004,11 +869,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1061,12 +922,9 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Typical convolution network\n",
|
||||
"## Typical convolution network\n",
|
||||
"\n",
|
||||
"![](http://www.cntk.ai/jup/conv103d_mnist-conv-mp.png)\n",
|
||||
"\n",
|
||||
|
@ -1077,10 +935,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Task: Create a network with MaxPooling\n",
|
||||
"\n",
|
||||
|
@ -1097,9 +952,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1125,9 +978,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"**Quiz**: How many parameters do we have in the model with MaxPooling and Convolution? Which of the two models produces lower error rate?\n",
|
||||
|
@ -1140,22 +991,15 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Solution"
|
||||
"## Solution"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1208,17 +1052,6 @@
|
|||
" \n",
|
||||
"do_train_test()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -1238,7 +1071,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -2,12 +2,9 @@
|
|||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tutorial 104: Time Series Basics with Pandas and Finance Data\n",
|
||||
"# CNTK 104: Time Series Basics with Pandas and Finance Data\n",
|
||||
"\n",
|
||||
"Contributed by: Avi Thaker\n",
|
||||
"November 20, 2016\n",
|
||||
|
@ -23,9 +20,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -46,10 +41,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Read data\n",
|
||||
"We first retrieve stock data using the method `get_stock_data`. This method downloads stock data on a daily timescale from Google Finance (can be modified to get data from Yahoo Finance and many other sources). [Pandas datareader]( http://pandas-datareader.readthedocs.io/en/latest/remote_data.html) shows many use cases for the data reader."
|
||||
|
@ -59,9 +51,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -111,11 +101,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -176,10 +162,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Build features\n",
|
||||
"\n",
|
||||
|
@ -203,11 +186,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -496,10 +475,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**What are trying to predict**\n",
|
||||
"\n",
|
||||
|
@ -510,9 +486,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -533,9 +507,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"Here we are actually building the neural network itself. We will use a simple feedforward neural network (represented as `NN` in the plots) with 10 inputs and 50 dimensions.\n",
|
||||
|
@ -546,11 +518,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -566,10 +534,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model Creation\n",
|
||||
"\n",
|
||||
|
@ -580,9 +545,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -615,9 +578,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -634,10 +595,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training\n",
|
||||
"\n",
|
||||
|
@ -650,11 +608,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -696,11 +650,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -750,11 +700,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -798,10 +744,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice the trend for the label prediction error is still close to 50%. Remember that this is time variant, therefore it is expected that the system will have some noise as it trains through time. It should be noted; the model is still learning the market. Additionally, since this time series data is so noisy, having an error rate below 50% is good (many trading firms have win-rates of near 50% and have made money nearly every day [VIRTU](https://en.wikipedia.org/wiki/Virtu_Financial#Trading_activity)). However note they are high frequency trading firm and can leverage themselves up with low winrate strategies (51%). Trying to classify and trade every single day is expensive from transaction fees perspective. Therefore, one approach would be to trade when we think we are more likely to win?\n",
|
||||
"\n",
|
||||
|
@ -811,11 +754,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -838,10 +777,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here we see that we have an error rate near 50%. At first glance this may appear to not have learned the network, but let us examine further and see if we have some predictive power."
|
||||
]
|
||||
|
@ -850,9 +786,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -867,10 +801,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Evaluation\n",
|
||||
"Here we take the output of our test set and compute the probabilities from the softmax function. Since we have probabilities we want to trade when there is a \"higher\" chance that we will be right, instead of just a >50% chance that the market will go in one direction. The goal is to find a signal, instead of trying to classify the market. Since the market is so noisy we want to only trade when we have an \"edge\" on the market. Moreover, trading frequently has higher fees (you have to pay each time you trade).\n",
|
||||
|
@ -893,9 +824,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -932,11 +861,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -963,11 +888,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1033,10 +954,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This plot shows the % returns when we trade using SPY and NN based models only when we are > 55% sure of the predicted directionality.\n",
|
||||
"\n",
|
||||
|
@ -1056,11 +974,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1115,9 +1029,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"The plot above shows the % returns when we trade every day using SPY and NN based models as compared to a confidence based trading show in previous plot. With frequent trading the volatility is higher and transaction fees (not accounted in this plot) will greatly eat into any profits.\n",
|
||||
|
@ -1131,11 +1043,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1153,9 +1061,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"## Appendix\n",
|
||||
|
@ -1183,7 +1089,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,9 +4,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -15,10 +13,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 105: Basic autoencoder (AE) with MNIST data\n",
|
||||
"\n",
|
||||
|
@ -41,11 +36,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -68,10 +59,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this tutorial, we will use the [MNIST hand-written digits data](https://en.wikipedia.org/wiki/MNIST_database) to show how images can be encoded and decoded (restored) using feed-forward networks. We will visualize the original and the restored images. We illustrate feed forward network based on two autoencoders: simple and deep autoencoder. More advanced autoencoders will be covered in future 200 series tutorials."
|
||||
]
|
||||
|
@ -80,9 +68,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -104,10 +90,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There are two run modes:\n",
|
||||
"- *Fast mode*: `isFast` is set to `True`. This is the default mode for the notebooks, which means we train for fewer iterations or train / test on limited data. This ensures functional correctness of the notebook though the models produced are far from what a completed training would produce.\n",
|
||||
|
@ -119,9 +102,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -130,10 +111,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data reading\n",
|
||||
"\n",
|
||||
|
@ -153,9 +131,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -170,11 +146,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -203,10 +175,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model Creation (Simple AE)\n",
|
||||
"\n",
|
||||
|
@ -216,11 +185,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -243,10 +208,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The input data is a set of hand written digits images each of 28 x 28 pixels. In this tutorial, we will consider each image as a linear array of 784 pixel values. These pixels are considered as an input having 784 dimensions, one per pixel. Since the goal of the autoencoder is to compress the data and reconstruct the original image, the output dimension is same as the input dimension. We will compress the input to mere 32 dimensions (referred to as the `encoding_dim`). Additionally, since the maximum input value is 255, we normalize the input between 0 and 1. "
|
||||
]
|
||||
|
@ -255,9 +217,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -276,10 +236,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Train and test the model\n",
|
||||
"\n",
|
||||
|
@ -304,9 +261,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -425,10 +380,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us train the simple autoencoder. We create a training and a test reader"
|
||||
]
|
||||
|
@ -436,11 +388,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -478,10 +426,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize simple AE results"
|
||||
]
|
||||
|
@ -489,11 +434,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -543,10 +484,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us plot the original and the decoded image. They should look visually similar."
|
||||
]
|
||||
|
@ -555,9 +493,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -577,11 +513,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -608,9 +540,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"## Model Creation (Deep AE)\n",
|
||||
|
@ -621,11 +551,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -648,10 +574,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The encoding dimensions are 128, 64 and 32 while the decoding dimensions are symmetrically opposite 64, 128 and 784. This increases the number of parameters used to model the transformation and achieves lower error rates at the cost of longer training duration and memory footprint. If we train this deep encoder for larger number iterations by turning the `isFast` flag to be `False`, we get a lower error and the reconstructed images are also marginally better. "
|
||||
]
|
||||
|
@ -660,9 +583,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -693,11 +614,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -736,10 +653,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize deep AE results"
|
||||
]
|
||||
|
@ -747,11 +661,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -786,10 +696,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us plot the original and the decoded image with the deep autoencoder. They should look visually similar."
|
||||
]
|
||||
|
@ -797,11 +704,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -827,10 +730,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We have shown how to encode and decode an input. In this section we will explore how we can compare one to another and also show how to extract an encoded input for a given input. For visualizing high dimension data in 2D, [t-SNE](http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) is probably one of the best methods. However, it typically requires relatively low-dimensional data. So a good strategy for visualizing similarity relationships in high-dimensional data is to encode data into a low-dimensional space (e.g. 32 dimensional) using an autoencoder first, extract the encoding of the input data followed by using t-SNE for mapping the compressed data to a 2D plane. \n",
|
||||
"\n",
|
||||
|
@ -845,9 +745,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -875,11 +773,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -908,10 +802,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will [compute cosine distance](https://en.wikipedia.org/wiki/Cosine_similarity) between two images using `scipy`. "
|
||||
]
|
||||
|
@ -920,9 +811,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -937,11 +826,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1018,10 +903,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note: The cosine distance between the original images comparable to the distance between the corresponding decoded images. A value of 1 indicates high similarity between the images and 0 indicates no similarity.\n",
|
||||
"\n",
|
||||
|
@ -1031,11 +913,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1065,10 +943,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us compare the distance between different digits."
|
||||
]
|
||||
|
@ -1076,11 +951,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1154,10 +1025,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Print the results of the deep encoder test error for regression testing"
|
||||
]
|
||||
|
@ -1165,11 +1033,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1187,11 +1051,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1208,10 +1068,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Suggested tasks**\n",
|
||||
"\n",
|
||||
|
@ -1222,17 +1079,6 @@
|
|||
"- Can you use a different distance metric to compute similarity between the MNIST images.\n",
|
||||
"- Try a deep encoder with [1000, 500, 250, 128, 64, 32]. What is the training error for same number of iterations? "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -1252,7 +1098,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,9 +4,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -15,10 +13,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 106: Part A - Time series prediction with LSTM (Basics)\n",
|
||||
"\n",
|
||||
|
@ -32,11 +27,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -59,10 +50,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this tutorial we will use [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory) to implement our model. LSTMs are well suited for this task because their ability to learn from experience. For details on how LSTMs work, see [this excellent post](http://colah.github.io/posts/2015-08-Understanding-LSTMs). \n",
|
||||
"\n",
|
||||
|
@ -78,9 +66,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -100,10 +86,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There are two run modes:\n",
|
||||
"- *Fast mode*: `isFast` is set to `True`. This is the default mode for the notebooks, which means we train for fewer iterations or train / test on limited data. This ensures functional correctness of the notebook though the models produced are far from what a completed training would produce.\n",
|
||||
|
@ -115,9 +98,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -126,10 +107,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data generation\n",
|
||||
"\n",
|
||||
|
@ -160,9 +138,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -182,9 +158,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -218,10 +192,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us generate and visualize the generated data"
|
||||
]
|
||||
|
@ -229,11 +200,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -259,10 +226,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Network modeling\n",
|
||||
"\n",
|
||||
|
@ -275,9 +239,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -293,20 +255,14 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training the network"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We define the `next_batch()` iterator that produces batches we can feed to the training function. \n",
|
||||
"Note that because CNTK supports variable sequence length, we must feed the batches as list of sequences. This is a convenience function to generate small batches of data often referred to as minibatch."
|
||||
|
@ -316,9 +272,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -337,10 +291,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Setup everything else we need for training the model: define user specified training parameters, define inputs, outputs, model and the optimizer."
|
||||
]
|
||||
|
@ -349,9 +300,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -364,10 +313,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Key Insight**\n",
|
||||
"\n",
|
||||
|
@ -397,9 +343,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -435,10 +379,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We are ready to train. 100 epochs should yield acceptable results."
|
||||
]
|
||||
|
@ -446,11 +387,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -488,11 +425,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -512,10 +445,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Normally we would validate the training on the data that we set aside for validation but since the input data is small we can run validattion on all parts of the dataset."
|
||||
]
|
||||
|
@ -524,9 +454,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -542,11 +470,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -566,11 +490,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -588,10 +508,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Since we used a simple sin(x) function we should expect that the errors are the same for train, validation and test sets. For real datasets that will be different of course. We also plot the expected output (Y) and the prediction our model made to shows how well the simple LSTM approach worked."
|
||||
]
|
||||
|
@ -599,11 +516,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -632,9 +545,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"Not perfect but close enough, considering the simplicity of the model.\n",
|
||||
|
@ -643,17 +554,6 @@
|
|||
"\n",
|
||||
"To improve results, we could train with more data points, let the model train for more epochs, or improve the model itself."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -673,7 +573,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -2,10 +2,7 @@
|
|||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 106: Part B - Time series prediction with LSTM (IOT Data)\n",
|
||||
"\n",
|
||||
|
@ -33,9 +30,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"We need a few imports and constants throughout the tutorial that we define here."
|
||||
|
@ -45,9 +40,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -76,9 +69,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -88,10 +79,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There are two run modes:\n",
|
||||
"- *Fast mode*: `isFast` is set to `True`. This is the default mode for the notebooks, which means we train for fewer iterations or train / test on limited data. This ensures functional correctness of the notebook though the models produced are far from what a completed training would produce.\n",
|
||||
|
@ -105,9 +93,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -119,10 +105,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data generation\n",
|
||||
"\n",
|
||||
|
@ -145,10 +128,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Pre-processing\n",
|
||||
"Most of the code in this example is related to data preparation. Thankfully the pandas library make this easy.\n",
|
||||
|
@ -172,9 +152,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -256,10 +234,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data caching\n",
|
||||
"For routine testing we would like to cache the data locally when available. If it is not available from the cache locations we shall download."
|
||||
|
@ -268,11 +243,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -296,10 +267,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Utility for data fetching**\n",
|
||||
"\n",
|
||||
|
@ -312,9 +280,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -333,10 +299,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Understand the data format**\n",
|
||||
"\n",
|
||||
|
@ -346,11 +309,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -372,11 +331,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -397,10 +352,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model Creation (LSTM network)\n",
|
||||
"\n",
|
||||
|
@ -430,9 +382,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -450,10 +400,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training\n",
|
||||
"Before we can start training we need to bind our input variables for the model and define what optimizer we want to use. For this example we choose the `adam` optimizer. We choose `squared_error` as our loss function."
|
||||
|
@ -463,9 +410,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -499,10 +444,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Time to start training."
|
||||
]
|
||||
|
@ -510,11 +452,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -553,10 +491,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"A look how the loss function shows how the model is converging:"
|
||||
]
|
||||
|
@ -564,11 +499,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -587,10 +518,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let us validate the training validation and test dataset. We use mean squared error as measure which might be a little simplistic. A method that would define a ratio how many predictions have been inside a given tolerance would make a better measure."
|
||||
]
|
||||
|
@ -599,9 +527,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -617,11 +543,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -641,11 +563,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -663,10 +581,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Visualize results\n",
|
||||
"\n",
|
||||
|
@ -677,9 +592,6 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
|
@ -711,10 +623,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If we let the model train for 2000 epochs the predictions are close to the actual data and follow the right pattern."
|
||||
]
|
||||
|
@ -722,9 +631,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"**Suggested activity**\n",
|
||||
|
@ -739,15 +646,6 @@
|
|||
"\n",
|
||||
"We hope this tutorial gets you started on time series prediction with neural networks."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -767,7 +665,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK: A Guided Tour\n",
|
||||
"# CNTK 200: A Guided Tour\n",
|
||||
"\n",
|
||||
"This tutorial exposes many advanced features of CNTK and is aimed towards people who have had some previous exposure to deep learning and/or other deep learning toolkits. If you are a complete beginner we suggest you start with the CNTK 101 Tutorial and come here after you have covered most of the 100 series.\n",
|
||||
"\n",
|
||||
|
@ -55,9 +55,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from __future__ import print_function\n",
|
||||
|
@ -79,7 +77,7 @@
|
|||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"# Defining Your Model Structure\n",
|
||||
"## Defining Your Model Structure\n",
|
||||
"\n",
|
||||
"So let us dive right in. Below we will introduce CNTK's data model and CNTK's programming model--*networks are function objects* (i.e. a network can be called like a function, and it also holds some state, the weights, or parameters, that get adjusted during learning). We will put that into action for logistic regression and MNIST digit recognition,\n",
|
||||
"using CNTK's Functional API. Lastly, CNTK also has a lower-level graph API. We will replicate one example with it.\n",
|
||||
|
@ -102,9 +100,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -241,9 +237,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -297,9 +291,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_lr_factory = cntk.layers.Dense(num_classes_lr, activation=None)\n",
|
||||
|
@ -334,9 +326,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -370,9 +360,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -424,9 +412,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -454,9 +440,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -481,9 +465,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -514,9 +496,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -575,9 +555,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def create_model_mn_factory():\n",
|
||||
|
@ -625,9 +603,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"@cntk.Function\n",
|
||||
|
@ -651,9 +627,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"N = len(X_train_mn)\n",
|
||||
|
@ -682,9 +656,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -753,7 +725,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Graph API Example: MNIST Digit Recognition Again\n",
|
||||
"## Graph API Example: MNIST Digit Recognition Again\n",
|
||||
"\n",
|
||||
"CNTK also allows networks to be written by using a graph-level API. This API is more verbose but sometimes more flexible. The following defines the same model and criterion function as above, and will get the same result."
|
||||
]
|
||||
|
@ -761,9 +733,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -796,7 +766,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Feeding Your Data\n",
|
||||
"## Feeding Your Data\n",
|
||||
"\n",
|
||||
"Once you have decided your model structure and defined it, you are facing the question on feeding\n",
|
||||
"your training data to the CNTK training process.\n",
|
||||
|
@ -844,9 +814,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"image_width, image_height, num_channels = (32, 32, 3)\n",
|
||||
|
@ -881,9 +849,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -948,7 +914,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Training and Evaluating\n",
|
||||
"## Training and Evaluating\n",
|
||||
"\n",
|
||||
"In our examples above, we use the `train()` function to train, and `test()` for evaluating.\n",
|
||||
"In this section, we want to walk you through the advanced options of `train()`:\n",
|
||||
|
@ -1023,9 +989,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1106,7 +1070,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Deploying your Model\n",
|
||||
"## Deploying your Model\n",
|
||||
"\n",
|
||||
"Your ultimate purpose of training a deep neural network is to deploy it as part of your own program or product.\n",
|
||||
"Since this involves programming languages other than Python,\n",
|
||||
|
@ -1125,9 +1089,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1166,9 +1128,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1222,7 +1182,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Conclusion\n",
|
||||
"## Conclusion\n",
|
||||
"\n",
|
||||
"This tutorial provided an overview of the five main tasks of creating and using a deep neural network with CNTK.\n",
|
||||
"\n",
|
||||
|
@ -1239,7 +1199,7 @@
|
|||
"metadata": {
|
||||
"anaconda-cloud": {},
|
||||
"kernelspec": {
|
||||
"display_name": "Python [default]",
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
|
@ -1253,7 +1213,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.4.5"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -2,12 +2,9 @@
|
|||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 201A Part A: CIFAR-10 Data Loader\n",
|
||||
"# CNTK 201: Part A - CIFAR-10 Data Loader\n",
|
||||
"\n",
|
||||
"This tutorial will show how to prepare image data sets for use with deep learning algorithms in CNTK. The CIFAR-10 dataset (http://www.cs.toronto.edu/~kriz/cifar.html) is a popular dataset for image classification, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It is a labeled subset of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset.\n",
|
||||
"\n",
|
||||
|
@ -26,9 +23,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -57,10 +52,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data download\n",
|
||||
"\n",
|
||||
|
@ -73,9 +65,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -86,10 +76,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We first setup a few helper functions to download the CIFAR data. The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python \"pickled\" object produced with cPickle. To prepare the input data for use in CNTK we use three oprations:\n",
|
||||
"> `readBatch`: Unpack the pickle files\n",
|
||||
|
@ -107,9 +94,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -158,10 +143,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Save images\n",
|
||||
"\n",
|
||||
|
@ -172,9 +154,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -219,10 +199,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`saveTrainImages` and `saveTestImages` are simple wrapper functions to iterate through the data set."
|
||||
]
|
||||
|
@ -231,9 +208,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -275,9 +250,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -307,11 +280,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -377,7 +346,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,9 +4,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -15,20 +13,14 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 201B: Hands On Image Recognition"
|
||||
"# CNTK 201: Part B - Image Understanding"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This tutorial shows how to implement image recognition task using [convolution network][] with CNTK v2 Python API. You will start with a basic feedforward CNN architecture to classify [CIFAR dataset](https://www.cs.toronto.edu/~kriz/cifar.html), then you will keep adding advanced features to your network. Finally, you will implement a VGG net and residual net like the one that won ImageNet competition but smaller in size.\n",
|
||||
"\n",
|
||||
|
@ -65,11 +57,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -92,10 +80,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The above image is from: https://www.cs.toronto.edu/~kriz/cifar.html\n",
|
||||
"\n",
|
||||
|
@ -111,11 +96,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -138,10 +119,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The stack of feature maps output are the input to the next layer."
|
||||
]
|
||||
|
@ -149,11 +127,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -176,10 +150,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998\n",
|
||||
"> Y. LeCun, L. Bottou, Y. Bengio and P. Haffner\n",
|
||||
|
@ -214,11 +185,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -241,10 +208,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**In CNTK**:\n",
|
||||
"\n",
|
||||
|
@ -310,9 +274,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -334,10 +296,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the block below, we check if we are running this notebook in the CNTK internal test machines by looking for environment variables defined there. We then select the right target device (GPU vs CPU) to test this notebook. In other cases, we use CNTK's default policy to use the best available device (GPU, if available, else CPU)."
|
||||
]
|
||||
|
@ -346,9 +305,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -361,10 +318,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data reading\n",
|
||||
"\n",
|
||||
|
@ -387,9 +341,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -442,11 +394,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -470,11 +418,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -497,10 +441,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model creation (Basic CNN)\n",
|
||||
"\n",
|
||||
|
@ -513,9 +454,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -538,10 +477,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training and evaluation\n",
|
||||
"\n",
|
||||
|
@ -552,9 +488,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -689,11 +623,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -743,10 +673,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Although, this model is very simple, it still has too much code, we can do better. Here the same model in more terse format:"
|
||||
]
|
||||
|
@ -755,9 +682,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -779,11 +704,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -838,10 +759,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that we have a trained model, let us classify the following image of a truck. We use PIL to read the image."
|
||||
]
|
||||
|
@ -849,11 +767,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -878,9 +792,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -894,10 +806,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"During training we have subtracted the mean from the input images. Here we take an approximate value of the mean and subtract it from the image."
|
||||
]
|
||||
|
@ -906,9 +815,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -932,11 +839,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -956,10 +859,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model: CNN with dropout\n",
|
||||
"\n",
|
||||
|
@ -970,9 +870,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -995,11 +893,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1049,10 +943,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model: CNN with BN\n",
|
||||
"\n",
|
||||
|
@ -1063,9 +954,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1089,11 +978,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1143,10 +1028,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Popular Model\n",
|
||||
"\n",
|
||||
|
@ -1178,9 +1060,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1204,11 +1084,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1258,10 +1134,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Residual Network (ResNet)\n",
|
||||
"\n",
|
||||
|
@ -1271,11 +1144,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1298,10 +1167,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The idea of the above block is 2 folds:\n",
|
||||
"\n",
|
||||
|
@ -1345,9 +1211,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1392,10 +1256,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's write the full model:"
|
||||
]
|
||||
|
@ -1404,9 +1265,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1430,11 +1289,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1478,17 +1333,6 @@
|
|||
"source": [
|
||||
"pred_resnet = train_and_evaluate(reader_train, reader_test, max_epochs=5, model_func=create_resnet_model)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -1508,7 +1352,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -2,10 +2,7 @@
|
|||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 202: Language Understanding with Recurrent Networks\n",
|
||||
"\n",
|
||||
|
@ -36,10 +33,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Data download\n",
|
||||
"\n",
|
||||
|
@ -57,11 +51,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -113,10 +103,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Importing libraries**: CNTK, math and numpy \n",
|
||||
"\n",
|
||||
|
@ -127,9 +114,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -144,10 +129,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Task Overview\n",
|
||||
"\n",
|
||||
|
@ -226,9 +208,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -256,10 +236,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we are ready to create a model and inspect it. \n",
|
||||
"\n",
|
||||
|
@ -273,11 +250,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -304,10 +277,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In our case we have input as one-hot encoded vector of length 943 and the output dimension `emb_dim` is set to 150. In the code below we pass the input variable `x` to our model `z`. This binds the model with input data of known shape. In this case, the input shape will be the size of the input vocabulary. With this modification, the parameter returned by the embed layer is completely specified (943, 150). **Note**: You can initialize the Embedding matrix with pre-computed vectors using [Word2Vec](https://en.wikipedia.org/wiki/Word2vec) or [GloVe](https://en.wikipedia.org/wiki/GloVe_%28machine_learning%29)."
|
||||
]
|
||||
|
@ -315,11 +285,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -337,10 +303,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To train and test a model in CNTK, we need to create a model and specify how to read data and perform training and testing. \n",
|
||||
"\n",
|
||||
|
@ -382,9 +345,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -399,11 +360,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -424,10 +381,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training\n",
|
||||
"\n",
|
||||
|
@ -443,11 +397,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -473,10 +423,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"While the cell above works well when one has input parameters defined at network creation, it compromises readability. Hence we prefer creating functions as shown below"
|
||||
]
|
||||
|
@ -485,9 +432,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -501,9 +446,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -565,10 +508,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Run the trainer**\n",
|
||||
"\n",
|
||||
|
@ -579,9 +519,6 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
|
@ -616,10 +553,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This shows how learning proceeds over epochs (passes through the data).\n",
|
||||
"For example, after four epochs, the loss, which is the cross-entropy criterion, \n",
|
||||
|
@ -647,10 +581,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Evaluating the model\n",
|
||||
"\n",
|
||||
|
@ -661,9 +592,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -695,10 +624,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we can measure the model accuracy by going through all the examples in the test set and using the ``test_minibatch`` method of the trainer created inside the evaluate function defined above. At the moment (when this tutorial was written) the Trainer constructor requires a learner (even if it is only used to perform ``test_minibatch``) so we have to specify a dummy learner. In the future it will be allowed to construct a Trainer without specifying a learner as long as the trainer only calls ``test_minibatch``"
|
||||
]
|
||||
|
@ -706,11 +632,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -765,10 +687,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The following block of code illustrates how to evaluate a single sequence. Additionally we show how one can pass in the information using NumPy arrays."
|
||||
]
|
||||
|
@ -776,11 +695,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -834,10 +749,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Modifying the Model\n",
|
||||
"\n",
|
||||
|
@ -888,10 +800,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Task 1: Add Batch Normalization\n",
|
||||
"\n",
|
||||
|
@ -918,9 +827,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -940,10 +847,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Task 2: Add a Lookahead \n",
|
||||
"\n",
|
||||
|
@ -969,9 +873,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -991,10 +893,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Task 3: Bidirectional Recurrent Model\n",
|
||||
"\n",
|
||||
|
@ -1061,9 +960,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1083,10 +980,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Works like a charm! This model achieves 0.30%, better than the lookahead model above.\n",
|
||||
"The bidirectional model has 40% less parameters than the lookahead one. However, if you go back and look closely\n",
|
||||
|
@ -1097,10 +991,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Solution 1: Adding Batch Normalization**"
|
||||
]
|
||||
|
@ -1108,11 +999,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1152,10 +1039,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Solution 2: Add a Lookahead**"
|
||||
]
|
||||
|
@ -1164,9 +1048,6 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
|
@ -1212,10 +1093,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Solution 3: Bidirectional Recurrent Model**"
|
||||
]
|
||||
|
@ -1223,11 +1101,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1270,17 +1144,6 @@
|
|||
"do_train()\n",
|
||||
"do_test()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -1300,7 +1163,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,9 +4,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -15,10 +13,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 203: Reinforcement Learning Basics\n",
|
||||
"\n",
|
||||
|
@ -33,11 +28,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -60,10 +51,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Problem**\n",
|
||||
"\n",
|
||||
|
@ -74,32 +62,35 @@
|
|||
"Our goal is to prevent the pole from falling over as the cart moves with the pole in upright position (perpendicular to the cart) as the starting state. More specifically if the pole is less than 15 degrees from vertical while the cart is within 2.4 units of the center we will collect reward. In this tutorial, we will train till we learn a set of actions (policies) that lead to an average reward of 200 or more over last 50 batches.\n",
|
||||
"\n",
|
||||
"In, RL terminology, the goal is to find _policies_ $a$, that maximize the _reward_ $r$ (feedback) through interaction with some environment (in this case the pole being balanced on the cart). So given a series of experiences $$s \\xrightarrow{a} r, s'$$ we then can learn how to choose action $a$ in a given state $s$ to maximize the accumulated reward $r$ over time:\n",
|
||||
"\\begin{align}\n",
|
||||
"Q(s,a) &= r_0 + \\gamma r_1 + \\gamma^2 r_2 + \\ldots \\newline\n",
|
||||
"&= r_0 + \\gamma \\max_a Q^*(s',a)\n",
|
||||
"\\end{align}\n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"Q(s,a) = r_0 + \\gamma r_1 + \\gamma^2 r_2 + \\ldots = r_0 + \\gamma \\max_a Q^*(s',a)\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"where $\\gamma \\in [0,1)$ is the discount factor that controls how much we should value reward that is further away. This is called the [*Bellmann*-equation](https://en.wikipedia.org/wiki/Bellman_equation).\n",
|
||||
"\n",
|
||||
"In this tutorial we will show how to model the state space, how to use the received reward to figure out which action yields the highest future reward.\n",
|
||||
"\n",
|
||||
"We present two different popular approaches here:\n",
|
||||
"\n",
|
||||
"**Deep Q-Networks**: DQNs have become famous in 2015 when they were successfully used to train how to play Atari just form raw pixels. We train neural network to learn the $Q(s,a)$ values (thus _Q-Network _). From these $Q$ functions values we choose the best action.\n",
|
||||
"\n",
|
||||
"**Policy gradient**: This method directly estimates the policy (set of actions) in the network. The outcome is a learning of an ordered set of actions which leads to maximize reward by probabilistically choosing a subset of actions. In this tutorial, we learn the actions using a gradient descent approach to learn the policies.\n",
|
||||
"\n",
|
||||
"In this tutorial, we focus how to implement RL in CNTK. We choose a straight forward shallow network. One can extend the approaches by replacing our shallow model with deeper networks that are introduced in other CNTK tutorials.\n",
|
||||
"\n",
|
||||
"Additionally, this tutorial is in its early stages and will be evolving in future updates."
|
||||
"In this tutorial we will show how to model the state space, how to use the received reward to figure out which action yields the highest future reward."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We present two different popular approaches here:\n",
|
||||
"\n",
|
||||
"**Deep Q-Networks**: DQNs have become famous in 2015 when they were successfully used to train how to play Atari just form raw pixels. We train neural network to learn the $Q(s,a)$ values (thus $Q$-Network ). From these $Q$ functions values we choose the best action.\n",
|
||||
"\n",
|
||||
"**Policy gradient**: This method directly estimates the policy (set of actions) in the network. The outcome is a learning of an ordered set of actions which leads to maximize reward by probabilistically choosing a subset of actions. In this tutorial, we learn the actions using a gradient descent approach to learn the policies."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this tutorial, we focus how to implement RL in CNTK. We choose a straight forward shallow network. One can extend the approaches by replacing our shallow model with deeper networks that are introduced in other CNTK tutorials.\n",
|
||||
"\n",
|
||||
"Additionally, this tutorial is in its early stages and will be evolving in future updates.\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"Please run the following cell from the menu above or select the cell below and hit `Shift + Enter` to ensure the environment is ready. Verify that the following imports work in your notebook."
|
||||
]
|
||||
|
@ -108,9 +99,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -128,10 +117,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We use the following construct to install the OpenAI gym package if it is not installed. For users new to Jupyter environment, this construct can be used to install any python package. "
|
||||
]
|
||||
|
@ -140,9 +126,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -155,10 +139,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Select the notebook run mode**\n",
|
||||
"\n",
|
||||
|
@ -172,9 +153,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -184,9 +163,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"## CartPole: Data and Environment"
|
||||
|
@ -194,10 +171,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will use the [CartPole](https://gym.openai.com/envs/CartPole-v0) environment from OpenAI's [gym](https://github.com/openai/gym) simulator to teach a cart to balance a pole. Please follow the links to get more details.\n",
|
||||
"\n",
|
||||
|
@ -220,10 +194,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Part 1: DQN\n",
|
||||
"\n",
|
||||
|
@ -237,10 +208,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Model: DQN\n",
|
||||
"\n",
|
||||
|
@ -252,10 +220,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will start with a slightly modified version for Keras, https://github.com/jaara/AI-blog/blob/master/CartPole-basic.py, published by Jaromír Janisch in his [AI blog](https://jaromiru.com/2016/09/27/lets-make-a-dqn-theory/), and will then incrementally convert it to use CNTK.\n",
|
||||
"\n",
|
||||
|
@ -268,9 +233,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -284,10 +247,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the block below, we check if we are running this notebook in the CNTK internal test machines by looking for environment variables defined there. We then select the right target device (GPU vs CPU) to test this notebook. In other cases, we use CNTK's default policy to use the best available device (GPU, if available, else CPU)."
|
||||
]
|
||||
|
@ -296,9 +256,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -312,10 +270,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"STATE_COUNT = 4 (corresponding to $(x, \\dot{x}, \\theta, \\dot{\\theta})$),\n",
|
||||
"\n",
|
||||
|
@ -325,11 +280,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
|
@ -360,10 +311,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note: in the cell below we highlight how one would do it in Keras. And a marked similarity with CNTK. While CNTK allows for more compact representation, we present a slightly verbose illustration for ease of learning.\n",
|
||||
"\n",
|
||||
|
@ -376,9 +324,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -431,10 +377,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The `Memory` class stores the different states, actions and rewards."
|
||||
]
|
||||
|
@ -443,9 +386,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -468,10 +409,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The `Agent` uses the `Brain` and `Memory` to replay the past actions to choose optimal set of actions that maximize the rewards."
|
||||
]
|
||||
|
@ -480,9 +418,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -552,10 +488,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Training\n",
|
||||
"\n",
|
||||
|
@ -567,8 +500,6 @@
|
|||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [],
|
||||
|
@ -593,10 +524,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Exploration - exploitation trade-off\n",
|
||||
"\n",
|
||||
|
@ -606,11 +534,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -642,10 +566,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We are now ready to train our agent using **DQN**. Note this would take anywhere between 2-10 min and we stop whenever the learner hits the average reward of 200 over past 50 batches. One would get better results if they could train the learner until say one hits a reward of 200 or higher for say larger number of runs. This is left as an exercise."
|
||||
]
|
||||
|
@ -653,11 +574,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -834,10 +751,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you run it, you should see something like\n",
|
||||
"```\n",
|
||||
|
@ -854,10 +768,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Task 1.1**\n",
|
||||
"Rewrite the model without using the layer lib.\n",
|
||||
|
@ -868,10 +779,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Running the DQN model"
|
||||
]
|
||||
|
@ -879,11 +787,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
|
@ -930,10 +834,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Part 2: Policy gradient (PG)\n",
|
||||
"**Goal:**\n",
|
||||
|
@ -945,17 +846,14 @@
|
|||
"1. Collect experience (sample a bunch of trajectories through $(s,a)$ space)\n",
|
||||
"2. Update the policy so that _good_ experiences become more probable\n",
|
||||
"\n",
|
||||
"**Difference to DQN: **\n",
|
||||
"**Difference to DQN:**\n",
|
||||
" * we don't consider single $(s,a,r,s')$ transitions, but rather use whole episodes for the gradient updates\n",
|
||||
" * our parameters directly model the policy (output is an action probability), whereas in DQN they model the value function (output is raw score)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Rewards\n",
|
||||
"\n",
|
||||
|
@ -968,9 +866,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -987,11 +883,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1022,10 +914,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We normalize the rewards so that they tank below zero towards the end. gamma controls how late the rewards tank."
|
||||
]
|
||||
|
@ -1033,11 +922,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1070,11 +955,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1108,10 +989,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Model: Policy Gradient\n",
|
||||
"\n",
|
||||
|
@ -1128,9 +1006,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1155,10 +1031,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Running the PG model\n",
|
||||
"\n",
|
||||
|
@ -1168,11 +1041,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1318,10 +1187,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Solutions**"
|
||||
]
|
||||
|
@ -1329,11 +1195,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1362,11 +1224,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1391,17 +1249,6 @@
|
|||
"\n",
|
||||
"model.layer1.W.shape, model.layer1.b.shape, model.layer2.W.shape, model.layer2.b.shape, model.shape"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -1421,7 +1268,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -4,9 +4,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -15,10 +13,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 204: Sequence to Sequence Networks with Text Data\n",
|
||||
"\n",
|
||||
|
@ -33,11 +28,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -60,10 +51,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this tutorial, we are going to be talking about the fourth paradigm: many-to-many where the length of the output does not necessarily equal the length of the input, also known as sequence-to-sequence networks. The input is a sequence with a dynamic length, and the output is also a sequence with some dynamic length. It is the logical extension of the many-to-one paradigm in that previously we were predicting some category (which could easily be one of `V` words where `V` is an entire vocabulary) and now we want to predict a whole sequence of those categories.\n",
|
||||
"\n",
|
||||
|
@ -77,11 +65,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -104,10 +88,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The basic sequence-to-sequence network passes the information from the encoder to the decoder by initializing the decoder RNN with the final hidden state of the encoder as its initial hidden state. The input is then a \"sequence start\" tag (`<s>` in the diagram above) which primes the decoder to start generating an output sequence. Then, whatever word (or note or image, etc.) it generates at that step is fed in as the input for the next step. The decoder keeps generating outputs until it hits the special \"end sequence\" tag (`</s>` above).\n",
|
||||
"\n",
|
||||
|
@ -117,11 +98,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -144,20 +121,14 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The `Attention` layer above takes the current value of the hidden state in the Decoder, all of the hidden states in the Encoder, and calculates an augmented version of the hidden state to use. More specifically, the contribution from the Encoder's hidden states will represent a weighted sum of all of its hidden states where the highest weight corresponds both to the biggest contribution to the augmented hidden state and to the hidden state that will be most important for the Decoder to consider when generating the next word."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem: Grapheme-to-Phoneme Conversion\n",
|
||||
"\n",
|
||||
|
@ -173,10 +144,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Model structure overview**\n",
|
||||
"\n",
|
||||
|
@ -187,10 +155,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Importing CNTK and other useful libraries**\n",
|
||||
"\n",
|
||||
|
@ -201,9 +166,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -221,9 +184,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -234,10 +195,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Downloading the data\n",
|
||||
"\n",
|
||||
|
@ -259,11 +217,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -313,10 +267,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Data Reader\n",
|
||||
"\n",
|
||||
|
@ -327,9 +278,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -355,11 +304,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -389,10 +334,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will use the above to create a reader for our training data. Let's create it now:"
|
||||
]
|
||||
|
@ -401,9 +343,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -422,20 +362,14 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Set our model hyperparameters**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We have a number of settings that control the complexity of our network, the shapes of our inputs, and other options such as whether we will use an embedding (and what size to use), and whether or not we will employ attention. We set them now as they will be made use of when we build the network graph in the following sections."
|
||||
]
|
||||
|
@ -444,9 +378,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -464,10 +396,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model Creation\n",
|
||||
"\n",
|
||||
|
@ -480,9 +409,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -492,10 +419,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 1: setup the input to the network\n",
|
||||
"\n",
|
||||
|
@ -516,9 +440,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -531,10 +453,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 2: define the network\n",
|
||||
"\n",
|
||||
|
@ -550,11 +469,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -577,10 +492,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For the decoder, we first define several sub-layers: the `Stabilizer` for the decoder input, the `Recurrence` blocks for each of the decoder's layers, the `Stabilizer` for the output of the stack of LSTMs, and the final `Dense` output layer. If we are using attention, then we also create an `AttentionModel` function `attention_model` which returns an augmented version of the decoder's hidden state with emphasis placed on the encoder hidden states that should be most used for the given step while generating the next output token.\n",
|
||||
"\n",
|
||||
|
@ -595,9 +507,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -670,20 +580,14 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The network that we defined above can be thought of as an \"abstract\" model that must first be wrapped to be used. In this case, we will use it first to create a \"training\" version of the model (where the history for the Decoder will be the ground-truth labels), and then we will use it to create a greedy \"decoding\" version of the model where the history for the Decoder will be the `hardmax` output of the network. Let's set up these model wrappers next."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training\n",
|
||||
"\n",
|
||||
|
@ -694,9 +598,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -715,10 +617,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, we create the CNTK Function `model_train` again using the `@Function` decorator. This function takes the input sequence `input` and the output sequence `labels` as arguments. The `past_labels` are setup as the `history` for the model we created earlier by using the `Delay` layer. This will return the previous time-step value for the input `labels` with an `initial_state` of `sentence_start`. Therefore, if we give the labels `['a', 'b', 'c']`, then `past_labels` will contain `['<s>', 'a', 'b', 'c']` and then return our abstract base model called with the history `past_labels` and the input `input`.\n",
|
||||
"\n",
|
||||
|
@ -729,9 +628,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -755,10 +652,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above we create a new CNTK Function `model_greedy` which this time only takes a single argument. This is of course because when using the model at test time we don't have any labels -- it is the model's job to create them for us! In this case, we use the `UnfoldFrom` layer which runs the base model with the current `history` and funnels it into the `hardmax`. The `hardmax`'s output then becomes part of the `history` and we keep unfolding the `Recurrence` until the `sentence_end_index` has been reached. The maximum length of the output sequence (the maximum unfolding of the Decoder) is determined by a multiplier passed to `length_increase`. In this case we set `length_increase` to `1.5` above so the maximum length of each output sequence is 1.5x its input.\n",
|
||||
"\n",
|
||||
|
@ -769,9 +663,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -792,10 +684,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, we create the criterion function which drops the sequence-start symbol from our labels for us, runs the model with the given `input` and `labels`, and uses the output to compare to our ground truth. We use the loss function `cross_entropy_with_softmax` and get the `classification_error` which gives us the percent-error per-word of our generation accuracy. The CNTK Function `criterion` returns these values as a tuple and the Python function `create_criterion_function(model)` returns that CNTK Function.\n",
|
||||
"\n",
|
||||
|
@ -806,9 +695,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -884,10 +771,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the above function, we created one version of the model for training (plus its associated criterion function) and one version of the model for evaluation. Normally this latter version would not be required but here we have done it so that we can periodically sample from the non-training model to visually understand how our model is converging by seeing the kinds of sequences that it generates as the training progresses.\n",
|
||||
"\n",
|
||||
|
@ -900,9 +784,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -918,10 +800,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Inside the training loop, we proceed much like many other CNTK networks. We request the next bunch of minibatch data, we perform our training, and we print our progress to the screen using the `progress_printer`. Where we diverge from the norm, however, is where we run an evaluation using our `model_greedy` version of the network and run a single sequence, \"ABADI\" through to see what the network is currently predicting.\n",
|
||||
"\n",
|
||||
|
@ -932,9 +811,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -960,10 +837,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's try training our network for a small part of an epoch. In particular, we'll run through 25,000 tokens (about 3% of one epoch):"
|
||||
]
|
||||
|
@ -971,11 +845,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1055,10 +925,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see above, while the loss has come down quite a ways, the output sequence is still quite a ways off from what we expect. Uncomment the code below to run for a full epoch (notice that we switch the `epoch_size` parameter to the actual size of the training data) and by the end of the first epoch you will already see a very good grapheme-to-phoneme translation model running!"
|
||||
]
|
||||
|
@ -1067,9 +934,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1079,10 +944,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Testing the network\n",
|
||||
"\n",
|
||||
|
@ -1096,8 +958,6 @@
|
|||
"execution_count": 24,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
|
@ -1112,10 +972,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we need to define our testing function. We pass the `reader`, the learned `s2smodel`, and the vocabulary map `i2w` so that we can directly compare the model's predictions to the test set labels. We loop over the test set, evaluate the model on minibatches of size 512 for efficiency, and keep track of the error rate. Note that below we test *per-sequence*. This means that every single token in a generated sequence must match the tokens in the label for that sequence to be considered as correct."
|
||||
]
|
||||
|
@ -1124,9 +981,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1162,10 +1017,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we will evaluate the decoding using the above function. If you use the version of the model we trained above with just a small 50000 sample of the training data, you will get an error rate of 100% because we cannot possibly get every single token correct with such a small amount of training. However, if you uncommented the training line above that trains the network for a full epoch, you should have ended up with a much-improved model that showed approximately the following training statistics:\n",
|
||||
"\n",
|
||||
|
@ -1179,11 +1031,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1211,9 +1059,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"If you did not run the training for the full first epoch, the output above will be a `1.0` meaning 100% string error rate. If, however, you uncommented the line to perform training for a full epoch, you should get an output of `0.569`. A string error rate of `56.9` is actually not bad for a single pass over the data. Let's now modify the above `evaluate_decoding` function to output the per-phoneme error rate. This means that we are calculating the error at a higher precision and also makes things easier in some sense because with the string error rate we could have every phoneme correct but one in each example and still end up with a 100% error rate. Here is the modified version of that function:"
|
||||
|
@ -1223,9 +1069,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1266,11 +1110,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1298,20 +1138,14 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you're using the model that was trained for one full epoch, then you should get a phoneme error rate of around 10%. Not bad! This means that for each of the 383,294 phonemes in the test set, our model predicted nearly 90% of them correctly (if you used the quickly-trained version of the model then you will get an error rate of around 45%). Now, let's work with an interactive session where we can input our own input sequences and see how the model predicts their pronunciation (i.e. phonemes). Additionally, we will visualize the Decoder's attention for these samples to see which graphemes in the input it deemed to be important for each phoneme that it produces. Note that in the examples below the results will only be good if you use a model that has been trained for at least one epoch."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Interactive session\n",
|
||||
"\n",
|
||||
|
@ -1324,9 +1158,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1376,10 +1208,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The `translate` function above takes a list of letters input by the user as `tokens`, the greedy decoding version of our model `model_decoding`, the vocabulary `vocab`, a map of index to vocab `i2w`, and the `show_attention` option which determines if we will visualize the attention vectors or not.\n",
|
||||
"\n",
|
||||
|
@ -1394,9 +1223,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1429,10 +1256,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The above function simply creates a greedy decoder around our model and then continually asks the user for an input which we pass to our `translate` function. Visualizations of the attention will continue being appended to the notebook until you exit the loop by typing `quit`. Please uncomment the following line to try out the interaction session."
|
||||
]
|
||||
|
@ -1440,11 +1264,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1479,10 +1299,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice how the attention weights show how important different parts of the input are for generating different tokens in the output. For tasks like machine translation, where the order of one-to-one words often changes due to grammatical differences between languages, this becomes very interesting as we see the attention window move further away from the diagonal that is mostly displayed in grapheme-to-phoneme translations.\n",
|
||||
"\n",
|
||||
|
@ -1490,15 +1307,6 @@
|
|||
"\n",
|
||||
"With the above model, you have the basics for training a powerful sequence-to-sequence model with attention in a number of distinct domains. The only major changes required are preparing a dataset with pairs input and output sequences and in general the rest of the building blocks will remain the same. Good luck, and have fun!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -1518,7 +1326,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -2,12 +2,9 @@
|
|||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 205 Artistic Style Transfer\n",
|
||||
"# CNTK 205: Artistic Style Transfer\n",
|
||||
"\n",
|
||||
"This tutorial shows how to transfer the style of one image to another. This allows us to take our ordinary photos and render them in the style of famous images or paintings.\n",
|
||||
"\n",
|
||||
|
@ -22,9 +19,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -46,10 +41,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The pretrained model is a VGG network which we originally got from [this page](https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3). We host it in a place which permits easy downloading. Below we download it if it is not already available locally and load the weights into numpy arrays."
|
||||
]
|
||||
|
@ -57,11 +49,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -110,10 +98,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next we define the VGG network as a CNTK graph. "
|
||||
]
|
||||
|
@ -122,9 +107,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -161,10 +144,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Defining the loss function\n",
|
||||
"\n",
|
||||
|
@ -186,9 +166,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -233,10 +211,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Instantiating the loss\n",
|
||||
"\n",
|
||||
|
@ -251,11 +226,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -354,10 +325,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Optimizing the loss\n",
|
||||
"\n",
|
||||
|
@ -380,11 +348,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -474,11 +438,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -514,7 +474,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -15,7 +15,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 206 Part A: Basic GAN with MNIST data\n",
|
||||
"# CNTK 206: Part A - Basic GAN with MNIST data\n",
|
||||
"\n",
|
||||
"**Prerequisites**: We assume that you have successfully downloaded the MNIST data by completing the tutorial titled CNTK_103A_MNIST_DataLoader.ipynb.\n",
|
||||
"\n",
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 206 Part B: Deep Convolutional GAN with MNIST data\n",
|
||||
"# CNTK 206: Part B - Deep Convolutional GAN with MNIST data\n",
|
||||
"\n",
|
||||
"**Prerequisites**: We assume that you have successfully downloaded the MNIST data by completing the tutorial titled CNTK_103A_MNIST_DataLoader.ipynb.\n",
|
||||
"\n",
|
||||
|
@ -168,7 +168,7 @@
|
|||
"- The **Generator** takes random noise vector ($z$) as input and strives to output synthetic (fake) image ($x^*$) that is indistinguishable from the real image ($x$) from the MNIST dataset. \n",
|
||||
"- The **Discriminator** strives to differentiate between the real image ($x$) and the fake ($x^*$) image.\n",
|
||||
"\n",
|
||||
"![GAN-flow](https://www.cntk.ai/jup/GAN_basic_flow.png)\n",
|
||||
"![](https://www.cntk.ai/jup/GAN_basic_flow.png)\n",
|
||||
"\n",
|
||||
"In each training iteration, the Generator produces more realistic fake images (in other words *minimizes* the difference between the real and generated counterpart) and the Discriminator *maximizes* the probability of assigning the correct label (real vs. fake) to both real examples (from training set) and the generated fake ones. The two conflicting objectives between the sub-networks ($G$ and $D$) leads to the GAN network (when trained) converge to an equilibrium, where the Generator produces realistic looking fake MNIST images and the Discriminator can at best randomly guess whether images are real or fake. The resulting Generator model once trained produces realistic MNIST image with the input being a random number. "
|
||||
]
|
||||
|
@ -387,7 +387,7 @@
|
|||
"\n",
|
||||
"At the optimal point of this game the generator will produce realistic looking data while the discriminator will predict that the generated image is indeed fake with a probability of 0.5. The [algorithm referred below](https://arxiv.org/pdf/1406.2661v1.pdf) is implemented in this tutorial.\n",
|
||||
"\n",
|
||||
"![NIPS2014](https://www.cntk.ai/jup/GAN_goodfellow_NIPS2014.png)"
|
||||
"![](https://www.cntk.ai/jup/GAN_goodfellow_NIPS2014.png)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -642,7 +642,7 @@
|
|||
"source": [
|
||||
"Larger number of iterations should generate more realistic looking MNIST images. A sampling of such generated images are shown below.\n",
|
||||
"\n",
|
||||
"![DCGAN-results](http://www.cntk.ai/jup/cntk206B_dcgan_result.jpg)\n",
|
||||
"![](http://www.cntk.ai/jup/cntk206B_dcgan_result.jpg)\n",
|
||||
"\n",
|
||||
"**Note**: It takes a large number of iterations to capture a representation of the real world signal. Even simple dense networks can be quite effective in modelling data albeit MNIST is a relatively simple dataset as well."
|
||||
]
|
||||
|
|
|
@ -2,10 +2,7 @@
|
|||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 207: Sampled Softmax\n",
|
||||
"\n",
|
||||
|
@ -14,10 +11,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Select the notebook runtime environment devices / settings**\n",
|
||||
"\n",
|
||||
|
@ -28,9 +22,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -46,10 +38,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Basic concept\n",
|
||||
"\n",
|
||||
|
@ -96,9 +85,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -153,10 +140,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To give a better idea of what the inputs and outputs are and how this all differs from the normal softmax we give below a corresponding function using normal softmax:"
|
||||
]
|
||||
|
@ -165,9 +149,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -197,10 +179,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As you can see the main differences to the api function `cross_entropy_with_softmax` are:\n",
|
||||
"* We include the mapping $ z = W h + b $ into the function.\n",
|
||||
|
@ -215,11 +194,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -418,10 +393,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the above code we use two different methods to report training progress:\n",
|
||||
"1. Using a function that computes the average cross entropy on full softmax.\n",
|
||||
|
@ -458,11 +430,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -520,9 +488,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"In the example above we compare uniform sampling (red) vs sampling with the same distribution the classes have (blue).\n",
|
||||
|
@ -532,9 +498,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"## What speedups to expect?\n",
|
||||
|
@ -552,11 +516,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -653,7 +613,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -2,10 +2,7 @@
|
|||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 208: Training Acoustic Model with Connectionist Temporal Classification (CTC) Criteria\n",
|
||||
"This tutorial assumes familiarity with 10\\* CNTK tutorials and basic knowledge of data representation in acoustic modelling tasks. It introduces some CNTK building blocks that can be used in training deep networks for speech recognition on the example of CTC training criteria.\n",
|
||||
|
@ -23,11 +20,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -73,10 +66,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Read data\n",
|
||||
"\n",
|
||||
|
@ -92,9 +82,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -122,10 +110,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model creation\n",
|
||||
"\n",
|
||||
|
@ -136,9 +121,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -157,9 +140,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"### Define training hyperparameters\n",
|
||||
|
@ -171,9 +152,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -193,10 +172,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train"
|
||||
]
|
||||
|
@ -205,9 +181,6 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
|
@ -249,9 +222,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"## Evaluate "
|
||||
|
@ -260,11 +231,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -294,17 +261,6 @@
|
|||
"# Average of evaluation errors of all test minibatches\n",
|
||||
"round(test_result / num_test_minibatches,2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -324,7 +280,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -2,10 +2,7 @@
|
|||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CNTK 301: Image Recognition with Deep Transfer Learning\n",
|
||||
"\n",
|
||||
|
@ -47,9 +44,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -82,10 +77,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There are two run modes:\n",
|
||||
"- *Fast mode*: `isFast` is set to `True`. This is the default mode for the notebooks, which means we train for fewer iterations or train / test on limited data. This ensures functional correctness of the notebook though the models produced are far from what a completed training would produce.\n",
|
||||
|
@ -99,9 +91,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -110,10 +100,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Data Download\n",
|
||||
"\n",
|
||||
|
@ -128,9 +115,6 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
|
@ -169,10 +153,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note that we are setting the data root to coincide with the CNTK examples, so if you have run those some of the data might already exist. Alter the data root if you would like all of the input and output data to go elsewhere (i.e. if you have copied this notebook to your own space). The `download_unless_exists` method will try to download several times, but if that fails you might see an exception. It and the `write_to_file` method both - write to files, so if the data_root is not writeable or fills up you'll see exceptions there."
|
||||
]
|
||||
|
@ -180,11 +161,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -325,10 +302,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Pre-Trained Model (ResNet)\n",
|
||||
"\n",
|
||||
|
@ -336,7 +310,7 @@
|
|||
"\n",
|
||||
"Residual Deep Learning is a technique that originated in Microsoft Research and involves \"passing through\" the main signal of the input data, so that the network winds up \"learning\" on just the residual portions that differ between layers. This has proven, in practice, to allow the training of much deeper networks by avoiding issues that plague gradient descent on larger networks. These cells bypass convolution layers and then come back in later before ReLU (see below), but some have argued that even deeper networks can be built by avoiding even more nonlinearities in the bypass channel. This is an area of hot research right now, and one of the most exciting parts of Transfer Learning is that you get to benefit from all of the improvements by just integrating new trained models.\n",
|
||||
"\n",
|
||||
"![A ResNet Block](https://adeshpande3.github.io/assets/ResNet.png)\n",
|
||||
"![](https://adeshpande3.github.io/assets/ResNet.png)\n",
|
||||
"\n",
|
||||
"For visualizations of some of the deeper ResNet architectures, see [Kaiming He's GitHub](https://github.com/KaimingHe/deep-residual-networks) where he links off to visualizations of 50, 101, and 152-layer architectures."
|
||||
]
|
||||
|
@ -344,11 +318,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -368,10 +338,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Inspecting pre-trained model\n",
|
||||
"\n",
|
||||
|
@ -381,11 +348,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -496,10 +459,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### New dataset\n",
|
||||
"\n",
|
||||
|
@ -514,9 +474,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -532,11 +490,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -609,10 +563,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Training the Transfer Learning Model\n",
|
||||
"\n",
|
||||
|
@ -625,9 +576,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -665,10 +614,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will now train the model just like any other CNTK model training - instantiating an input source (in this case a `MinibatchSource` from our image data), defining the loss function, and training for a number of epochs. Since we are training a multi-class classifier network, the final layer is a cross-entropy Softmax, and the error function is classification error - both conveniently provided by utility functions in `cntk.ops`.\n",
|
||||
"\n",
|
||||
|
@ -679,9 +625,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -738,10 +682,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"When we evaluate the trained model on an image, we have to massage that image into the expected format. In our case we use `Image` to load the image from its path, resize it to the size expected by our model, reverse the color channels (RGB to BGR), and convert to a contiguous array along height, width, and color channels. This corresponds to the 224x224x3 flattened array on which our model was trained.\n",
|
||||
"\n",
|
||||
|
@ -752,9 +693,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -827,10 +766,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Finally, with all of these helper functions in place we can train the model and evaluate it on our flower dataset.\n",
|
||||
"\n",
|
||||
|
@ -845,9 +781,6 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
|
@ -916,9 +849,6 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true,
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
|
@ -945,11 +875,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -967,9 +893,7 @@
|
|||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"### With much smaller dataset\n",
|
||||
|
@ -980,11 +904,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1056,10 +976,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The images are stored in `Train` and `Test` folders with the nested folder giving the class name (i.e. `Sheep` and `Wolf` folders). This is quite common, so it is useful to know how to convert that format into one that can be used for constructing the mapping files CNTK expects. `create_class_mapping_from_folder` looks at all nested folders in the root and turns their names into labels, and returns this as an array used by `create_map_file_from_folder`. That method walks those folders and writes their paths and label indices into a `map.txt` file in the root (e.g. `Train`, `Test`). Note the use of `abspath`, allowing you to specify relative \"root\" paths to the method, and then move the resulting map files or run from different directories without issue. "
|
||||
]
|
||||
|
@ -1068,9 +985,7 @@
|
|||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -1128,10 +1043,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can now train our model on our small domain and evaluate the results:"
|
||||
]
|
||||
|
@ -1139,11 +1051,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1180,10 +1088,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that the model is trained for animals data. Lets us evaluate the images."
|
||||
]
|
||||
|
@ -1191,11 +1096,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
|
@ -1238,10 +1139,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### The Known Unknown\n",
|
||||
"\n",
|
||||
|
@ -1251,11 +1149,7 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
|
@ -1311,10 +1205,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Final Thoughts, and Caveats\n",
|
||||
"\n",
|
||||
|
@ -1322,17 +1213,6 @@
|
|||
"\n",
|
||||
"Adding a catch-all category can be a good idea, but only if the training data for that category contains images that are again sufficiently similar to the images you expect at scoring time. As in the above example, if we train a classifier with images of sheep and wolf and use it to score an image of a bird, the classifier can still only assign a sheep or wolf label, since it does not know any other categories. If we were to add a catch-all category and add training images of birds to it then the classifier might predict the class correctly for the bird image. However, if we present it, e.g., an image of a car, it faces the same problem as before as it knows only sheep, wolf and bird (which we just happened to call called catch-all). Hence, your training data, even for your catch-all, needs to cover sufficiently those concepts and images that you expect later on at scoring time."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"editable": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
@ -1352,7 +1232,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.3"
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
|
@ -21,8 +21,16 @@ extensions = [
|
|||
'sphinx.ext.napoleon',
|
||||
'sphinx.ext.todo',
|
||||
'sphinx.ext.viewcode',
|
||||
'nbsphinx',
|
||||
'IPython.sphinxext.ipython_console_highlighting'
|
||||
]
|
||||
|
||||
# Suppress warnings
|
||||
suppress_warnings = ['image.nonlocal_uri']
|
||||
|
||||
# Define source suffix
|
||||
source_suffix = ['.rst', '.ipynb']
|
||||
|
||||
master_doc = 'index'
|
||||
|
||||
exclude_patterns = [
|
||||
|
@ -83,7 +91,8 @@ extlinks = {
|
|||
'cntk': (source_prefix + '/%s', ''),
|
||||
'cntktut': (source_prefix + '/Tutorials/%s.ipynb', ''),
|
||||
# CNTK Wiki has moved to a new site:
|
||||
'cntkwiki': ('https://docs.microsoft.com/en-us/cognitive-toolkit/%s', 'CNTK Doc - ')
|
||||
'cntkwiki': ('https://docs.microsoft.com/en-us/cognitive-toolkit/%s', 'CNTK Doc - '),
|
||||
'cntkman': (source_prefix + '/Manual/%s.ipynb', ''),
|
||||
}
|
||||
|
||||
# sphinx.ext.napoleon options
|
||||
|
|
|
@ -1,8 +1,8 @@
|
|||
Examples
|
||||
========
|
||||
|
||||
The best way to learn about the APIs is to look at the
|
||||
following examples in the [CNTK clone root]/Examples directory:
|
||||
CNTK also offers several examples that are not in Tutorial style.
|
||||
Many of these are recipes involve more advanced networks and are located under :cntk:`Examples directory <Examples/>`.
|
||||
|
||||
- :cntk:`MNIST <Examples/Image/Classification/MLP/Python/SimpleMNIST.py>`:
|
||||
A fully connected feed-forward model for classification of MNIST
|
||||
|
|
|
@ -1,4 +1,3 @@
|
|||
|
||||
.. some aliases
|
||||
.. _CNTK: https://cntk.ai/
|
||||
|
||||
|
@ -23,6 +22,7 @@ them on sample data in real time. Please give feedback through these :cntkwiki:`
|
|||
Working with Sequences <sequence>
|
||||
Tutorials <tutorials>
|
||||
Examples <examples>
|
||||
Manuals <manuals>
|
||||
Layers Library Reference <layerref>
|
||||
Python API Reference <cntk>
|
||||
Readers, Multi-GPU, Profiling...<readersprofetc>
|
||||
|
|
|
@ -12,6 +12,12 @@ if NOT "%PAPER%" == "" (
|
|||
set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
|
||||
set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
|
||||
)
|
||||
set TUTORIALSOURCEDIR=..\..\..\Tutorials\
|
||||
set TUTORIALSOURCEFILE=CNTK_*.ipynb
|
||||
set IGNORETUTORIALSOURCEFILE=CNTK_5*.ipynb
|
||||
|
||||
set MANUALSOURCEDIR=..\..\..\Manual\
|
||||
set MANUALSOURCEFILE=Manual_*.ipynb
|
||||
|
||||
if "%1" == "" goto help
|
||||
|
||||
|
@ -73,7 +79,15 @@ if errorlevel 9009 (
|
|||
|
||||
|
||||
if "%1" == "html" (
|
||||
echo Copying tutorial notebooks to build directory
|
||||
CMD /C xcopy %TUTORIALSOURCEDIR%%TUTORIALSOURCEFILE% .
|
||||
del %IGNORETUTORIALSOURCEFILE%
|
||||
echo Copying manual notebooks to build directory
|
||||
CMD /C xcopy %MANUALSOURCEDIR%%MANUALSOURCEFILE% .
|
||||
%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
|
||||
echo Removing jupyter notebooks from build directory
|
||||
del %TUTORIALSOURCEFILE%
|
||||
del %MANUALSOURCEFILE%
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The HTML pages are in %BUILDDIR%/html.
|
||||
|
|
|
@ -0,0 +1,22 @@
|
|||
Manuals
|
||||
=======================================================
|
||||
|
||||
#. `How to create user minibatch sources <Manual_How_to_create_user_minibatch_sources.html>`_ (:cntkman:`source <Manual_How_to_create_user_minibatch_sources>`)
|
||||
|
||||
#. `How to debug CNTK python programs <Manual_How_to_debug.html>`_ (:cntkman:`source <Manual_How_to_debug>`)
|
||||
|
||||
#. `How to read and feed data <Manual_How_to_feed_data.html>`_ (:cntkman:`source <Manual_How_to_feed_data>`)
|
||||
|
||||
#. `How to train using the different APIs <Manual_How_to_train_using_declarative_and_imperative_API.html>`_ (:cntkman:`source <Manual_How_to_train_using_declarative_and_imperative_API>`)
|
||||
|
||||
#. `How to create and /or use CNTK learners <Manual_How_to_use_learners.html>`_ (:cntkman:`source <Manual_How_to_use_learners>`)
|
||||
|
||||
#. `How to create a custom deserializer <Manual_How_to_write_a_custom_deserializer.html>`_ (:cntkman:`source <Manual_How_to_write_a_custom_deserializer>`)
|
||||
|
||||
.. toctree::
|
||||
:glob:
|
||||
:maxdepth: 1
|
||||
:caption: List view
|
||||
:hidden:
|
||||
|
||||
Manual_*
|
|
@ -1,63 +1,63 @@
|
|||
Tutorials
|
||||
=========
|
||||
|
||||
*For a quick tour if you are familiar with another deep learning toolkit please fast forward to CNTK 200.*
|
||||
=======================================================
|
||||
|
||||
#. *Classify cancer using simulated data (Logistic Regression)*
|
||||
CNTK 101: :cntktut:`Logistic Regression <CNTK_101_LogisticRegression>` with NumPy
|
||||
CNTK 101:`Logistic Regression <CNTK_101_LogisticRegression.html>`_ with NumPy (:cntktut:`source <CNTK_101_LogisticRegression>`)
|
||||
|
||||
|
||||
#. *Classify cancer using simulated data (Feed Forward, FFN)*
|
||||
CNTK 102: :cntktut:`Feed Forward network <CNTK_102_FeedForward>` with NumPy
|
||||
CNTK 102: `Feed Forward network <CNTK_102_FeedForward.html>`_ with NumPy (:cntktut:`source <CNTK_102_FeedForward>`)
|
||||
|
||||
#. *Recognize hand written digits (OCR) with MNIST data*
|
||||
CNTK 103 Part A: :cntktut:`MNIST data preparation <CNTK_103A_MNIST_DataLoader>`,
|
||||
Part B: :cntktut:`Multi-class logistic regression classifier <CNTK_103B_MNIST_LogisticRegression>`
|
||||
Part C: :cntktut:`Multi-layer perceptron classifier <CNTK_103C_MNIST_MultiLayerPerceptron>`
|
||||
Part D: :cntktut:`Convolutional neural network classifier <CNTK_103D_MNIST_ConvolutionalNeuralNetwork>`
|
||||
CNTK 103 Part A: `MNIST data preparation <CNTK_103A_MNIST_DataLoader.html>`_ (:cntktut:`source <CNTK_103A_MNIST_DataLoader>`),
|
||||
Part B: `Multi-class logistic regression classifier <CNTK_103B_MNIST_LogisticRegression.html>`_ (:cntktut:`source <CNTK_103B_MNIST_LogisticRegression>`)
|
||||
Part C: `Multi-layer perceptron classifier <CNTK_103C_MNIST_MultiLayerPerceptron.html>`_
|
||||
(:cntktut:`source <CNTK_103C_MNIST_MultiLayerPerceptron>`)
|
||||
Part D: `Convolutional neural network classifier <CNTK_103D_MNIST_ConvolutionalNeuralNetwork.html>`_ (:cntktut:`source <CNTK_103D_MNIST_ConvolutionalNeuralNetwork>`)
|
||||
|
||||
#. *Learn how to predict the stock market*
|
||||
CNTK 104: :cntktut:`Time Series basics <CNTK_104_Finance_Timeseries_Basic_with_Pandas_Numpy>` with finance data
|
||||
CNTK 104: `Time Series basics <CNTK_104_Finance_Timeseries_Basic_with_Pandas_Numpy.html>`_ with finance data (:cntktut:`source <CNTK_104_Finance_Timeseries_Basic_with_Pandas_Numpy>` with finance data)
|
||||
|
||||
#. *Compress (using autoencoder) hand written digits from MNIST data with no human input (unsupervised learning, FFN)*
|
||||
CNTK 105 Part A: :cntktut:`MNIST data preparation <CNTK_103A_MNIST_DataLoader>`,
|
||||
Part B: :cntktut:`Feed Forward autoencoder <CNTK_105_Basic_Autoencoder_for_Dimensionality_Reduction>`
|
||||
CNTK 105 Part A: `MNIST data preparation <CNTK_103A_MNIST_DataLoader.html>`_ (:cntktut:`source <CNTK_103A_MNIST_DataLoader>`),
|
||||
Part B: `Feed Forward autoencoder <CNTK_105_Basic_Autoencoder_for_Dimensionality_Reduction.html>`_ (:cntktut:`source <CNTK_105_Basic_Autoencoder_for_Dimensionality_Reduction>`)
|
||||
|
||||
#. *Forecasting using data from an IOT device*
|
||||
CNTK 106: LSTM based forecasting - Part A: :cntktut:`with simulated data <CNTK_106A_LSTM_Timeseries_with_Simulated_Data>`,
|
||||
Part B: :cntktut:`with real IOT data <CNTK_106B_LSTM_Timeseries_with_IOT_Data>`
|
||||
CNTK 106: LSTM based forecasting - Part A: `with simulated data <CNTK_106A_LSTM_Timeseries_with_Simulated_Data.html>`_ (:cntktut:`source <CNTK_106A_LSTM_Timeseries_with_Simulated_Data>`),
|
||||
Part B: `with real IOT data <CNTK_106B_LSTM_Timeseries_with_IOT_Data.html>`_ (:cntktut:`source <CNTK_106B_LSTM_Timeseries_with_IOT_Data>`)
|
||||
|
||||
#. *Quick tour for those familiar with other deep learning toolkits*
|
||||
CNTK 200: :cntktut:`Guided Tour <CNTK_200_GuidedTour>`
|
||||
CNTK 200: `Guided Tour <CNTK_200_GuidedTour.html>`_ (:cntktut:`source <CNTK_200_GuidedTour>`)
|
||||
|
||||
#. *Recognize objects in images from CIFAR-10 data (Convolutional Network, CNN)*
|
||||
CNTK 201 Part A: :cntktut:`CIFAR data preparation <CNTK_201A_CIFAR-10_DataLoader>`,
|
||||
Part B: :cntktut:`VGG and ResNet classifiers <CNTK_201B_CIFAR-10_ImageHandsOn>`
|
||||
CNTK 201 Part A: `CIFAR data preparation <CNTK_201A_CIFAR-10_DataLoader.html>`_ (:cntktut:`source <CNTK_201A_CIFAR-10_DataLoader>`),
|
||||
Part B: `VGG and ResNet classifiers <CNTK_201B_CIFAR-10_ImageHandsOn.html>`_ (:cntktut:`source <CNTK_201B_CIFAR-10_ImageHandsOn>`)
|
||||
|
||||
#. *Infer meaning from text snippets using LSTMs and word embeddings*
|
||||
CNTK 202: :cntktut:`Language understanding <CNTK_202_Language_Understanding>`
|
||||
CNTK 202: `Language understanding <CNTK_202_Language_Understanding.html>`_ (:cntktut:`source <CNTK_202_Language_Understanding>`)
|
||||
|
||||
#. *Train a computer to perform tasks optimally (e.g., win games) in a simulated environment*
|
||||
CNTK 203: :cntktut:`Reinforcement learning basics <CNTK_203_Reinforcement_Learning_Basics>` with OpenAI Gym data
|
||||
CNTK 203: `Reinforcement learning basics <CNTK_203_Reinforcement_Learning_Basics.html>`_ with OpenAI Gym data (:cntktut:`source <CNTK_203_Reinforcement_Learning_Basics>`)
|
||||
|
||||
#. *Translate text from one domain (grapheme) to other (phoneme)*
|
||||
CNTK 204: :cntktut:`Sequence to sequence basics <CNTK_204_Sequence_To_Sequence>` with CMU pronouncing dictionary
|
||||
CNTK 204: `Sequence to sequence basics <CNTK_204_Sequence_To_Sequence.html>`_ with CMU pronouncing dictionary (:cntktut:`source <CNTK_204_Sequence_To_Sequence>`)
|
||||
|
||||
#. *Teach a computer to paint like Picasso or van Gogh*
|
||||
CNTK 205: :cntktut:`Artistic Style Transfer <CNTK_205_Artistic_Style_Transfer>`
|
||||
CNTK 205: `Artistic Style Transfer <CNTK_205_Artistic_Style_Transfer.html>`_ (:cntktut:`source <CNTK_205_Artistic_Style_Transfer>`)
|
||||
|
||||
#. *Produce realistic data (MNIST images) with no human input (unsupervised learning)*
|
||||
CNTK 206 Part A: :cntktut:`MNIST data preparation <CNTK_103A_MNIST_DataLoader>`,
|
||||
Part B: :cntktut:`Basic Generative Adversarial Networks (GAN) <CNTK_206A_Basic_GAN>`,
|
||||
Part B: :cntktut:`Deep Convolutional GAN <CNTK_206B_DCGAN>`
|
||||
CNTK 206 Part A: `MNIST data preparation <CNTK_103A_MNIST_DataLoader.html>`_ (:cntktut:`source <CNTK_103A_MNIST_DataLoader>`),
|
||||
Part B: `Basic Generative Adversarial Networks (GAN) <CNTK_206A_Basic_GAN.html>`_ (:cntktut:`source <CNTK_206A_Basic_GAN>`),
|
||||
Part C: `Deep Convolutional GAN <CNTK_206B_DCGAN.html>`_ (:cntktut:`source <CNTK_206B_DCGAN>`)
|
||||
|
||||
#. *Training with Sampled Softmax*
|
||||
CNTK 207: :cntktut:`Training with Sampled Softmax <CNTK_207_Training_with_Sampled_Softmax>`
|
||||
CNTK 207: `Training with Sampled Softmax <CNTK_207_Training_with_Sampled_Softmax.html>`_ (:cntktut:`source <CNTK_207_Training_with_Sampled_Softmax>`)
|
||||
|
||||
#. *Training with Connectionist Temporal Classification*
|
||||
CNTK 208: :cntktut:`Training with Connectionist Temporal Classification <CNTK_208_Speech_Connectionist_Temporal_Classification>`
|
||||
CNTK 208: `Training with Connectionist Temporal Classification <CNTK_208_Speech_Connectionist_Temporal_Classification.html>`_ (:cntktut:`source <CNTK_208_Speech_Connectionist_Temporal_Classification>`)
|
||||
|
||||
#. *Recognize flowers and animals in natural scene images using deep transfer learning*
|
||||
CNTK 301: :cntktut:`Deep transfer learning with pre-trained ResNet model <CNTK_301_Image_Recognition_with_Deep_Transfer_Learning>`
|
||||
CNTK 301: `Deep transfer learning with pre-trained ResNet model <CNTK_301_Image_Recognition_with_Deep_Transfer_Learning.html>`_ (:cntktut:`source <CNTK_301_Image_Recognition_with_Deep_Transfer_Learning>`)
|
||||
|
||||
Try these notebooks pre-installed on `CNTK Azure Notebooks`_ for free.
|
||||
|
||||
|
@ -65,3 +65,11 @@ For our Japanese users, you can find some of the `tutorials in Japanese`_ (unsup
|
|||
|
||||
.. _`CNTK Azure Notebooks`: https://notebooks.azure.com/cntk/libraries/tutorials
|
||||
.. _`tutorials in Japanese`: https://notebooks.azure.com/library/cntkbeta2_ja
|
||||
|
||||
.. toctree::
|
||||
:glob:
|
||||
:maxdepth: 1
|
||||
:caption: List view
|
||||
:hidden:
|
||||
|
||||
CNTK_*
|
Загрузка…
Ссылка в новой задаче