Integrate sayanpa/pycntk105 into master

Project Philly 2017-01-13 16:50:04 -08:00
Parents 82b924c6e9 bfd9829554
Commit 1b0db71ad7
5 changed files: 922 additions and 8 deletions

View file

@@ -0,0 +1,28 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See LICENSE.md file in the project root
# for full license information.
# ==============================================================================
import os
import re
import numpy as np
abs_path = os.path.dirname(os.path.abspath(__file__))
notebook = os.path.join(abs_path, "..", "..", "..", "..", "Tutorials", "CNTK_105_Basic_Autoencoder_for_Dimensionality_Reduction.ipynb")
TOLERANCE_ABSOLUTE = 1E-1
def test_cntk_105_basic_autoencoder_for_dimensionality_reduction_noErrors(nb):
    errors = [output for cell in nb.cells if 'outputs' in cell
              for output in cell['outputs'] if output.output_type == "error"]
    print(errors)
    assert errors == []

expectedError = 3.1

def test_cntk_105_basic_autoencoder_for_dimensionality_reduction_simple_trainerror(nb):
    testCell = [cell for cell in nb.cells
                if cell.cell_type == 'code' and re.search('# Simple autoencoder test error', cell.source)]
    assert np.isclose(float((testCell[0].outputs[0])['text']), expectedError, atol=TOLERANCE_ABSOLUTE)
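
The `nb` object these tests receive is expected to be the executed notebook, supplied by the test harness (typically via a pytest fixture that is not part of this diff). A minimal sketch of such a fixture, written as if it lived in this test module and reused the `notebook` path defined above, assuming nbformat and nbconvert are installed:

import pytest
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

@pytest.fixture(scope="module")
def nb():
    # Hypothetical fixture: execute the tutorial notebook end to end so that
    # its cell outputs are populated before the assertions above run.
    with open(notebook) as f:
        nb_obj = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor(timeout=3600, kernel_name="python3")
    ep.preprocess(nb_obj, {"metadata": {"path": os.path.dirname(notebook)}})
    return nb_obj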

View file

@@ -81,17 +81,17 @@
},
{
"category": ["Image"],
"name": "MNIST CNN OCR",
"url": "https://github.com/Microsoft/CNTK/wiki/Tutorial2",
"description": "Use CNN on an OCR problem.",
"language": ["Python", "BrainScript"],
"name": "MNIST Feed Forward OCR",
"url": "https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_103B_MNIST_FeedForwardNetwork.ipynb",
"description": "Use Feed Forward networks on an OCR problem.",
"language": ["Python"],
"type": ["Tutorial", "Recipe"]
},
{
"category": ["Image"],
"name": "MNIST Feed Forward OCR",
"url": "https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_103B_MNIST_FeedForwardNetwork.ipynb",
"description": "Use Feed Forward networks on an OCR problem.",
"name": "MNIST Autoencoder Dim Reduction",
"url": "https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_105_Basic_Autoencoder_for_Dimensionality_Reduction.ipynb",
"description": "Use Autoencoder for dimensionality reduction.",
"language": ["Python"],
"type": ["Tutorial", "Recipe"]
},

View file

@@ -0,0 +1,879 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CNTK 105: Basic autoencoder with MNIST data\n",
"\n",
"**Prerequisites**: We assume that you have successfully downloaded the MNIST data by completing the tutorial titled CNTK_103A_MNIST_DataLoader.ipynb.\n",
"\n",
"\n",
"## Introduction\n",
"\n",
"In this tutorial we introduce you to the basics of [Autoencoders](https://en.wikipedia.org/wiki/Autoencoder). An autoencoder is an artificial neural network used for unsupervised learning of efficient encodings. In other words, they are used for lossy data-specific compression that is learnt automatically instead of relying on human engineered features. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. \n",
"\n",
"The autoencoders are very specific to the data-set on hand and are different from standard codecs such as JPEG, MPEG standard based encodings. Once the information is encoded and decoded back to original dimensions some amount of information is lost in the process. Given these encodings are specific to data, autoencoders are not used for compression. However, there are two areas where autoencoders have been found very effective in denoising and dimensionality reduction.\n",
"\n",
"Autoencoders have attracted attention since they have long been thought to be a potential approach for unsupervised learning. Truly unsupervised approaches involve learning useful representations without the need for labels. Autoencoders fall under self-supervised learning, a specific instance of supervised learning where the targets are generated from the input data. \n",
"\n",
"**Goal** \n",
"\n",
"Our goal is to train an autoencoder that compresses MNIST digits image to a vector of smaller dimension and then restores the image. The MNIST data comprises of hand-written digits with little background noise.\n",
"\n",
"<img src=\"http://cntk.ai/jup/MNIST-image.jpg\", width=300, height=300>\n",
"\n",
"In this tutorial, we will use the [MNIST hand-written digits data](https://en.wikipedia.org/wiki/MNIST_database) to illustrate encoding the images and decoding (restoring) them using feed-forward networks. We will visualize the original and the restored images. We illustrate feed forward network based both simple autoencoder and deep autoencoder. More advanced autoencoders will be covered in future 200 series tutorials.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Import the relevant modules\n",
"from __future__ import print_function\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import os\n",
"import sys\n",
"\n",
"# Import CNTK related modules\n",
"import cntk as C\n",
"from cntk.blocks import default_options, Input # building blocks\n",
"from cntk.device import set_default_device, gpu, cpu, best \n",
"from cntk.layers import Dense\n",
"from cntk import Trainer, StreamConfiguration\n",
"from cntk.io import StreamDef, StreamDefs, INFINITELY_REPEAT, FULL_DATA_SWEEP\n",
"from cntk.io import MinibatchSource, CTFDeserializer\n",
"from cntk.initializer import glorot_uniform\n",
"from cntk.learner import adam_sgd, UnitType\n",
"from cntk.learner import learning_rate_schedule, momentum_as_time_constant_schedule\n",
"from cntk.utils import *\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Select the notebook runtime environment devices / settings\n",
"\n",
"Set the device to cpu / gpu for the test environment. If you have both CPU and GPU on your machine, you can optionally switch the devices. By default we choose the best available device."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Select the right target device when this notebook is being tested:\n",
"if 'TEST_DEVICE' in os.environ:\n",
" import cntk\n",
" if os.environ['TEST_DEVICE'] == 'cpu':\n",
" C.device.set_default_device(C.device.cpu())\n",
" else:\n",
" C.device.set_default_device(C.device.gpu(0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two run modes:\n",
"- *Fast mode*: `isFast` is set to `True`. This is the default mode for the notebooks, which means we train for fewer iterations or train / test on limited data. This ensures functional correctness of the notebook though the models produced are far from what a completed training would produce.\n",
"\n",
"- *Slow mode*: We recommend the user to set this flag to `False` once the user has gained familiarity with the notebook content and wants to gain insight from running the notebooks for a longer period with different parameters for training. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"isFast = True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data reading\n",
"\n",
"In this section, we will read the data generated in CNTK 103 Part A.\n",
"\n",
"The data is in the following format:\n",
"\n",
" |labels 0 0 0 0 0 0 0 1 0 0 |features 0 0 0 0 ... \n",
" (784 integers each representing a pixel)\n",
" \n",
" In this tutorial we are going to use the image pixels corresponding the integer stream named \"features\". We define a `create_reader` function to read the training and test data using the [CTF deserializer](https://cntk.ai/pythondocs/cntk.io.html?highlight=ctfdeserializer#cntk.io.CTFDeserializer). The labels are [1-hot encoded](https://en.wikipedia.org/wiki/One-hot). We ignore them in this tutorial. \n",
"\n",
"We also check if the training and test data file has been downloaded and available for reading by the `create_reader` function. In this tutorial we are using the MNIST data you have downloaded using CNTK_103A_MNIST_DataLoader notebook. The dataset has 60,000 training images and 10,000 test images with each image being 28 x 28 pixels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Read a CTF formatted text (as mentioned above) using the CTF deserializer from a file\n",
"def create_reader(path, is_training, input_dim, num_label_classes):\n",
" return MinibatchSource(CTFDeserializer(path, StreamDefs(\n",
" labels_viz = StreamDef(field='labels', shape=num_label_classes, is_sparse=False),\n",
" features = StreamDef(field='features', shape=input_dim, is_sparse=False)\n",
" )), randomize = is_training, epoch_size = INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Ensure the training and test data is generated and available for this tutorial.\n",
"# We search in two locations in the toolkit for the cached MNIST data set.\n",
"data_found = False\n",
"for data_dir in [os.path.join(\"..\", \"Examples\", \"Image\", \"DataSets\", \"MNIST\"),\n",
" os.path.join(\"data\", \"MNIST\")]:\n",
" train_file = os.path.join(data_dir, \"Train-28x28_cntk_text.txt\")\n",
" test_file = os.path.join(data_dir, \"Test-28x28_cntk_text.txt\")\n",
" if os.path.isfile(train_file) and os.path.isfile(test_file):\n",
" data_found = True\n",
" break\n",
" \n",
"if not data_found:\n",
" raise ValueError(\"Please generate the data by completing CNTK 103 Part A\")\n",
"print(\"Data directory is {0}\".format(data_dir))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='#Model Creation'></a>\n",
"## Model Creation\n",
"\n",
"We start with a simple single fully-connected feedforward network as encoder and as decoder (as shown in the figure below):\n",
"\n",
"<img src=\"http://cntk.ai/jup/SimpleAEfig.jpg\",width=200, height=200>\n",
"\n",
"The input data are a set of hand written digits images each 28 x28 pixels. In this tutorial, we will consider each image as a linear array of 784 pixel values. These pixels are considered as an input having 784 dimensions, one per pixel. Since the goal of the autoencoder is to compress the data and reconstruct the original image, the output dimension is same as the input dimension. We will compress the input to mere 32 dimensions (referred to as the `encoding_dim`). Additionally, since the maximum input value is 255, we normalize the input between 0 and 1. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"input_dim = 784\n",
"encoding_dim = 32\n",
"output_dim = input_dim\n",
"\n",
"def create_model(features):\n",
" with default_options(init = glorot_uniform()):\n",
" # We scale the input pixels to 0-1 range\n",
" encode = Dense(encoding_dim, activation = C.relu)(features/255.0)\n",
" decode = Dense(input_dim, activation = C.sigmoid)(encode)\n",
"\n",
" return decode"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup the network for training and testing\n",
"\n",
"In previous tutorials, we have defined each of the training and testing phases separately. In this tutorial, we combine the two componets in one place such that this template could be used as a recipe for your usage. \n",
"\n",
"The `train_and_test` function performs two major tasks:\n",
"- Train the model\n",
"- Evaluate the accuracy of the model on test data\n",
"\n",
"For training:\n",
"\n",
"> The function takes a reader (`reader_train`), a model function (`model_func`) and the target (a.k.a `label`) as input. In this tutorial, we show how to create and pass your **own** loss function. We normalize the `label` function to emit value between 0 and 1 for us to compute the label error using `C.classification_error` function.\n",
"\n",
"> We use Adam optimizer in this tutorial from a range of [learners](https://www.cntk.ai/pythondocs/cntk.learner.html#module-cntk.learner) (optimizers) available in the toolkit. \n",
"\n",
"For testing:\n",
"\n",
"> The function additionally takes a reader (`reader_test`) and evaluates the predicted pixel values made by the model against reference data, in this case the original pixel values for each image.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def train_and_test(reader_train, reader_test, model_func):\n",
" \n",
" ###############################################\n",
" # Training the model\n",
" ###############################################\n",
" \n",
" # Instantiate the input and the label variables\n",
" input = Input(input_dim)\n",
" label = Input(input_dim)\n",
" \n",
" # Create the model function\n",
" model = model_func(input)\n",
" \n",
" # The labels for this network is same as the input MNIST image.\n",
" # Note: Inside the model we are scaling the input to 0-1 range\n",
" # Hence we rescale the label to the same range\n",
" # We show how one can use their custom loss function\n",
" # loss = -(y* log(p)+ (1-y) * log(1-p)) where p = model output and y = target\n",
" # We have normalized the input between 0-1. Hence we scale the target to same range\n",
" \n",
" target = label/255.0 \n",
" loss = -(target * C.log(model) + (1 - target) * C.log(1 - model))\n",
" label_error = C.classification_error(model, target)\n",
" \n",
" # training config\n",
" epoch_size = 30000 # 30000 samples is half the dataset size \n",
" minibatch_size = 64\n",
" num_sweeps_to_train_with = 5 if isFast else 100\n",
" num_samples_per_sweep = 60000\n",
" num_minibatches_to_train = (num_samples_per_sweep * num_sweeps_to_train_with) // minibatch_size\n",
" \n",
" \n",
" # Instantiate the trainer object to drive the model training\n",
" lr_per_sample = [0.00003]\n",
" lr_schedule = learning_rate_schedule(lr_per_sample, UnitType.sample, epoch_size)\n",
" \n",
" # Momentum\n",
" momentum_as_time_constant = momentum_as_time_constant_schedule(700)\n",
" \n",
" # We use a variant of the Adam optimizer which is known to work well on this dataset\n",
" # Feel free to try other optimizers from \n",
" # https://www.cntk.ai/pythondocs/cntk.learner.html#module-cntk.learner\n",
" learner = adam_sgd(model.parameters,\n",
" lr=lr_schedule, momentum=momentum_as_time_constant) \n",
" \n",
" # Instantiate the trainer\n",
" trainer = Trainer(model, loss, label_error, learner)\n",
" \n",
" # Map the data streams to the input and labels.\n",
" # Note: for autoencoders input == label\n",
" input_map = {\n",
" input : reader_train.streams.features,\n",
" label : reader_train.streams.features\n",
" } \n",
" \n",
" pp = ProgressPrinter(0)\n",
" for i in range(num_minibatches_to_train):\n",
" # Read a mini batch from the training data file\n",
" data = reader_train.next_minibatch(minibatch_size, input_map = input_map)\n",
" \n",
" # Run the trainer on and perform model training\n",
" trainer.train_minibatch(data) \n",
" pp.update_with_trainer(trainer, with_metric=True)\n",
" \n",
" train_error = pp.avg_metric_since_start()*100\n",
" print(\"Average training error: {0:0.2f}%\".format(pp.avg_metric_since_start()*100))\n",
" \n",
" #############################################################################\n",
" # Testing the model\n",
" # Note: we use a test file reader to read data different from a training data\n",
" #############################################################################\n",
" \n",
" # Test data for trained model\n",
" test_minibatch_size = 32\n",
" num_samples = 10000\n",
" num_minibatches_to_test = num_samples / test_minibatch_size\n",
" test_result = 0.0\n",
" \n",
" # Test error metric calculation\n",
" metric_numer = 0\n",
" metric_denom = 0\n",
"\n",
" test_input_map = {\n",
" input : reader_test.streams.features,\n",
" label : reader_test.streams.features\n",
" }\n",
"\n",
" for i in range(0, int(num_minibatches_to_test)):\n",
" \n",
" # We are loading test data in batches specified by test_minibatch_size\n",
" # Each data point in the minibatch is a MNIST digit image of 784 dimensions \n",
" # with one pixel per dimension that we will encode / decode with the \n",
" # trained model.\n",
" data = reader_test.next_minibatch(test_minibatch_size,\n",
" input_map = test_input_map)\n",
"\n",
" # Specify the mapping of input variables in the model to actual\n",
" # minibatch data to be tested with\n",
" eval_error = trainer.test_minibatch(data)\n",
" \n",
" # minibatch data to be trained with\n",
" metric_numer += np.abs(eval_error * test_minibatch_size)\n",
" metric_denom += test_minibatch_size\n",
"\n",
" # Average of evaluation errors of all test minibatches\n",
" test_error = (metric_numer*100.0) / (metric_denom) \n",
" print(\"Average test error: {0:0.2f}%\".format(test_error))\n",
" \n",
" return model, train_error, test_error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us train the simple autoencoder. We create a training and a test reader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"num_label_classes = 10\n",
"reader_train = create_reader(train_file, True, input_dim, num_label_classes)\n",
"reader_test = create_reader(test_file, False, input_dim, num_label_classes)\n",
"model, simple_ae_train_error, simple_ae_test_error = train_and_test(reader_train, \n",
" reader_test, \n",
" model_func = create_model )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize the simple autoencoder results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Read some data to run the eval\n",
"num_label_classes = 10\n",
"reader_eval = create_reader(test_file, False, input_dim, num_label_classes)\n",
"\n",
"eval_minibatch_size = 50\n",
"eval_input_map = { input : reader_eval.streams.features } \n",
" \n",
"eval_data = reader_eval.next_minibatch(eval_minibatch_size,\n",
" input_map = eval_input_map)\n",
"\n",
"img_data = eval_data[input].value\n",
"\n",
"# Select a random image\n",
"np.random.seed(0) \n",
"idx = np.random.choice(eval_minibatch_size)\n",
"\n",
"orig_image = img_data[idx,:,:]\n",
"decoded_image = model.eval(orig_image)*255\n",
"\n",
"# Print image statistics\n",
"def print_image_stats(img, text):\n",
" print(text)\n",
" print(\"Max: {0:.2f}, Median: {1:.2f}, Mean: {2:.2f}, Min: {3:.2f}\".format(np.max(img),\n",
" np.median(img),\n",
" np.mean(img),\n",
" np.min(img))) \n",
" \n",
"# Print original image\n",
"print_image_stats(orig_image, \"Original image statistics:\")\n",
"\n",
"# Print decoded image\n",
"print_image_stats(decoded_image, \"Decoded image statistics:\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us plot the original and the decoded image. They should look visually similar."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Define a helper function to plot a pair of images\n",
"def plot_image_pair(img1, text1, img2, text2):\n",
" fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(6, 6))\n",
"\n",
" axes[0].imshow(img1, cmap=\"gray\")\n",
" axes[0].set_title(text1)\n",
" axes[0].axis(\"off\")\n",
"\n",
" axes[1].imshow(img2, cmap=\"gray\")\n",
" axes[1].set_title(text2)\n",
" axes[1].axis(\"off\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Plot the original and the decoded image\n",
"img1 = orig_image.reshape(28,28)\n",
"text1 = 'Original image'\n",
"\n",
"img2 = decoded_image.reshape(28,28)\n",
"text2 = 'Decoded image'\n",
"\n",
"plot_image_pair(img1, text1, img2, text2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Deep Auto encoder\n",
"\n",
"We do not have to limit ourselves to a single layer as encoder or decoder, we could instead use a stack of dense layers. Let us create a deep autoencoder.\n",
"\n",
"<img src=\"http://cntk.ai/jup/DeepAEfig.jpg\",width=500, height=300>\n",
"\n",
"The encoding dimensions are 128, 64 and 32 while the decoding dimensions are symmetrically opposite 64, 128 and 784. This increases the number of parameters used to model the transformation and achieves lower error rates at the cost of longer training duration and memory footprint. If we train this deep encoder for larger number iterations by turning the `isFast` flag to be `False`, we get a lower error and the reconstructed images are also marginally better. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"input_dim = 784\n",
"encoding_dims = [128,64,32]\n",
"decoding_dims = [64,128]\n",
"\n",
"encoded_model = None\n",
"\n",
"def create_deep_model(features):\n",
" with default_options(init = glorot_uniform()):\n",
" encode = C.element_times(C.constant(1.0/255.0), features)\n",
"\n",
" for encoding_dim in encoding_dims:\n",
" encode = Dense(encoding_dim, activation = C.relu)(encode)\n",
"\n",
" global encoded_model\n",
" encoded_model= encode\n",
" \n",
" decode = encode\n",
" for decoding_dim in decoding_dims:\n",
" decode = Dense(decoding_dim, activation = C.relu)(decode)\n",
"\n",
" decode = Dense(input_dim, activation = C.sigmoid)(decode)\n",
" return decode "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"num_label_classes = 10\n",
"reader_train = create_reader(train_file, True, input_dim, num_label_classes)\n",
"reader_test = create_reader(test_file, False, input_dim, num_label_classes)\n",
"\n",
"model, deep_ae_train_error, deep_ae_test_error = train_and_test(reader_train, \n",
" reader_test, \n",
" model_func = create_deep_model) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize the deep autoencoder results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Run the same image as the simple autoencoder through the deep encoder\n",
"orig_image = img_data[idx,:,:]\n",
"decoded_image = model.eval(orig_image)*255\n",
"\n",
"# Print image statistics\n",
"def print_image_stats(img, text):\n",
" print(text)\n",
" print(\"Max: {0:.2f}, Median: {1:.2f}, Mean: {2:.2f}, Min: {3:.2f}\".format(np.max(img),\n",
" np.median(img),\n",
" np.mean(img),\n",
" np.min(img))) \n",
" \n",
"# Print original image\n",
"print_image_stats(orig_image, \"Original image statistics:\")\n",
"\n",
"# Print decoded image\n",
"print_image_stats(decoded_image, \"Decoded image statistics:\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us plot the original and the decoded image with the deep autoencoder. They should look visually similar."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Plot the original and the decoded image\n",
"img1 = orig_image.reshape(28,28)\n",
"text1 = 'Original image'\n",
"\n",
"img2 = decoded_image.reshape(28,28)\n",
"text2 = 'Decoded image'\n",
"\n",
"plot_image_pair(img1, text1, img2, text2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have shown how to encode and decode an input. In this section we will explore how we can compare one to another and also show how to extract an encoded input for a given input. For visualizing high dimension data in 2D, [t-SNE](http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) is probably one of the best methods. However, it typically requires relatively low-dimensional data. So a good strategy for visualizing similarity relationships in high-dimensional data is to encode data into a low-dimensional space (e.g. 32 dimensional) using an autoencoder first, extract the encoding of the input data followed by using t-SNE for mapping the compressed data to a 2D plane. \n",
"\n",
"We will use the deep autoencoder outputs to:\n",
"- Compare two images and\n",
"- Show how we can retrieve an encoded (compressed) data. \n",
"\n",
"First we need to read some image data along with their labels. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Read some data to run get the image data and the corresponding labels\n",
"num_label_classes = 10\n",
"reader_viz = create_reader(test_file, False, input_dim, num_label_classes)\n",
"\n",
"image = Input(input_dim)\n",
"image_label = Input(num_label_classes)\n",
"\n",
"viz_minibatch_size = 50\n",
"\n",
"viz_input_map = { \n",
" image : reader_viz.streams.features, \n",
" image_label : reader_viz.streams.labels_viz \n",
"} \n",
" \n",
"viz_data = reader_eval.next_minibatch(viz_minibatch_size,\n",
" input_map = viz_input_map)\n",
"\n",
"img_data = viz_data[image].value\n",
"imglabel_raw = viz_data[image_label].value"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Map the image labels into indices in minibatch array\n",
"img_labels = [np.argmax(imglabel_raw[i,:,:]) for i in range(0, imglabel_raw.shape[0])] \n",
" \n",
"from collections import defaultdict\n",
"label_dict=defaultdict(list)\n",
"for img_idx, img_label, in enumerate(img_labels):\n",
" label_dict[img_label].append(img_idx) \n",
" \n",
"# Print indices corresponding to 3 digits\n",
"randIdx = [1, 3, 9]\n",
"for i in randIdx:\n",
" print(\"{0}: {1}\".format(i, label_dict[i]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will [compute cosine distance](https://en.wikipedia.org/wiki/Cosine_similarity) between two images using `scipy`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from scipy import spatial\n",
"\n",
"def image_pair_cosine_distance(img1, img2):\n",
" if img1.size != img2.size:\n",
" raise ValueError(\"Two images need to be of same dimension\")\n",
" return 1 - spatial.distance.cosine(img1, img2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Let s compute the distance between two images of the same number\n",
"digit_of_interest = 6\n",
"\n",
"digit_index_list = label_dict[digit_of_interest]\n",
"\n",
"if len(digit_index_list) < 2:\n",
" print(\"Need at least two images to compare\")\n",
"else:\n",
" imgA = img_data[digit_index_list[0],:,:][0] \n",
" imgB = img_data[digit_index_list[1],:,:][0]\n",
" \n",
" # Print distance between original image\n",
" imgA_B_dist = image_pair_cosine_distance(imgA, imgB)\n",
" print(\"Distance between two original image: {0:.3f}\".format(imgA_B_dist))\n",
" \n",
" # Plot the two images\n",
" img1 = imgA.reshape(28,28)\n",
" text1 = 'Original image 1'\n",
"\n",
" img2 = imgB.reshape(28,28)\n",
" text2 = 'Original image 2'\n",
"\n",
" plot_image_pair(img1, text1, img2, text2)\n",
" \n",
" # Decode the encoded stream \n",
" imgA_decoded = model.eval([imgA])\n",
" imgB_decoded = model.eval([imgB]) \n",
" imgA_B_decoded_dist = image_pair_cosine_distance(imgA_decoded, imgB_decoded)\n",
"\n",
" # Print distance between original image\n",
" print(\"Distance between two decoded image: {0:.3f}\".format(imgA_B_decoded_dist))\n",
" \n",
" # Plot the two images\n",
" # Plot the original and the decoded image\n",
" img1 = imgA_decoded.reshape(28,28)\n",
" text1 = 'Decoded image 1'\n",
"\n",
" img2 = imgB_decoded.reshape(28,28)\n",
" text2 = 'Decoded image 2'\n",
"\n",
" plot_image_pair(img1, text1, img2, text2)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: The cosine distance between the original images comparable to the distance between the corresponding decoded images. A value of 1 indicates high similarity between the images and 0 indicates no similarity.\n",
"\n",
"Let us now see how to get the encoded vector corresponding to an input image. This should have the dimension of the choke point in the network shown in the figure with the box labeled `E`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"imgA = img_data[digit_index_list[0],:,:][0] \n",
"imgA_encoded = encoded_model.eval([imgA])\n",
"\n",
"print(\"Length of the original image is {0:3d} and the encoded image is {1:3d}\".format(len(imgA), \n",
" len(imgA_encoded[0][0])))\n",
"print(\"\\nThe encoded image: \")\n",
"print(imgA_encoded[0][0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us compare the distance between different digits."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"digitA = 3\n",
"digitB = 8\n",
"\n",
"digitA_index = label_dict[digitA]\n",
"digitB_index = label_dict[digitB]\n",
"\n",
"imgA = img_data[digitA_index[0],:,:][0] \n",
"imgB = img_data[digitB_index[0],:,:][0]\n",
"\n",
"# Print distance between original image\n",
"imgA_B_dist = image_pair_cosine_distance(imgA, imgB)\n",
"print(\"Distance between two original image: {0:.3f}\".format(imgA_B_dist))\n",
" \n",
"# Plot the two images\n",
"img1 = imgA.reshape(28,28)\n",
"text1 = 'Original image 1'\n",
"\n",
"img2 = imgB.reshape(28,28)\n",
"text2 = 'Original image 2'\n",
"\n",
"plot_image_pair(img1, text1, img2, text2)\n",
" \n",
"# Decode the encoded stream \n",
"imgA_decoded = model.eval([imgA])\n",
"imgB_decoded = model.eval([imgB]) \n",
"imgA_B_decoded_dist = image_pair_cosine_distance(imgA_decoded, imgB_decoded)\n",
"\n",
"#Print distance between original image\n",
"print(\"Distance between two decoded image: {0:.3f}\".format(imgA_B_decoded_dist))\n",
"\n",
"# Plot the original and the decoded image\n",
"img1 = imgA_decoded.reshape(28,28)\n",
"text1 = 'Decoded image 1'\n",
"\n",
"img2 = imgB_decoded.reshape(28,28)\n",
"text2 = 'Decoded image 2'\n",
"\n",
"plot_image_pair(img1, text1, img2, text2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Print the results of the deep encoder test error for regression testing"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Simple autoencoder test error\n",
"print(simple_ae_test_error)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Deep autoencoder test error\n",
"print(deep_ae_test_error)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Suggested tasks\n",
"\n",
"- Try different activation functions.\n",
"- Find which images are more similar to one another (a) using original image and (b) decoded image.\n",
"- Try using mean square error as the loss function. Does it improve the performance of the encoder in terms of reduced errors.\n",
"- Can you try different network structure to reduce the error further. Explain your observations.\n",
"- Can you use a different distance metric to compute similarity between the MNIST images.\n",
"- Try a deep encoder with [1000, 500, 250, 128, 64, 32]. What is the training error for same number of iterations? "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
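
The notebook above motivates t-SNE for visualizing the 32-dimensional encodings in 2D but does not implement that step. A minimal sketch, assuming scikit-learn is installed and the deep-autoencoder cells have been run so that `encoded_model`, `img_data` and `img_labels` exist (this code is not part of the committed notebook):

# Project the 32-dimensional encodings of the visualization minibatch to 2D
# with t-SNE and color each point by its digit label.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Encode every image in the minibatch; the [0][0] indexing mirrors how the
# notebook reads the encoded vector out of encoded_model.eval([imgA]).
encodings = np.array([encoded_model.eval([img_data[i, :, :][0]])[0][0]
                      for i in range(img_data.shape[0])])

tsne = TSNE(n_components=2, random_state=0)
encodings_2d = tsne.fit_transform(encodings)

plt.scatter(encodings_2d[:, 0], encodings_2d[:, 1], c=img_labels, cmap="jet")
plt.colorbar()
plt.title("t-SNE projection of the deep autoencoder encodings")
plt.show()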

View file

@@ -13,7 +13,7 @@ implementation of computational networks that supports both CPU and GPU.
This page describes the Python API for CNTK_ version 2.0.beta7.0. This is an ongoing effort
to expose such an API to the CNTK system, thus enabling the use of higher-level
tools such as IDEs to facilitate the definition of computational networks, to execute
them on sample data in real time.
them on sample data in real time. Please give feedback through these `channels`_.
.. toctree::
:maxdepth: 2
@@ -34,3 +34,4 @@ Indices and tables
* :ref:`modindex`
* :ref:`search`
.. _`channels`: https://github.com/Microsoft/CNTK/wiki/Feedback-Channels

View file

@@ -8,6 +8,10 @@ Tutorials
* Part A: `MNIST data preparation`_
* Part B: `Feed Forward classifier`_
#. CNTK 104: `Time Series basics`_ with Finance data
#. CNTK 105: Autoencoder for dimensionality reduction with MNIST data
* Part A: `MNIST data preparation`_
* Part B: `Feed Forward autoencoder`_
#. CNTK 201: Image classifiers with CIFAR-10 data
@@ -29,6 +33,8 @@ For our Japanese users, you can find some of the `tutorials in Japanese`_.
.. _`MNIST data preparation`: https://github.com/Microsoft/CNTK/tree/v2.0.beta7.0/Tutorials/CNTK_103A_MNIST_DataLoader.ipynb
.. _`Feed Forward classifier`: https://github.com/Microsoft/CNTK/tree/v2.0.beta7.0/Tutorials/CNTK_103B_MNIST_FeedForwardNetwork.ipynb
.. _`Time Series basics`: https://github.com/Microsoft/CNTK/tree/v2.0.beta7.0/Tutorials/CNTK_104_Finance_Timeseries_Basic_with_Pandas_Numpy.ipynb
.. _`Feed Forward autoencoder`: https://github.com/Microsoft/CNTK/tree/v2.0.beta7.0/Tutorials/CNTK_105_Basic_Autoencoder_for_Dimensionality_Reduction.ipynb
.. _`CIFAR-10 Data preparation`: https://github.com/Microsoft/CNTK/tree/v2.0.beta7.0/Tutorials/CNTK_201A_CIFAR-10_DataLoader.ipynb
.. _`VGG and ResNet classifiers`: https://github.com/Microsoft/CNTK/tree/v2.0.beta7.0/Tutorials/CNTK_201B_CIFAR-10_ImageHandsOn.ipynb
.. _`Language understanding`: https://github.com/Microsoft/CNTK/blob/v2.0.beta7.0/Tutorials/CNTK_202_Language_Understanding.ipynb