{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CNTK 201A Part A: CIFAR-10 Data Loader\n",
"\n",
"This tutorial shows how to prepare image data sets for use with deep learning algorithms in CNTK. The [CIFAR-10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html) is a popular dataset for image classification, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It is a labeled subset of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset.\n",
"\n",
"The CIFAR-10 dataset is not included in the CNTK distribution, but it can be easily downloaded and converted to a CNTK-supported format.\n",
"\n",
"The CNTK 201A tutorial is divided into two parts:\n",
"- Part A: Familiarizes you with the CIFAR-10 data and converts it into a CNTK-supported format. This data will be used later in the tutorial for image classification tasks.\n",
"- Part B: Introduces the image understanding tutorials.\n",
"\n",
"If you are curious about how well computers can perform on CIFAR-10 today, Rodrigo Benenson maintains a [blog](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#43494641522d3130) on the state-of-the-art performance of various algorithms.\n",
"\n",
"**Note**: Please be patient, since downloading and pre-processing the data can take 10-15 minutes depending on the network speed.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from __future__ import print_function # Make print() behave as in Python 3 when running under Python 2.7\n",
"\n",
"from PIL import Image\n",
"import getopt\n",
"import numpy as np\n",
"import pickle as cp\n",
"import os\n",
"import shutil\n",
"import struct\n",
"import sys\n",
"import tarfile\n",
"import xml.etree.ElementTree as et # cElementTree was removed in Python 3.9\n",
"import xml.dom.minidom\n",
"\n",
"try:\n",
"    from urllib.request import urlretrieve\n",
"except ImportError:\n",
"    from urllib import urlretrieve\n",
"\n",
"# Config matplotlib for inline plotting\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data download\n",
"\n",
"The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. \n",
"There are 50,000 training images and 10,000 test images. The 10 classes are: airplane, automobile, bird, \n",
"cat, deer, dog, frog, horse, ship, and truck."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# CIFAR Image data\n",
"imgSize = 32\n",
"numFeature = imgSize * imgSize * 3"
]
},
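{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what `numFeature` implies: each CIFAR-10 row stores 1,024 red values, then 1,024 green, then 1,024 blue, with each plane in row-major order, so reshaping a row to `(3, 32, 32)` gives a channel-height-width (CHW) image. The array `fake_row` below is synthetic stand-in data, not a real CIFAR-10 image, and the lowercase names are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"img_size = 32                            # same value as imgSize above\n",
"num_feature = img_size * img_size * 3    # 3 color planes of 32 x 32 pixels\n",
"\n",
"# Synthetic stand-in for one CIFAR-10 row (assumption: not real image data).\n",
"fake_row = np.arange(num_feature) % 256\n",
"\n",
"# Reshape the flat row into channel, height, width order.\n",
"chw = fake_row.reshape((3, img_size, img_size))\n",
"red_plane, green_plane, blue_plane = chw\n",
"print(num_feature, chw.shape, red_plane.shape)"
]
},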
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We first set up a few helper functions to download and convert the CIFAR data. The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python \"pickled\" object produced with cPickle. To prepare the input data for use in CNTK we use three operations:\n",
"> `readBatch`: Unpacks one pickle file into an array with one row per image (the pixel values followed by the label)\n",
"\n",
"> `loadData`: Downloads and extracts the archive, then composes the batches into single train and test objects\n",
"\n",
"> `saveTxt`: Saves the labels and the features into text files for both training and testing"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def readBatch(src):\n",
"    with open(src, 'rb') as f:\n",
"        if sys.version_info[0] < 3:\n",
"            d = cp.load(f)\n",
"        else:\n",
"            d = cp.load(f, encoding='latin1')\n",
"        data = d['data']\n",
"        feat = data\n",
"    # Append the label as the last column of each row\n",
"    res = np.hstack((feat, np.reshape(d['labels'], (len(d['labels']), 1))))\n",
"    return res.astype(int)  # the np.int alias was removed in NumPy 1.24\n",
"\n",
"def loadData(src):\n",
"    print ('Downloading ' + src)\n",
"    fname, h = urlretrieve(src, './delete.me')\n",
"    print ('Done.')\n",
"    try:\n",
"        print ('Extracting files...')\n",
"        with tarfile.open(fname) as tar:\n",
"            tar.extractall()\n",
"        print ('Done.')\n",
"        print ('Preparing train set...')\n",
"        trn = np.empty((0, numFeature + 1), dtype=int)\n",
"        for i in range(5):\n",
"            batchName = './cifar-10-batches-py/data_batch_{0}'.format(i + 1)\n",
"            trn = np.vstack((trn, readBatch(batchName)))\n",
"        print ('Done.')\n",
"        print ('Preparing test set...')\n",
"        tst = readBatch('./cifar-10-batches-py/test_batch')\n",
"        print ('Done.')\n",
"    finally:\n",
"        os.remove(fname)\n",
"    return (trn, tst)\n",
"\n",
"def saveTxt(filename, ndarray):\n",
"    with open(filename, 'w') as f:\n",
"        # Row i of the identity matrix is the one-hot encoding of class i\n",
"        labels = list(map(' '.join, np.eye(10, dtype=np.uint).astype(str)))\n",
"        for row in ndarray:\n",
"            row_str = row.astype(str)\n",
"            label_str = labels[row[-1]]\n",
"            feature_str = ' '.join(row_str[:-1])\n",
"            f.write('|labels {} |features {}\\n'.format(label_str, feature_str))"
]
},
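{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the output of `saveTxt` concrete, the sketch below formats one synthetic sample the same way: a one-hot `|labels` field followed by a `|features` field. `format_text_line` is a hypothetical helper introduced only for illustration, and the four-value feature list stands in for the 3,072 pixel values of a real image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def format_text_line(label, features, num_classes=10):\n",
"    # One-hot encode the integer label: a 1 at position label, 0 elsewhere.\n",
"    one_hot = ['1' if i == label else '0' for i in range(num_classes)]\n",
"    label_str = ' '.join(one_hot)\n",
"    feature_str = ' '.join(str(v) for v in features)\n",
"    return '|labels {} |features {}'.format(label_str, feature_str)\n",
"\n",
"# Synthetic sample (assumption): class 3 with four made-up pixel values.\n",
"line = format_text_line(3, [59, 43, 50, 68])\n",
"print(line)"
]
},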
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition to saving the images in text format, we also save them in PNG format and compute the mean image. `saveImage` and `saveMean` are the two functions used for this purpose."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def saveImage(fname, data, label, mapFile, regrFile, pad, **key_parms):\n",
"    # data in CIFAR-10 dataset is in CHW format.\n",
"    pixData = data.reshape((3, imgSize, imgSize))\n",
"    if ('mean' in key_parms):\n",
"        key_parms['mean'] += pixData\n",
"\n",
"    if pad > 0:\n",
"        pixData = np.pad(pixData, ((0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=128)\n",
"\n",
"    img = Image.new('RGB', (imgSize + 2 * pad, imgSize + 2 * pad))\n",
"    pixels = img.load()\n",
"    for x in range(img.size[0]):\n",
"        for y in range(img.size[1]):\n",
"            pixels[x, y] = (pixData[0][y][x], pixData[1][y][x], pixData[2][y][x])\n",
"    img.save(fname)\n",
"    mapFile.write(\"%s\\t%d\\n\" % (fname, label))\n",
"\n",
"    # compute per channel mean and store for regression example\n",
"    channelMean = np.mean(pixData, axis=(1,2))\n",
"    regrFile.write(\"|regrLabels\\t%f\\t%f\\t%f\\n\" % (channelMean[0]/255.0, channelMean[1]/255.0, channelMean[2]/255.0))\n",
"\n",
"def saveMean(fname, data):\n",
"    root = et.Element('opencv_storage')\n",
"    et.SubElement(root, 'Channel').text = '3'\n",
"    et.SubElement(root, 'Row').text = str(imgSize)\n",
"    et.SubElement(root, 'Col').text = str(imgSize)\n",
"    meanImg = et.SubElement(root, 'MeanImg', type_id='opencv-matrix')\n",
"    et.SubElement(meanImg, 'rows').text = '1'\n",
"    et.SubElement(meanImg, 'cols').text = str(imgSize * imgSize * 3)\n",
"    et.SubElement(meanImg, 'dt').text = 'f'\n",
"    et.SubElement(meanImg, 'data').text = ' '.join(['%e' % n for n in np.reshape(data, (imgSize * imgSize * 3))])\n",
"\n",
"    tree = et.ElementTree(root)\n",
"    tree.write(fname)\n",
"    # Re-read the file and rewrite it with indentation for readability\n",
"    x = xml.dom.minidom.parse(fname)\n",
"    with open(fname, 'w') as f:\n",
"        f.write(x.toprettyxml(indent = '  '))\n"
]
},
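{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-pixel loop in `saveImage` can also be expressed with array operations. The following is a sketch of one possible alternative, not the method this notebook uses: it pads a synthetic CHW array, transposes it to height-width-channel order, and hands it to `Image.fromarray`. `chw_to_image` is a hypothetical helper name, and the pure-red input is made-up data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from PIL import Image\n",
"\n",
"def chw_to_image(chw, pad=0, fill=128):\n",
"    # Pad height and width only, leaving the channel axis untouched.\n",
"    if pad > 0:\n",
"        chw = np.pad(chw, ((0, 0), (pad, pad), (pad, pad)),\n",
"                     mode='constant', constant_values=fill)\n",
"    # PIL expects height x width x channel with 8-bit values.\n",
"    hwc = np.ascontiguousarray(np.transpose(chw, (1, 2, 0)), dtype=np.uint8)\n",
"    return Image.fromarray(hwc)\n",
"\n",
"# Synthetic pure-red 32 x 32 image in CHW order (assumption: not CIFAR data).\n",
"red_chw = np.zeros((3, 32, 32), dtype=np.uint8)\n",
"red_chw[0] = 255\n",
"padded_img = chw_to_image(red_chw, pad=4)\n",
"print(padded_img.size, padded_img.getpixel((0, 0)), padded_img.getpixel((4, 4)))"
]
},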
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`saveTrainImages` and `saveTestImages` are simple wrapper functions that iterate through the data set."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def saveTrainImages(filename, foldername):\n",
"    if not os.path.exists(foldername):\n",
"        os.makedirs(foldername)\n",
"    data = {}\n",
"    dataMean = np.zeros((3, imgSize, imgSize)) # mean is in CHW format.\n",
"    with open('train_map.txt', 'w') as mapFile:\n",
"        with open('train_regrLabels.txt', 'w') as regrFile:\n",
"            for ifile in range(1, 6):\n",
"                with open(os.path.join('./cifar-10-batches-py', 'data_batch_' + str(ifile)), 'rb') as f:\n",
"                    if sys.version_info[0] < 3:\n",
"                        data = cp.load(f)\n",
"                    else:\n",
"                        data = cp.load(f, encoding='latin1')\n",
"                    for i in range(10000):\n",
"                        fname = os.path.join(os.path.abspath(foldername), ('%05d.png' % (i + (ifile - 1) * 10000)))\n",
"                        saveImage(fname, data['data'][i, :], data['labels'][i], mapFile, regrFile, 4, mean=dataMean)\n",
"    # dataMean holds the sum over all 50,000 training images; divide to get the mean\n",
"    dataMean = dataMean / (50 * 1000)\n",
"    saveMean('CIFAR-10_mean.xml', dataMean)\n",
"\n",
"def saveTestImages(filename, foldername):\n",
"    if not os.path.exists(foldername):\n",
"        os.makedirs(foldername)\n",
"    with open('test_map.txt', 'w') as mapFile:\n",
"        with open('test_regrLabels.txt', 'w') as regrFile:\n",
"            with open(os.path.join('./cifar-10-batches-py', 'test_batch'), 'rb') as f:\n",
"                if sys.version_info[0] < 3:\n",
"                    data = cp.load(f)\n",
"                else:\n",
"                    data = cp.load(f, encoding='latin1')\n",
"                for i in range(10000):\n",
"                    fname = os.path.join(os.path.abspath(foldername), ('%05d.png' % i))\n",
"                    saveImage(fname, data['data'][i, :], data['labels'][i], mapFile, regrFile, 0)"
]
},
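{
"cell_type": "markdown",
"metadata": {},
"source": [
"`saveTrainImages` builds the mean image by adding every training image into `dataMean` and then dividing by 50,000. As a sketch on synthetic data, the same quantity is the mean over the sample axis of the stacked images. The five random 3x4x4 arrays below are made up and merely stand in for the 50,000 CIFAR-10 training images."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(0)\n",
"# Five synthetic CHW images (assumption: random values, not CIFAR data).\n",
"images = rng.randint(0, 256, size=(5, 3, 4, 4))\n",
"\n",
"# Accumulate-then-divide, the pattern used for dataMean above.\n",
"running_sum = np.zeros((3, 4, 4))\n",
"for one_image in images:\n",
"    running_sum += one_image\n",
"mean_loop = running_sum / len(images)\n",
"\n",
"# Equivalent one-step computation over the sample axis.\n",
"mean_vectorized = images.mean(axis=0)\n",
"print(np.allclose(mean_loop, mean_vectorized))"
]
},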
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# URL of the CIFAR-10 Python archive\n",
"url_cifar_data = 'http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'\n",
"\n",
"# Paths for saving the text files\n",
"data_dir = './data/CIFAR-10/'\n",
"train_filename = os.path.join(data_dir, 'Train_cntk_text.txt')\n",
"test_filename = os.path.join(data_dir, 'Test_cntk_text.txt')\n",
"\n",
"train_img_directory = os.path.join(data_dir, 'Train')\n",
"test_img_directory = os.path.join(data_dir, 'Test')\n",
"\n",
"root_dir = os.getcwd()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz\n",
"Done.\n",
"Extracting files...\n",
"Done.\n",
"Preparing train set...\n",
"Done.\n",
"Preparing test set...\n",
"Done.\n",
"Writing train text file...\n",
"Done.\n",
"Writing test text file...\n",
"Done.\n",
"Converting train data to png images...\n",
"Done.\n",
"Converting test data to png images...\n",
"Done.\n"
]
}
],
"source": [
"if not os.path.exists(data_dir):\n",
"    os.makedirs(data_dir)\n",
"\n",
"try:\n",
"    os.chdir(data_dir)\n",
"    trn, tst = loadData(url_cifar_data)\n",
"    print ('Writing train text file...')\n",
"    saveTxt(r'./Train_cntk_text.txt', trn)\n",
"    print ('Done.')\n",
"    print ('Writing test text file...')\n",
"    saveTxt(r'./Test_cntk_text.txt', tst)\n",
"    print ('Done.')\n",
"    print ('Converting train data to png images...')\n",
"    saveTrainImages(r'./Train_cntk_text.txt', 'train')\n",
"    print ('Done.')\n",
"    print ('Converting test data to png images...')\n",
"    saveTestImages(r'./Test_cntk_text.txt', 'test')\n",
"    print ('Done.')\n",
"finally:\n",
"    # Return to the directory recorded in root_dir\n",
"    os.chdir(root_dir)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}