CNTK now supports CUDA 9/cuDNN 7. This requires updating the build environment to Ubuntu 16/GCC 5 on Linux, and Visual Studio 2017/VCTools 14.11 on Windows. With CUDA 9, CNTK also adds a preview of 16-bit floating point (a.k.a. FP16) computation.
Please check out the FP16 example for ResNet50 at /Examples/Image/Classification/ResNet/Python/TrainResNet_ImageNet_Distributed.py
Notes on FP16 preview:
* The FP16 implementation on CPU is not optimized and is not intended to be used directly for CPU inference. Convert the model to 32-bit floating point before running it on CPU.
* The loss/criterion for FP16 training needs to be 32-bit so that accumulation does not overflow; use the cast function for this. Please check the example above.
* Readers do not produce FP16 output. Unless data is fed using numpy, a cast from FP32 to FP16 is needed. Please check the example above.
* FP16 gradient aggregation is currently only implemented on GPU using NCCL2. Distributed FP16 training with MPI is not supported.
* FP16 math is a subset of the current FP32 implementation. Some models may raise a Feature Not Implemented exception when using FP16.
* FP16 is currently not supported in BrainScript. Please use Python for FP16.
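The note about keeping the loss/criterion in 32-bit can be illustrated without CNTK. The following is a minimal sketch, not CNTK code: it emulates FP16 and FP32 rounding with Python's standard `struct` module, and the helper names (`to_fp16`, `to_fp32`, `accumulate`) and the loss values are made up for illustration. It only shows why a running sum kept in FP16 can overflow while the same sum in FP32 does not.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the FP16 range
        # (largest finite value is 65504); model that as infinity.
        return float("inf") if x > 0 else float("-inf")


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, rnd):
    """Sum values, rounding every intermediate result with rnd."""
    total = 0.0
    for v in values:
        total = rnd(total + rnd(v))
    return total


# Hypothetical per-sample loss values; each one fits in FP16 on its own.
per_sample_loss = [1000.0] * 100

fp16_total = accumulate(per_sample_loss, to_fp16)  # exceeds the FP16 range
fp32_total = accumulate(per_sample_loss, to_fp32)  # stays finite

print("FP16 accumulation:", fp16_total)
print("FP32 accumulation:", fp32_total)
```

Each individual value is representable in FP16, but the running total passes the FP16 range and becomes infinite, which is the failure the cast to 32-bit is meant to avoid.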
To set up the build and runtime environment on Windows:
* Install [Visual Studio 2017](https://www.visualstudio.com/downloads/) with the following workloads and components. From the command line (using the Community version installer as an example):
vs_community.exe --add Microsoft.VisualStudio.Workload.NativeDesktop --add Microsoft.VisualStudio.Workload.ManagedDesktop --add Microsoft.VisualStudio.Workload.Universal --add Microsoft.Component.PythonTools --add Microsoft.VisualStudio.Component.VC.Tools.14.11
* Install [NVIDIA CUDA 9](https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64)
* From PowerShell, run:
/Tools/devInstall/Windows/DevInstall.ps1
* Start the VCTools 14.11 command line by running:
cmd /k "%VS2017INSTALLDIR%\VC\Auxiliary\Build\vcvarsall.bat" x64 --vcvars_ver=14.11
* Open /CNTK.sln from the VCTools 14.11 command line. Note that opening CNTK.sln from anywhere other than the VCTools 14.11 command line causes a CUDA 9 [build error](https://developercommunity.visualstudio.com/content/problem/163758/vs-2017-155-doesnt-support-cuda-9.html).
To set up the build and runtime environment on Linux using docker, please build the Ubuntu 16.04 docker image using the Dockerfiles in /Tools/docker. For other Linux systems, please refer to the Dockerfiles to set up the libraries CNTK depends on.
commit aecc380d21e04e803d683e25af2aac42c1a90125
Author: Manik Jindal <manikj@microsoft.com>
Date: Thu Nov 16 15:03:13 2017 -0800
Remove OpenCV dependency from CNTK core
Tensorboard's Image feature has a hard dependency on OpenCV and
Tensorboard is a part of CNTK core. Removing this hard dependency
by creating a new DLL, ImageWriter, just to write an image as a PNG.
On Linux:
sudo mkdir /usr/local/mklml
sudo wget https://github.com/01org/mkl-dnn/releases/download/v0.11/mklml_lnx_2018.0.1.20171007.tgz
sudo tar -xzf mklml_lnx_2018.0.1.20171007.tgz -C /usr/local/mklml
On Windows:
Create a directory on your machine to hold MKLML, e.g. mkdir c:\local\mklml
Download the file [mklml_win_2018.0.1.20171007.zip](https://github.com/01org/mkl-dnn/releases/download/v0.11/mklml_win_2018.0.1.20171007.zip).
Unzip it into your MKLML path, creating a versioned subdirectory within.
Set the environment variable `MKLML_PATH` to the versioned subdirectory, e.g. setx MKLML_PATH c:\local\mklml\mklml_win_2018.0.1.20171007
This change also enables CPU convolution forward/backward using MKL, which leads to ~4x speedup in AlexNet training.
minor fixes
fixing Native utils path, loader, and the Native Load Utils
remove path and classpath setting in tests
cleaning up native utils
Unified CNTKNativeUtils class. Changed the code generation to use CNTKNativeUtils directly instead of the intermediary CNTK.init().
adding fixes to post build
Added NATIVE_LOAD_MANIFEST for the cases when only specific high-level libraries should be loaded
linux side
Add gpu support
linux gpu .so copying
C#; and various fixes
Details:
fix wrong link
use /Zo option instead of undocumented /d2Zi+; remove /d2Zi+ for debug build;
move EvalMultithreads.cpp into Tests/EndToEndTests/EvalClientsTests/CNTKLibraryCPPEvalExamplesTests as preparation for the new CPP examples
add new CNTKLibraryCPPEvalExamples
use stdout
add first sample in CPPEvalExamples
add parallel evaluation examples
complete cpp examples
adapt linux build
add CPP tests to UWP
enable more tests on UWP
add running new tests; update baseline
add new cpp examples to CNTKLibraryExamples.sln; use stdout instead of stderr in legacy eval dll; add missing cs examples
add file types for UWP
flush stdout to make cygwin happy
update baseline file for CPPUWPEval
update baseline for CNTKLibraryCSEvalExamplesTest
* Refactor index data structures and rewrite indexers (with most changes
in the text index builder).
* Add best-effort caching: the cache is written out asynchronously in a
separate thread; on restart the index builder tries to restore the
index from cache (as long as the cache is not older than the input
file) and falls back to normal indexing if that fails (i.e., the cache
is corrupt).
* Refactor and simplify MemoryBuffer (renamed to BufferedFileReader).
* Use KMP pattern matching to simplify sample counting with a non-empty main
stream (num samples in sequence = number of lines that contain the main
stream name).
* Refactor and simplify MLFIndexBuilder (it now also uses
BufferedFileReader)
* Use 512KB chunks when loading index from cache for faster reading.
* Add a number of unit tests for the indexing both with and without
caching.
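The best-effort caching behaviour described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the C++ reader code from this change: the function names, the JSON cache format, and the line-offset "index" are all hypothetical stand-ins chosen to keep the example short.

```python
import json
import os
import threading


def build_index(input_path):
    """Stand-in for normal indexing: record the byte offset of every line."""
    offsets, pos = [], 0
    with open(input_path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def try_restore(input_path, cache_path):
    """Return the cached index, or None if the cache is missing, stale, or corrupt."""
    try:
        # A cache older than the input file may describe outdated contents.
        if os.path.getmtime(cache_path) < os.path.getmtime(input_path):
            return None
        with open(cache_path, "r") as f:
            index = json.load(f)
        return index if isinstance(index, list) else None
    except (OSError, ValueError):
        # Missing file, unreadable file, or invalid JSON: treat as no cache.
        return None


def write_cache_async(index, cache_path):
    """Write the cache on a separate thread; failures are ignored (best effort)."""
    def _write():
        try:
            tmp_path = cache_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(index, f)
            os.replace(tmp_path, cache_path)  # avoid leaving a half-written cache
        except OSError:
            pass

    writer = threading.Thread(target=_write)
    writer.start()
    return writer


def load_index(input_path, cache_path):
    """Restore from cache when possible, otherwise index normally and cache the result."""
    index = try_restore(input_path, cache_path)
    if index is not None:
        return index, None
    index = build_index(input_path)
    return index, write_cache_async(index, cache_path)
```

`load_index` returns the writer thread (or `None` when the cache was used) so that a caller or test can `join()` it; a failed or skipped cache write never affects the returned index.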
adding some doc
use $$
add comments
automatically loadLibrary in java
semicolon
removing .iml
add more dependencies
move java static block code to cntk_java.i
remove also class files.
Adding .java files to jar
changing test location
typo
ignore DeviceDescriptorVector size constructor
DRY in Main.java
use List interface instead of ArrayList implementation
moving test location, expanding tests to gpu, fixing comments
move linux java tests
updating baseline.txt