* Moved to CUDA 10, cuDNN 7.3, and CUB 1.8.
* Fixed a bug related to "pointer to pin pointer is disallowed" (#3063),
which is exposed by newer versions of VCTools.
* Added a workaround for a potential VS2017 15.9 bug with the CNTK Debug
version.
Unlike the Linux cp command, the Docker COPY instruction doesn't
preserve the top-level dir structure (i.e. it copies the contents
of the top-level dir rather than the dir itself to the image).
Our Dockerfile for generating the GPU image has the correct behavior though.
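The difference between the two copy behaviors can be illustrated with a small simulation. This is only a sketch using the Python standard library, not Docker itself; the directory name `Tools`, the file name `setup.sh`, and both helper functions are hypothetical and chosen purely for illustration.

```python
import os
import shutil
import tempfile


def cp_recursive(src_dir: str, dest_dir: str) -> None:
    """Mimic `cp -r src_dir dest_dir` when dest_dir already exists:
    the source directory itself ends up inside dest_dir."""
    name = os.path.basename(os.path.normpath(src_dir))
    shutil.copytree(src_dir, os.path.join(dest_dir, name))


def docker_copy(src_dir: str, dest_dir: str) -> None:
    """Mimic `COPY src_dir dest_dir`: only the contents of src_dir
    are placed in dest_dir, the top-level directory is dropped."""
    shutil.copytree(src_dir, dest_dir, dirs_exist_ok=True)  # Python 3.8+


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical source tree: Tools/setup.sh
        src = os.path.join(root, "Tools")
        os.makedirs(src)
        with open(os.path.join(src, "setup.sh"), "w") as f:
            f.write("# placeholder\n")

        cp_dest = os.path.join(root, "cp_image")
        copy_dest = os.path.join(root, "copy_image")
        os.makedirs(cp_dest)
        os.makedirs(copy_dest)

        cp_recursive(src, cp_dest)
        docker_copy(src, copy_dest)
        return sorted(os.listdir(cp_dest)), sorted(os.listdir(copy_dest))


cp_listing, copy_listing = demo()
print("cp -r :", cp_listing)
print("COPY  :", copy_listing)
```

With the cp-style copy the destination contains the `Tools` directory, while the COPY-style copy contains `setup.sh` directly, so a Dockerfile has to name the target directory explicitly to keep the layout.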
This helps build CNTK for nightly builds with the required CNTK
version. If the environment variable 'BUILD_CNTK_VERSION' is set, CNTK
will be built for that version as a public release ('+' won't be appended
to the CNTK version). Otherwise the hard-coded CNTK version will be used as a
private build (2.4+).
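The version selection described above can be sketched as follows. This is not the actual build script; the function name `resolve_cntk_version` and the constant `DEFAULT_CNTK_VERSION` are hypothetical, and treating an empty variable as "not set" is an assumption.

```python
import os

# Hard-coded fallback version, taken from the "2.4+" example above.
DEFAULT_CNTK_VERSION = "2.4"


def resolve_cntk_version(env=None) -> str:
    """Return the version string the build would use.

    If BUILD_CNTK_VERSION is set, it is used verbatim (public release).
    Otherwise the hard-coded version gets a '+' suffix (private build).
    """
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only value counts as unset.
    requested = env.get("BUILD_CNTK_VERSION", "").strip()
    if requested:
        return requested
    return DEFAULT_CNTK_VERSION + "+"


print(resolve_cntk_version({"BUILD_CNTK_VERSION": "2.5"}))
print(resolve_cntk_version({}))
```

Passing the environment as a dictionary keeps the sketch easy to try without changing the real process environment.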
Depending on the CNTK setup steps followed, either cntk-py<version> or
cntkdev-py<version> is installed. cntk-py<version> will be adopted as
the default from now on.
CNTK now supports CUDA 9/cuDNN 7. This requires updating the build environment to Ubuntu 16/GCC 5 for Linux, and Visual Studio 2017/VCTools 14.11 for Windows. With CUDA 9, CNTK also adds a preview of 16-bit floating point (a.k.a. FP16) computation.
Please check out the example of FP16 in ResNet50 at /Examples/Image/Classification/ResNet/Python/TrainResNet_ImageNet_Distributed.py
Notes on FP16 preview:
* The FP16 implementation on CPU is not optimized, and it is not meant to be used for CPU inference directly. Convert the model to 32-bit floating point before running on CPU.
* The loss/criterion for FP16 training must be cast to 32-bit, using the cast function, so that accumulation does not overflow. Please check the example above.
* Readers do not have FP16 output. Unless numpy is used to feed data, a cast from FP32 to FP16 is needed. Please check the example above.
* FP16 gradient aggregation is currently only implemented on GPU using NCCL2. Distributed training with FP16 over MPI is not supported.
* FP16 math is a subset of the current FP32 implementation. Some models may raise a Feature Not Implemented exception when using FP16.
* FP16 is currently not supported in BrainScript. Please use Python for FP16.
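To illustrate why the notes above ask for a 32-bit loss/criterion, here is a minimal sketch that emulates half- and single-precision accumulation with the Python standard library. It does not use CNTK or its cast function; the helper names `round_to` and `accumulate` and the sample loss values are made up for illustration.

```python
import math
import struct


def round_to(fmt: str, x: float) -> float:
    """Round x to the nearest value representable in a struct float format
    ('<e' = IEEE 754 half precision, '<f' = single precision)."""
    try:
        return struct.unpack(fmt, struct.pack(fmt, x))[0]
    except (OverflowError, struct.error):
        # struct refuses to pack finite values beyond the format's range;
        # floating-point hardware would yield infinity instead.
        return math.copysign(math.inf, x)


def accumulate(values, fmt: str) -> float:
    """Sum values, rounding every intermediate result to the given format."""
    total = 0.0
    for v in values:
        total = round_to(fmt, total + round_to(fmt, v))
    return total


# Hypothetical per-sample losses whose sum exceeds the FP16 maximum (65504).
per_sample_loss = [1000.0] * 100

fp16_total = accumulate(per_sample_loss, "<e")
fp32_total = accumulate(per_sample_loss, "<f")
print("FP16 accumulation:", fp16_total)
print("FP32 accumulation:", fp32_total)
```

The half-precision running sum overflows to infinity once it passes the FP16 range, while the single-precision sum stays at 100000.0, which is the motivation for casting the loss to 32-bit before accumulation.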
To set up the build and runtime environment on Windows:
* Install [Visual Studio 2017](https://www.visualstudio.com/downloads/) with the following workloads and components. From the command line (using the Community version installer as an example):
vs_community.exe --add Microsoft.VisualStudio.Workload.NativeDesktop --add Microsoft.VisualStudio.Workload.ManagedDesktop --add Microsoft.VisualStudio.Workload.Universal --add Microsoft.Component.PythonTools --add Microsoft.VisualStudio.Component.VC.Tools.14.11
* Install [NVidia CUDA 9](https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64)
* From PowerShell, run:
/Tools/devInstall/Windows/DevInstall.ps1
* Start the VCTools 14.11 command line by running:
cmd /k "%VS2017INSTALLDIR%\VC\Auxiliary\Build\vcvarsall.bat" x64 --vcvars_ver=14.11
* Open /CNTK.sln from the VCTools 14.11 command line. Note that opening CNTK.sln from anywhere other than the VCTools 14.11 command line causes a CUDA 9 [build error](https://developercommunity.visualstudio.com/content/problem/163758/vs-2017-155-doesnt-support-cuda-9.html).
To set up the build and runtime environment on Linux using Docker, please build the Ubuntu 16.04 Docker image using the Dockerfiles in /Tools/docker. For other Linux systems, please refer to the Dockerfiles to set up the libraries CNTK depends on.
On Linux:
sudo mkdir /usr/local/mklml
sudo wget https://github.com/01org/mkl-dnn/releases/download/v0.11/mklml_lnx_2018.0.1.20171007.tgz
sudo tar -xzf mklml_lnx_2018.0.1.20171007.tgz -C /usr/local/mklml
On Windows:
Create a directory on your machine to hold MKLML, e.g. mkdir c:\local\mklml
Download the file [mklml_win_2018.0.1.20171007.zip](https://github.com/01org/mkl-dnn/releases/download/v0.11/mklml_win_2018.0.1.20171007.zip).
Unzip it into your MKLML path, creating a versioned subdirectory within.
Set the environment variable `MKLML_PATH` to the versioned subdirectory, e.g. setx MKLML_PATH c:\local\mklml\mklml_win_2018.0.1.20171007
This change also enables CPU convolution forward/backward using MKL, which leads to ~4x speedup in AlexNet training.
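After following the Windows steps above, a quick sanity check that `MKLML_PATH` points at an existing directory can save a failed build. The sketch below is not part of the CNTK tooling; the function name `check_mklml_path` is hypothetical, and it only checks that the directory exists, not what it contains.

```python
import os


def check_mklml_path(env=None) -> str:
    """Return the MKLML_PATH value if it names an existing directory,
    otherwise raise RuntimeError with a short explanation."""
    env = os.environ if env is None else env
    path = env.get("MKLML_PATH")
    if not path:
        raise RuntimeError("MKLML_PATH is not set")
    if not os.path.isdir(path):
        raise RuntimeError(f"MKLML_PATH does not point to a directory: {path}")
    return path


if __name__ == "__main__":
    try:
        print("MKLML found at", check_mklml_path())
    except RuntimeError as err:
        print("MKLML check failed:", err)
```

Note that a value set with setx only becomes visible in newly started command prompts, so run the check from a fresh shell.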