CNTK now supports CUDA 9/cuDNN 7. This requires updating the build environment to Ubuntu 16/GCC 5 on Linux, and to Visual Studio 2017/VCTools 14.11 on Windows. With CUDA 9, CNTK also adds a preview of 16-bit floating point (a.k.a. FP16) computation.
Please check out the example of FP16 in ResNet50 at /Examples/Image/Classification/ResNet/Python/TrainResNet_ImageNet_Distributed.py
Notes on FP16 preview:
* The FP16 implementation on CPU is not optimized and is not meant to be used directly for CPU inference. Convert the model to 32-bit floating point before running it on CPU.
* The loss/criterion for FP16 training needs to be cast to 32-bit, using the cast function, so that accumulation does not overflow. Please check the example above.
* Readers do not produce FP16 output. Unless numpy is used to feed data, a cast from FP32 to FP16 is needed. Please check the example above.
* FP16 gradient aggregation is currently only implemented on GPU using NCCL2. Distributed FP16 training with MPI is not supported.
* FP16 math is a subset of the current FP32 implementation. Some models may throw a Feature Not Implemented exception when using FP16.
* FP16 is currently not supported in BrainScript. Please use Python for FP16.
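The overflow concern behind the loss/criterion note can be illustrated with a minimal NumPy-only sketch. It does not use the CNTK API; the `accumulate` helper and the loss values are hypothetical and chosen only to exceed the FP16 maximum of 65504.

```python
import numpy as np

def accumulate(values, dtype):
    """Sum values one at a time, keeping the running total in `dtype`."""
    total = dtype(0)
    # The FP16 run is expected to overflow; silence the resulting warning.
    with np.errstate(over="ignore"):
        for v in values:
            # Cast each value to `dtype` before adding it to the total.
            total = dtype(total + dtype(v))
    return total

# Hypothetical per-minibatch loss sums, stored in FP16.
losses = np.full(4, 30000.0, dtype=np.float16)

fp16_total = accumulate(losses, np.float16)  # running total kept in FP16
fp32_total = accumulate(losses, np.float32)  # running total kept in FP32

print("FP16 max:", np.finfo(np.float16).max)
print("FP16 accumulation:", fp16_total)
print("FP32 accumulation:", fp32_total)
```

The FP16 running total passes the largest finite FP16 value on the third addition and becomes infinite, while the FP32 total stays finite. This is presumably why the criterion is cast to 32-bit before accumulation.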
To set up the build and runtime environment on Windows:
* Install [Visual Studio 2017](https://www.visualstudio.com/downloads/) with the following workloads and components. From the command line (using the Community edition installer as an example):
vs_community.exe --add Microsoft.VisualStudio.Workload.NativeDesktop --add Microsoft.VisualStudio.Workload.ManagedDesktop --add Microsoft.VisualStudio.Workload.Universal --add Microsoft.Component.PythonTools --add Microsoft.VisualStudio.Component.VC.Tools.14.11
* Install [NVIDIA CUDA 9](https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64)
* From PowerShell, run:
/Tools/devInstall/Windows/DevInstall.ps1
* To start the VCTools 14.11 command line, run:
cmd /k "%VS2017INSTALLDIR%\VC\Auxiliary\Build\vcvarsall.bat" x64 --vcvars_ver=14.11
* Open /CNTK.sln from the VCTools 14.11 command line. Note that opening CNTK.sln from anywhere other than the VCTools 14.11 command line causes a CUDA 9 [build error](https://developercommunity.visualstudio.com/content/problem/163758/vs-2017-155-doesnt-support-cuda-9.html).
To set up the build and runtime environment on Linux using docker, please build the Ubuntu 16.04 docker image using the Dockerfiles under /Tools/docker. For other Linux systems, please refer to the Dockerfiles to set up the libraries CNTK depends on.
remove hard tabs; add types in .gitattributes
update test, update baseline
use 01_OneHidden.model instead of 01_OneHidden.
update using CNTKOPENBLAS version 2
add a comment for using multiple inputs in C# example
improve project settings
fix settings
adding some doc
use $$
add comments
automatically loadLibrary in java
semicolon
removing .iml
add more dependencies
move java static block code to cntk_java.i
remove also class files.
Adding .java files to jar
changing test location
typo
ignore DeviceDescriptorVector size constructor
DRY in Main.java
use List interface instead of ArrayList implementation
moving test location, expanding tests to gpu, fixing comments
move linux java tests
updating baseline.txt
Add arenas for allocating protobuf messages.
Add Save/Load methods to Dictionary/DictionaryValue.
Add error checking to GetFStream (barf on non-existent files).
Add unit tests to verify backward/forward compat.
Note: WriteScaledLogLike produces results of varying checksum on
Windows, where our hardware varies more. If we see similar effects
on Linux or WriteBottleneck as well, we need to rework.