* Support SequenceIs[First,Last] with ConstantOfShape
* Update bypass load test in verify_one_input and add a test for the one hot op
* Update export for one hot op. Migrate from exporting
onnx.ml.OneHotEncoder to onnx.OneHot.
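For reference, the dense output that the standard ONNX OneHot op produces can be sketched with NumPy as a stand-in (axis and on/off-value conventions simplified here):

```python
import numpy as np

def one_hot(indices, depth, dtype=np.float32):
    """Dense one-hot encoding in the style of ONNX OneHot
    (new axis appended last, off/on values 0 and 1)."""
    indices = np.asarray(indices)
    out = np.zeros(indices.shape + (depth,), dtype=dtype)
    # Place a 1 at each index position along the new last axis.
    np.put_along_axis(out, indices[..., None], 1, axis=-1)
    return out

print(one_hot([2, 0], 4))
```

Unlike onnx.ml.OneHotEncoder, the OneHot op takes the depth and on/off values as inputs, which maps more directly onto CNTK's one hot op.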
Op fixes
* Fix topk onnx_op_test
* Support MVN export using ONNX function
* Fix LayerNormalization
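The MVN export above maps to ONNX's MeanVarianceNormalization, which is expressed as an ONNX function built from primitive ops. A NumPy sketch of the computation (default axes and epsilon here are illustrative, not the exact exported values):

```python
import numpy as np

def mvn(x, axes=(0, 2, 3), eps=1e-9):
    # Mean-variance normalization: zero mean, unit variance over `axes`.
    mean = x.mean(axis=axes, keepdims=True)
    std = x.std(axis=axes, keepdims=True)
    return (x - mean) / (std + eps)

np.random.seed(0)
x = np.random.randn(2, 3, 4, 4).astype(np.float32)
y = mvn(x)
```

Exporting it as a function keeps the model loadable by any runtime that can expand ONNX functions into primitive ops.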
* Skip tests for sequence slice float16: not supported in CNTK
* Support gather export & import with float16
- Fix cntk gather issue with float16 inputs.
- Support exporting constant float16 tensor.
- Support importing int32 indices input for gather.
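The data-type combination the importer now accepts (float16 data, int32 indices) follows ONNX Gather semantics, which `np.take` mirrors on a given axis:

```python
import numpy as np

# Gather with float16 data and int32 indices, the combination
# the importer now accepts; np.take mirrors ONNX Gather on axis 0.
data = np.array([[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]], dtype=np.float16)
indices = np.array([0, 2], dtype=np.int32)
gathered = np.take(data, indices, axis=0)

print(gathered.dtype, gathered.shape)  # float16 (2, 2)
```

Note the output keeps the float16 dtype of the data input, while the indices only select rows.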
* Enable more passed op tests
* A few patches are required to build cntk_uwp.
* Use proto from onnxruntime/protobuf instead of from onnx.
* TODO: resolve shape-inference issues in onnx_op_test for RNN and OptimizedRNNStack.
* Move to support CUDA 10, cuDNN 7.3, CUB 1.8.
* Fixed a bug related to "pointer to pin pointer is disallowed" #3063,
which is exposed by newer versions of VCTools.
* Added a workaround for a potential VS2017 15.9 bug with the CNTK Debug
build.
Because other projects may use these header files, we added
them to API/Internals.
* ComputationGraphAlgorithms.h was moved from Source/ComputationNetworkLib
* PrimitiveOpType.h and EvaluatorWrapper.h were moved from Source/CNTKv2Library
* PrimitiveFunctionAttribute.h was extracted from PrimitiveFunction.h. It contains
a new class, PrimitiveFunctionAttribute, which collects all attribute
names for PrimitiveFunction.
This change had a subtle side effect. We had a global static variable
s_stateAttributes that depended on PrimitiveFunction::AttributeNameRngSeed and
PrimitiveFunction::AttributeNameRngOffset. After those static
attribute variables moved into another translation unit, s_stateAttributes could be
initialized with empty wstrings, because PrimitiveFunctionAttribute::AttributeNameRngSeed
and PrimitiveFunctionAttribute::AttributeNameRngOffset could be initialized after
s_stateAttributes; the initialization order of global static variables
across translation units is not well-defined. To fix the issue, we also moved
s_stateAttributes into the PrimitiveFunctionAttribute class and renamed it
s_rngStateAttributes, since it is reasonable to consider it part of that
class.
* PrimitiveFunction.h was moved from Source/CNTKv2Library
Add convolution over the sequential axis, with related tests.
Additional parameters currently supported:
- auto padding
- strides
- groups
Support for dilation still needs to be tested on GPU.
Update the PrimitiveOpType SerializationTests missing from other commits.
Convert tabs to spaces.
Refine C++ convolution unit tests. Add dilation tests to the Python convolution unit tests.
Add more detailed comments on the shape change for 1-D sequential convolution with reduction rank 0, plus other minor tweaks.
Add EndToEndTests of sequential convolution on MNIST.
Add init_bias tests for sequential convolution.
Minor change in comments.
Rename ConvolutionOverSequenceAxisNode. Add a comment on the new test that fails under cuDNN.
Add more comments, trim spaces.
Add more comments, remove a magic number, add more boundary checks.
Remove the last SetValue for outputSeqAxisDimValue, as the TensorView unary op has already updated the value.
Fix a bug in the Python sequential convolution default bias shape, and add related unit tests.
Small tweak in sequential convolution to avoid an additional GPU memory allocation and improve performance.
Example: sequential MNIST, plus profiling.
Adjust the channel size in the C++ convolution value unit test.
Small update to the Python sequential MNIST example.
Sequential convolution v2.
* Redesigned ConvolutionSequenceShapeNode: separated the computation of the output sequence length from the v1 node design, and refactored ConvolutionNodeBaseExtended as their common base class (since ConvolutionNodeBase is the base class not only of ConvolutionNode but also of PoolingNode).
* Performance improvement over v1.
- Compute the sequence length from the MBLayout instead of the mask output from unpack, avoiding an unnecessary CPU/GPU memory copy.
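The per-sequence output length that ConvolutionSequenceShapeNode computes follows the standard 1-D convolution output-length formula; a sketch (the actual node additionally handles auto-padding variants and truncated sequences):

```python
import math

def conv_output_length(n, kernel, stride=1, dilation=1, pad=0):
    """Output length of a 1-D convolution along the sequence axis
    (standard formula; auto-padding variants differ)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return math.floor((n + 2 * pad - effective_kernel) / stride) + 1

# Each sequence in a minibatch keeps its own length:
lengths = [10, 7, 5]
print([conv_output_length(n, kernel=3) for n in lengths])  # [8, 5, 3]
```

Because only the lengths are needed, they can be read from the MBLayout directly, without materializing the unpacked mask on the GPU.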
Do not include the Python sequence example for now; it still needs a correct location.
Add a check for truncated sequences in sequential convolution.
Improve code style.
Move sequential convolution in Python to a new high-level API, maintaining compatibility with the previous implementation (special-case 1-D sequential convolution).
Add the ConvolutionSequenceShape op.
Update the conv_attribute test for the updated convolution parameters.
Move the sequential parameter to the last position.
Update the test short-circuit for CPU convolution dilation.
Update the EndToEndTests unit-test baseline file for the new convolution unit tests.
Update the Makefile to include the new unit-test file on Linux.
Update the ConvolutionNode initialization code to handle TransformerNode initialization.
commit 02eb64bf5e9f6c22138b5111f5518f6087cea7e0
Author: TJ <tix@microsoft.com>
Date: Mon Jul 9 13:07:13 2018 -0700
Set the Multiverso lib file when asgd is set to true; otherwise the build
looks for a lib that doesn't match any rule when asgd=false.
We used to set CXX to mpic++, which was then used for building
Multiverso. Unfortunately, this kind of configuration conflicted
with find_package(MPI), and hence caused failure.
This commit fixed the issue by using system CXX for building
Multiverso. It also fixed two other issues:
* EVAL_LIB depends on libmultiverso, so we had to set
MULTIVERSO_LIB before the EVAL_LIB rule
* CMake variables don't have a "PATHNAME" type, so we used
PATH instead. We also replaced FILEPATH with PATH because
BOOST_LIBRARY_DIRS points to a path rather than a file.
Allow adding extra CXXFLAGS via environment variables.
Use cases:
1) https://github.com/Microsoft/CNTK/issues/3155
export CXXFLAGS="-Wno-unused-variable -Wno-sign-compare"
configure && make
2) Our libc/libstdc++ has been built with gcc-4.9, so CNTK currently fails to build with gcc-5 in compatibility mode.
https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html
After patching cntk we have successful build:
COMMON_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" CXXFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0 -Wno-unused-variable -Wno-sign-compare" configure && make
(nvcc doesn't recognize the -W flags)
Thanks to Frank's initial code change.
Initialize NCCL with the GPU ID from an NVML call instead of the device ID in
/proc. The previous approach caused a "same device being used" error when two
different containers running on the same host tried to initialize NCCL.
Add the NVML library for the NVML APIs.
Update the Makefile.
In the MKL-DNN 0.12 release, there is a cache-size detection issue that causes crashes on AMD Ryzen CPUs.
This change disables MKL-DNN to unblock AMD Ryzen users for now.
This helps produce nightly builds of CNTK with the required CNTK
version. If the environment variable 'BUILD_CNTK_VERSION' is set, CNTK
will be built for that version as a public release ('+' won't be appended
to the CNTK version). Otherwise the hard-coded CNTK version will be used as a
private build (2.4+).
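The version-selection logic described above can be sketched as follows (the function name and default version here are illustrative, not the actual build-script code):

```python
import os

def cntk_version(default="2.4"):
    # If BUILD_CNTK_VERSION is set, use it verbatim as a public
    # release; otherwise mark the hard-coded default as a private
    # build by appending '+'.
    env = os.environ.get("BUILD_CNTK_VERSION")
    return env if env else default + "+"

os.environ.pop("BUILD_CNTK_VERSION", None)
print(cntk_version())                       # 2.4+
os.environ["BUILD_CNTK_VERSION"] = "2.5"
print(cntk_version())                       # 2.5
```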
- Accelerates some common tensor ops in Intel CPU inference for float32, especially for fully connected networks
- Can be turned on/off by cntk.cntk_py.enable_cpueval_optimization()/cntk.cntk_py.disable_cpueval_optimization()
CNTK now supports CUDA 9/cuDNN 7. This requires updating the build environment to Ubuntu 16/GCC 5 for Linux, and Visual Studio 2017/VCTools 14.11 for Windows. With CUDA 9, CNTK also adds a preview of 16-bit floating point (a.k.a. FP16) computation.
Please check out the example of FP16 in ResNet50 at /Examples/Image/Classification/ResNet/Python/TrainResNet_ImageNet_Distributed.py
Notes on FP16 preview:
* The FP16 implementation on CPU is not optimized and is not meant for direct CPU inference. Users need to convert the model to 32-bit floating point before running on CPU.
* The loss/criterion for FP16 training needs to be 32-bit to accumulate without overflow, using the cast function. Please check the example above.
* Readers do not produce FP16 output; when using numpy to feed data, a cast from FP32 to FP16 is needed. Please check the example above.
* FP16 gradient aggregation is currently implemented only on GPU using NCCL2. Distributed training with FP16 over MPI is not supported.
* FP16 math is a subset of the current FP32 implementation. Some models may get a Feature Not Implemented exception when using FP16.
* FP16 is currently not supported in BrainScript. Please use Python for FP16.
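The overflow concern behind the 32-bit loss accumulation requirement can be seen directly with NumPy's float16 (a stand-in for the GPU behavior): float16 tops out near 65504, so a running sum of per-sample losses saturates to infinity, while accumulating in float32 does not.

```python
import numpy as np

# float16 overflows at ~65504, so summing many per-sample losses
# in FP16 saturates to inf; accumulate in FP32 instead.
losses = np.full(10000, 32.0, dtype=np.float16)  # true total = 320000

fp16_sum = np.float16(0)
for v in losses:
    fp16_sum = np.float16(fp16_sum + v)          # FP16 accumulation

fp32_sum = losses.astype(np.float32).sum()       # cast up, then sum

print(fp16_sum)  # inf
print(fp32_sum)  # 320000.0
```

This is why the examples cast the FP16 network output to 32-bit before computing the loss.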
To set up the build and runtime environment on Windows:
* Install [Visual Studio 2017](https://www.visualstudio.com/downloads/) with following workloads and components. From command line (use Community version installer as example):
vs_community.exe --add Microsoft.VisualStudio.Workload.NativeDesktop --add Microsoft.VisualStudio.Workload.ManagedDesktop --add Microsoft.VisualStudio.Workload.Universal --add Microsoft.Component.PythonTools --add Microsoft.VisualStudio.Component.VC.Tools.14.11
* Install [NVidia CUDA 9](https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64)
* From PowerShell, run:
/Tools/devInstall/Windows/DevInstall.ps1
* Start VCTools 14.11 command line, run:
cmd /k "%VS2017INSTALLDIR%\VC\Auxiliary\Build\vcvarsall.bat" x64 --vcvars_ver=14.11
* Open /CNTK.sln from the VCTools 14.11 command line. Note that opening CNTK.sln from anything other than the VCTools 14.11 command line causes a CUDA 9 [build error](https://developercommunity.visualstudio.com/content/problem/163758/vs-2017-155-doesnt-support-cuda-9.html).
To set up the build and runtime environment on Linux using docker, please build an Ubuntu 16.04 docker image using the Dockerfiles in /Tools/docker. For other Linux systems, please refer to the Dockerfiles to set up the dependent libraries for CNTK.