Commit graph

1547 Commits

Author SHA1 Message Date
liqfu e940605f6b Support ONNX Scan op 2018-10-19 21:36:21 -07:00
Bowen Bao a55e871ec8 Fix InvStdDev.
* The issue was that AssignSqrOfDifferenceOf(beta, input, mean, alpha)
assigns the mean value to the gaps in the input. These values are then reduced
within this function, leading to incorrect results. The fix is to
execute the assign and reduce steps separately, and to mask the gaps to zero again before reducing.
* Update test baseline affected by this change (err is lowered by <1%).
2018-10-19 10:24:11 -07:00
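The mask-before-reduce fix described above can be sketched in plain numpy (the function and argument names are illustrative, not CNTK's; the real code operates on CNTK Matrix objects):

```python
import numpy as np

def masked_inv_std_dev(x, valid_mask, mean):
    """Inverse standard deviation over valid cells only.

    x:          (n,) values; gap cells may hold arbitrary data
    valid_mask: (n,) 1.0 for valid cells, 0.0 for gaps
    mean:       scalar mean over the valid cells
    """
    # Squared differences; at this point gap cells contain garbage.
    sq_diff = (x - mean) ** 2
    # Mask gaps back to zero *before* reducing, so they do not
    # contribute to the sum (the bug was reducing without this step).
    sq_diff *= valid_mask
    var = sq_diff.sum() / valid_mask.sum()
    return 1.0 / np.sqrt(var)
```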
KeDengMS 1489de8de8 Fix a crash in transpose_times simplification to element times 2018-09-21 22:33:41 -07:00
Bowen Bao da6b0bc71f GatherNode backward: add check for no dynamic axis
Previously, to resolve an issue of gather producing incorrect gradient
values, a validity-mask check was added to ensure we don't count non-valid
cells as 0.
However, this check is needed only for inputs that have a dynamic axis, i.e.
inputs that have an MBLayout.
2018-09-20 14:54:39 -07:00
Yang Chen 3d809bf54c Added several internal API header files
Since other projects may use these header files, we added
them into API/Internals.

* ComputationGraphAlgorithms.h was moved from Source/ComputationNetworkLib

* PrimitiveOpType.h and EvaluatorWrapper.h were moved from Source/CNTKv2Library

* PrimitiveFunctionAttribute.h was extracted from PrimitiveFunction.h. It contains
  a new class PrimitiveFunctionAttribute which is the collection of all attribute
  names for PrimitiveFunction.

  This change actually had a subtle side-effect. We had a global static variable
  s_stateAttributes that depended on PrimitiveFunction::AttributeNameRngSeed and
  PrimitiveFunction::AttributeNameRngOffset. After we moved those static
  attribute-variables into another translation unit, s_stateAttributes could be
  initialized with empty wstrings, because PrimitiveFunctionAttribute::AttributeNameRngSeed
  and PrimitiveFunctionAttribute::AttributeNameRngOffset were initialized after
  s_stateAttributes. Note that the initialization order of global static variables
  is not well-defined across translation units. To fix the issue, we also moved
  s_stateAttributes into the PrimitiveFunctionAttribute class and renamed it to
  s_rngStateAttributes. I think it's reasonable to consider s_rngStateAttributes
  part of the PrimitiveFunctionAttribute class.

* PrimitiveFunction.h was moved from Source/CNTKv2Library
2018-08-22 10:47:18 -07:00
Vadim Mazalov 2a748e1ce9 Merge branch 'vadimma/lc-blstm-sq' 2018-08-09 02:28:38 +00:00
Binbin Zhang 1ebdc548f3 Introduce latency-controlled BLSTM 2018-08-08 15:41:51 -07:00
kaituoxu e5006a4866 fix parameter sharing bug 2018-08-06 19:00:33 +09:00
Bowen Bao c78f40d0c1 Avoid asym pad for MKL, block channel axis padding. 2018-07-30 19:50:35 -07:00
Spandan Tiwari d69fd95d04 Adding input validation for group convolution. 2018-07-17 13:57:00 -07:00
Bowen Bao dca867c2cb add warning for convolution padding on channel axis. 2018-07-11 16:03:56 -07:00
Bowen Bao 5ca4bb3a93 Add Sequential Convolution.
adding convolution over sequential axis related tests.

adding convolution over sequential axis.
currently additional supported parameters:
  auto padding
  strides
  groups
support for dilation needs to be tested on GPU.

update PrimitiveOpType SerializationTests that were missing from other commits.

convert tabs to spaces.

Refine cpp convolution unit tests. Add dilation tests to python convolution unit tests.

more detailed comments on shape change for 1d seq conv with reduction rank 0. And other minor tweaks.

add EndToEndTests of sequential convolution on MNIST

add init_bias tests for seq conv

minor change in comments

rename ConvolutionOverSequenceAxisNode. Add comment on cudnn failed new test.

add more comments, trim spaces

add more comments, remove magic number, add more boundary checks.

remove the last SetValue for outputSeqAxisDimValue as TensorView Unary Op has already updated the value.

fix bug in python seqconv default bias shape, and add related unit tests.

small tweak in seq conv to avoid additional gpu memory allocation and increase performance.

Example: seq MNIST, and profiling

adjust conv c++ value unit test channel size.

small update on python seq mnist

Sequential convolution v2.
* re-designed ConvolutionSequenceShapeNode: refactored to separate the computation of output sequence length from the v1 node design, and refactored ConvolutionNodeBaseExtended as their common base class (since "ConvolutionNodeBase" is the base class not only of ConvolutionNode but also of PoolingNode).
* Performance increase against v1.
- compute sequence length from the MBLayout instead of the mask output from unpack, avoiding an unnecessary CPU/GPU memory copy.

do not include the py sequence example for now; need to find a correct location for it.

add check for truncated sequences in sequential convolution

improve code style.

Moving sequential convolution in python to a new high level api, to maintain compatibility with previous implementation (special case 1d sequential convolution).

Add ConvolutionSequenceShape OP.

nit

update conv_attribute test for updated convolution parameter
move sequential parameter to the last
update test shortcircuit for CPU convolution dilation.

update endtoendtest - unittest baseline file for new convolution unittests.

update makefile to include new unittest file for linux

nit

Update ConvolutionNode initialization code to handle TransformerNode Initialization.

nit

nit
2018-07-10 21:10:33 -07:00
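The output sequence length that the sequential-convolution node derives per sequence (using lengths read from the MBLayout, as described above) follows standard convolution arithmetic. A minimal sketch, with illustrative parameter names rather than CNTK's:

```python
def conv_output_length(seq_len, kernel, stride=1, dilation=1, pad=False):
    """Standard 1-D convolution output-length arithmetic, applied
    per sequence using the length recorded in the MBLayout."""
    effective_kernel = dilation * (kernel - 1) + 1
    if pad:  # "same"-style auto padding: length shrinks only by stride
        return (seq_len + stride - 1) // stride
    # No padding: the kernel must fit entirely inside the sequence.
    return (seq_len - effective_kernel) // stride + 1
```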
David Brownell 624bf7d82b String changes to support conversion from ASCII, UCS2, UCS4, UTF8, UTF16, and UTF32 strings 2018-07-06 18:48:27 -07:00
Bowen Bao 6cc772f693 small change for more reuse of code 2018-07-05 16:42:51 -07:00
Bowen Bao 0df5c39fbb Add method ScatterToIndicesWithMask for class Matrix(both CPUMatrix and GPUMatrix).
Change GatherNode Backprop to use the new method instead of altering data for gaps in input matrix.
2018-07-05 14:20:13 -07:00
Bowen Bao 7c74244387 set missing value back to 0 in case of unexpected usage. 2018-07-03 18:48:01 -07:00
Bowen Bao 1e058cedcf fix Gather op's incorrect gradient value.
* The error was due to padding 0 as the default value for missing gaps. Each of these gaps then contributes 1 to the gradient of the reference at index 0. The fix is to mask missing values in the indices matrix to negative, and in the Matrix scatter implementation to check for and skip negative indices (the previous Matrix CPU implementation already checks for negative indices).
2018-07-03 18:34:11 -07:00
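The scatter-side fix above can be sketched in numpy (function and argument names are illustrative; the real implementation lives in the CPUMatrix/GPUMatrix scatter kernels):

```python
import numpy as np

def scatter_add_skip_negative(grad_out, indices, num_rows):
    """Scatter-add rows of grad_out into a (num_rows, ...) gradient.

    Gap cells in `indices` are masked to -1 beforehand; negative
    indices are skipped, so gaps no longer funnel gradient into row 0.
    """
    grad_ref = np.zeros((num_rows,) + grad_out.shape[1:])
    for i, idx in enumerate(indices):
        if idx < 0:  # gap cell: skip instead of treating it as index 0
            continue
        grad_ref[idx] += grad_out[i]
    return grad_ref
```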
Sergii Dymchenko e95f92cd5e Add Tan/Atan ops to CNTK (with ONNX support). 2018-06-28 14:03:02 -07:00
Jie Zhu 3c389ed28d fixing a bug in straight through implementation 2018-06-27 11:44:49 -07:00
Jie Zhu cc368aea82 fixing a bug in backward propagation of straightthrough 2018-06-27 11:44:49 -07:00
Jie Zhu 02be8a0d69 adding straight through unary op 2018-06-27 11:44:49 -07:00
Jaliya Ekanayake 0d52225685 fix the test failure 2018-05-23 09:56:34 -07:00
Jaliya Ekanayake cce690e8d4 Adding int16 support for model saving in CNTK 2018-05-22 14:52:38 -07:00
Jaliya Ekanayake 796b59dad1 Adding a special op to proxy operands to optimized implementations such as Halide 2018-05-21 12:39:44 -07:00
Yuqing Tang 367a13d0ee Fix bugs in no-backprop gradient ops and add unit tests. 2018-05-07 17:55:50 -07:00
Yuqing Tang 5a587b376d Implemented eye_like op and the dependent SetDiagonalValue methods for CPU and GPU sparse matrices. 2018-05-05 21:24:59 -07:00
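The semantics of eye_like can be sketched in numpy (an illustrative sketch only; the commit's actual work is the sparse-matrix SetDiagonalValue support behind it):

```python
import numpy as np

def eye_like(x):
    """Return an array shaped like x with ones on the main diagonal."""
    out = np.zeros_like(x)
    np.fill_diagonal(out, 1)  # works for non-square shapes too
    return out
```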
Jaliya Ekanayake c0d5386502 Adding int8 support for NDArrayView 2018-05-01 11:40:05 -07:00
KeDengMS 5e0856e47e Fix bugs for fp16 in RNN and BMUF
Note that sparse embedding is still not working yet.
2018-04-11 15:35:08 -07:00
Spandan Tiwari 09e25a47fa Moving group convolution implementation to use cuDNN7 and MKL2017 APIs. 2018-04-11 10:26:15 -07:00
Yuqing Tang 90a41f6a10 Enabled gather op indices to be computed from parameters. 2018-03-27 16:48:11 -07:00
Jaliya Ekanayake 73c2046e88 Adding recurrence support to user defined functions. This enables UDF to be called inside recurrent loops. 2018-03-22 11:44:09 -07:00
Project Philly 91dbed4401 Integrate vadimma/readse into master 2018-03-17 20:32:53 +00:00
KeDengMS f6b3260dc5 Fix issue #3019: RuntimeError: OptimizedRNNStackNode configured for sequence mode, but minibatch only has one time step. 2018-03-08 16:37:09 -08:00
Vadim Mazalov 9af853d131 Ensure we can read model from a different location 2018-03-01 08:10:39 -08:00
Vadim Mazalov 523af503ea Minor refactoring 2018-02-21 11:16:19 -08:00
Vadim Mazalov 1c7810760d Handle malformed lattice 2018-02-21 11:16:19 -08:00
Project Philly e186dddf5b Integrate vadimma/latpar into master 2018-02-15 18:10:16 +00:00
Vadim Mazalov e9715709db Introduce latticeConfigPath to SE node
(cherry picked from commit dcdad16d2f)
2018-02-14 23:33:46 -08:00
KeDengMS 2237dd0988 Add support for FreeDimension in Pooling/Unpooling 2018-02-14 13:27:45 -08:00
Vadim Mazalov 838a4339bc Some refactoring
(cherry picked from commit 9d2cda718f)
2018-02-14 12:53:44 -08:00
Vadim Mazalov 1a1e08c210 Make the lattice deserialization parallel
(cherry picked from commit 49612264ec)
2018-02-14 12:52:56 -08:00
KeDengMS 3660b7a36e Node timing and profile details format in chrome://tracing.
Working example in ./Examples/Image/Classification/MLP/Python/SimpleMNIST.py

Note that node timing would be added to profiler details when profiler is enabled, i.e.

    import cntk as C
    C.debugging.debug.set_node_timing(True)
    C.debugging.start_profiler()
    C.debugging.enable_profiler()
    trainer|evaluator|function executions
    trainer|evaluator|function.print_node_timing()
    C.debugging.stop_profiler()
2018-02-01 21:53:46 -08:00
KeDengMS 3cf3af5df6 CNTK support for CUDA 9
CNTK now supports CUDA 9/cuDNN 7. This requires updating the build environment to Ubuntu 16/GCC 5 for Linux, and Visual Studio 2017/VCTools 14.11 for Windows. With CUDA 9, CNTK also adds a preview of 16-bit floating point (a.k.a. FP16) computation.

Please check out the example of FP16 in ResNet50 at /Examples/Image/Classification/ResNet/Python/TrainResNet_ImageNet_Distributed.py

Notes on FP16 preview:
* The FP16 implementation on CPU is not optimized, and it is not meant to be used for CPU inference directly. Users need to convert the model to 32-bit floating point before running on CPU.
* The loss/criterion for FP16 training needs to be 32-bit to accumulate without overflow, using the cast function. Please check the example above.
* Readers do not have FP16 output. Unless using numpy to feed data, a cast from FP32 to FP16 is needed. Please check the example above.
* FP16 gradient aggregation is currently implemented only on GPU using NCCL2. Distributed FP16 training with MPI is not supported.
* FP16 math is a subset of the current FP32 implementation. Some models may get a "Feature Not Implemented" exception when using FP16.
* FP16 is currently not supported in BrainScript. Please use Python for FP16.

To setup build and runtime environment on Windows:
* Install [Visual Studio 2017](https://www.visualstudio.com/downloads/) with following workloads and components. From command line (use Community version installer as example):
    vs_community.exe --add Microsoft.VisualStudio.Workload.NativeDesktop --add Microsoft.VisualStudio.Workload.ManagedDesktop --add Microsoft.VisualStudio.Workload.Universal --add Microsoft.Component.PythonTools --add Microsoft.VisualStudio.Component.VC.Tools.14.11
* Install [NVidia CUDA 9](https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64)
* From PowerShell, run:
    /Tools/devInstall/Windows/DevInstall.ps1
* Start VCTools 14.11 command line, run:
    cmd /k "%VS2017INSTALLDIR%\VC\Auxiliary\Build\vcvarsall.bat" x64 --vcvars_ver=14.11
* Open /CNTK.sln from the VCTools 14.11 command line. Note that opening CNTK.sln from anywhere other than the VCTools 14.11 command line causes a CUDA 9 [build error](https://developercommunity.visualstudio.com/content/problem/163758/vs-2017-155-doesnt-support-cuda-9.html).

To set up the build and runtime environment on Linux using docker, please build an Ubuntu 16.04 docker image using the Dockerfiles in /Tools/docker. For other Linux systems, please refer to the Dockerfiles to set up the dependent libraries for CNTK.
2018-01-22 16:58:56 -08:00
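The note above about keeping the loss in 32-bit for accumulation can be illustrated with plain numpy (this is not CNTK code): fp16 overflows past its maximum finite value of about 65504, so sums of per-sample fp16 losses must be accumulated in fp32.

```python
import numpy as np

# Per-sample losses in half precision, as an FP16 model would produce.
losses_fp16 = np.full(8, 10000.0, dtype=np.float16)

# Accumulating directly in fp16 overflows once the running sum
# exceeds the fp16 maximum (~65504), producing inf.
naive_sum = np.float16(0)
for v in losses_fp16:
    naive_sum = np.float16(naive_sum + v)

# Casting to fp32 before accumulating gives the exact total.
safe_sum = losses_fp16.astype(np.float32).sum()
```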
Eldar Akchurin 8066621859 Fixing BPTT for case when the minibatch size changes 2018-01-17 02:20:23 +01:00
Project Philly 10d7130a43 Integrate vadimma/LatticeSerializer into master 2018-01-16 22:39:23 +00:00
KeDengMS 1a81d41ee0 Fix batch matmul test failures 2018-01-14 23:32:12 -08:00
Chengji Yao 15e705da8d add batch matmul 2018-01-12 18:41:34 -08:00
Vadim Mazalov eb8815508e Include the python test and bump up the model version 2018-01-12 11:06:40 -08:00
Vadim Mazalov bf673b87d0 Disable parallel lattice construction 2018-01-12 11:03:23 -08:00
Vadim Mazalov 73f2315c49 Ensure the clean up builds 2018-01-12 11:03:22 -08:00