Thilo Will
e7c884a047
removing some unneeded (int) casts.
2016-08-26 11:45:05 +02:00
Thilo Will
3f5f0028aa
Fixing issue where, for Sparse * Dense, the Dense * Dense path was taken afterwards
2016-08-26 11:14:33 +02:00
Mark Hillebrand
54b07705b7
Merge remote-tracking branch 'origin/master' into mahilleb/cuDNN5
2016-08-26 10:59:09 +02:00
Thilo Will
ad17a78209
Fixing some newly introduced bugs
2016-08-26 10:19:32 +02:00
yuxiaoguo
be79c3ca64
add matrix pool, add basic impl
2016-08-26 14:28:58 +08:00
Nikos Karampatziakis
65dbb7a7e5
Merge branch 'DanielMerget-DanielMerget/fix_atomicAdd' into nikosk/pascal-and-cuda8-fixes
2016-08-25 16:01:23 -07:00
Nikos Karampatziakis
002e920bc5
Update CNTK.Cpp.props to CUDA8;
...
Add atomicAdd fix by DanielMerget
2016-08-25 15:57:17 -07:00
Nikos Karampatziakis
58b7186777
Merge branch 'DanielMerget/fix_atomicAdd' of https://github.com/DanielMerget/CNTK into DanielMerget-DanielMerget/fix_atomicAdd
2016-08-25 14:29:09 -07:00
Thilo Will
3786b88683
Added path CPU: SPARSE * DENSE -> DENSE to MultiplyAndWeightedAdd in Matrix.cpp
2016-08-25 17:52:27 +02:00
Thilo Will
5658594c69
CPU Sparse*Dense->Dense compiles
2016-08-25 17:39:26 +02:00
Thilo Will
1810e7cafe
First implementation of new sparse*dense for CPU
2016-08-25 15:56:13 +02:00
Mark Hillebrand
d9e9c885bd
Merge remote-tracking branch 'origin/fseide/cudnn5' into mahilleb/CuDnn5Test
2016-08-25 15:44:53 +02:00
Mark Hillebrand
6827182791
Merge remote-tracking branch 'origin/mahilleb/CuDnn5Test' into mahilleb/CuDnn5Test
2016-08-25 15:38:08 +02:00
Mark Hillebrand
fc3a071a71
CntkBatchNormalization.cuh: fix for batchSize == 1
2016-08-25 15:37:23 +02:00
Thilo Will
f0aa69d365
merge from master
2016-08-25 11:31:09 +02:00
Frank Seide
e576c3d6b7
missing NoGPU.cpp entries for RNN node;
...
fixed shared_ptr to incomplete CuDnnRNNExecutor
2016-08-24 20:24:31 -07:00
Frank Seide
686078fdfd
(made gcc happy)
2016-08-24 19:51:02 -07:00
Frank Seide
5f14fcaea0
merged from mahilleb/CuDnn5Test
2016-08-24 18:45:39 -07:00
Frank Seide
769b2602a2
updated SLUHandsOn tests
2016-08-24 18:15:09 -07:00
Frank Seide
d9c7e82031
OptimizedRNNStackNode: renamed some variables, renamed recurrentOps to camelCase, added weight inference
2016-08-24 17:17:23 -07:00
Frank Seide
8a86da8f02
renamed RNNNode to OptimizedRNNStackNode, also updated parameter names
2016-08-24 16:10:01 -07:00
Mark Hillebrand
0285fa9a13
Merge remote-tracking branch 'origin/master' into mahilleb/CuDnn5Test
...
Conflicts:
Source/ComputationNetworkLib/ComputationNode.h
Source/ComputationNetworkLib/TrainingNodes.h
Tests/EndToEndTests/Examples/Image/Miscellaneous/CIFAR-10/02_BatchNormConv/baseline.linux.txt
Tests/EndToEndTests/Examples/Image/Miscellaneous/CIFAR-10/02_BatchNormConv/baseline.windows.txt
Tests/UnitTests/MathTests/ConvolutionEngineTests.cpp
2016-08-25 00:37:49 +02:00
Frank Seide
d8c3c15be5
merged from CuDnn5Test and cudnn-rnn
2016-08-24 09:44:29 -07:00
Mark Hillebrand
493744d922
Source/Math/CntkBatchNormalization.cuh: fix variance conversion
2016-08-24 14:00:22 +02:00
Frank Seide
d1b1127c9d
Merge branch 'jdroppo/cudnn-rnn-lstm' of https://github.com/Microsoft/cntk into fseide/cudnn5
2016-08-24 00:20:13 -07:00
Mark Hillebrand
bb155ef563
tune
2016-08-24 00:06:49 +02:00
Mark Hillebrand
e1a9cabbde
Address CR comments
2016-08-23 20:32:03 +02:00
Jasha Droppo
373adfd9ac
Changes Addressing Code Review for CUDNN RNNStack Node
2016-08-23 10:40:34 -07:00
Thilo Will
f8f663551b
formatting code
2016-08-23 17:02:12 +02:00
Thilo Will
2de2ffb10b
In backprop of Times(Dense, Sparse), no switching to sparse
2016-08-23 16:45:45 +02:00
thilow
8c2ec53cde
Adding comments in method MultiplyAndWeightedAdd of Matrix.cpp
2016-08-23 06:19:22 -07:00
Mark Hillebrand
66498cf414
Merge remote-tracking branch 'origin/master' into mahilleb/CuDnn5Test
...
Note: baselines need to be fixed for
Tests/EndToEndTests/BatchNormalization and
Tests/EndToEndTests/Examples/Image/Miscellaneous/CIFAR-10/02_BatchNormConv.
2016-08-23 11:12:35 +02:00
Frank Seide
1f9c539c61
Merge branch 'jdroppo/cudnn-rnn-lstm' of https://github.com/Microsoft/cntk into fseide/cudnn5
2016-08-22 20:11:07 -07:00
Frank Seide
54f096083d
merged from mahilleb/CuDnn5Test
2016-08-22 18:51:22 -07:00
Frank Seide
5b969bac70
merged from master. Undid the ClassificationError baseline updates due to merge conflicts
2016-08-22 14:36:28 -07:00
Jasha Droppo
2fa1b7033d
Merge commit 'origin/master' 8493f11 into jdroppo/cudnn-rnn-lstm
2016-08-22 13:28:42 -07:00
Amit Agarwal
37b6897e94
Merge branch 'master' of https://github.com/Microsoft/CNTK into amitaga/cntkv2Library
2016-08-22 10:48:54 -07:00
Mark Hillebrand
f76afa2b7e
Switch to CuDNN v5
...
For batch normalization, running inverse standard deviation becomes
running variance. We mirror this CuDNN v5 change in the CNTK batch
normalization engine. Model version is bumped. When old models are
loaded, this parameter is (approximately) converted.
In the same model version change, let batch normalization count
samples seen rather than minibatches (this deals with incorrect averaging
when minibatch size is varied across epochs).
For batch normalization averaging and blending handle initialization
cases, don't rely on mean and variance initial values (set in
NDL/BrainScript).
Update Windows / Linux / Docker build.
With this commit, CuDNN v4 is not supported anymore.
2016-08-22 17:55:10 +02:00
Amit Agarwal
fa4b99d102
CNTK v2 Library: a) Add dynamic axis support b) New primitive functions and some higher level functions and c) Sequence classification test
2016-08-21 03:49:03 -07:00
Frank Seide
050a84035f
tuned >1-bit SGD: odd #quantization levels, range now 4 stddevs (before: 5)
2016-08-21 01:33:01 -07:00
Frank Seide
db74d6b468
changed ImageHandsOn from "gaussian" to "heNormal" initialization, and also most layers defaults in CNTK.core.bs
2016-08-19 23:34:17 -07:00
Daniel Merget
6eaacc7a98
clarified comment
2016-08-19 14:15:51 +02:00
Daniel Merget
fc2e6c2427
avoid double definition of atomicAdd on modern GPUs
2016-08-19 13:47:39 +02:00
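The double definition the commit above avoids is the classic one: starting with Pascal (sm_60) and CUDA 8, the toolkit ships a native double-precision `atomicAdd`, so a hand-rolled atomicCAS-based fallback must be compiled only for older architectures. A minimal sketch of such a guard (following the pattern in the CUDA programming guide, not necessarily CNTK's exact code):

```cuda
// Only define the fallback where the toolkit does not provide one.
// (!defined(__CUDA_ARCH__) keeps the host compilation pass happy.)
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ < 600
static __inline__ __device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* addressAsUll = (unsigned long long int*) address;
    unsigned long long int old = *addressAsUll, assumed;
    do
    {
        assumed = old;
        old = atomicCAS(addressAsUll, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old); // retry if another thread updated the value
    return __longlong_as_double(old);
}
#endif
```

Without the guard, compiling for sm_60 or newer fails because the same signature is already declared by the CUDA headers.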
Zhou Wang
3d80725a16
Define THREAD_LOCAL and force currentDevice to be THREAD_LOCAL
2016-08-19 11:16:33 +02:00
Jasha Droppo
971b7d0003
Change PrepareDevice() to have a Thread Dependent Value Cache
...
The cached value of currentDevice is meant to avoid redundant calls
to cudaSetDevice(). But this setting is thread-specific, so the
cache should be thread specific. This fixes the problem on Windows.
2016-08-18 15:41:59 -07:00
Wolfgang Manousek
786ec99da2
fixed broken ifdef statement
2016-08-17 10:04:56 +02:00
Wolfgang Manousek
79cfcf7d4f
more acml removal
2016-08-17 10:04:56 +02:00
Jasha Droppo
b99b3832fb
CuDNN-RNN Fix Merge Error in Math/*filters Visual Studio files
2016-08-16 15:50:56 -07:00
Jasha Droppo
2fb185b1fe
CUDNN-RNN Fix Parameter Count in Error Message
2016-08-16 10:10:06 -07:00
Jasha Droppo
80d077054d
Merge branch 'master' into jdroppo/cudnn-rnn-lstm
2016-08-15 16:11:16 -07:00
Frank Seide
f5e77e4efb
minor fixes
2016-08-13 12:39:27 -07:00
Project Philly
a269e0e6b5
Integrate thilow/FixTimes4SparseOnCPU into master
2016-08-12 13:45:12 -07:00
Thilo Will
8d7ed085e1
Fix of fix
2016-08-12 16:24:01 +02:00
Thilo Will
6f7505f656
Fixing another bug in Times(Dense,Sparse) and restructure code
2016-08-12 14:52:27 +02:00
Eldar Akchurin
66e45348fa
Fixing bug in sparse matrix buffer estimation
2016-08-12 14:47:30 +02:00
Thilo Will
edd3a948f6
Fixing initialisation of result in Times(dense, sparse) on CPU
2016-08-12 09:53:09 +02:00
Frank Seide
d6fb3786ae
bug fix: CPUSparseMatrix<ElemType>::MultiplyAndWeightedAdd() should handle transposed inputs in all combinations
2016-08-07 21:27:54 -07:00
Frank Seide
db5fff2a02
merged from master
2016-08-05 14:11:38 -07:00
Thilo Will
4e17fd5175
Fixing typo in reduction test and reformatting
2016-08-05 14:40:51 +02:00
Thilo Will
253e65b432
ReduceLogSum: beautifications
2016-08-05 14:08:54 +02:00
Thilo Will
69470799c1
merged from master
2016-08-05 12:33:39 +02:00
Thilo Will
0d3b9e57f6
Added comment regarding default axis values in python bindings
2016-08-03 10:59:12 +02:00
Frank Seide
e80562feda
trying a fix to lazy init
2016-08-02 19:24:42 -07:00
Frank Seide
bc06c3c4be
CNTK BatchNorm engine Backprop() should honor blendFactor
2016-08-02 19:13:30 -07:00
Project Philly
e2e15e0b18
Integrate mahilleb/AssertRemoval into master
2016-08-02 06:53:24 -07:00
Mark A. Hillebrand
fa7befb882
Address CR comment
2016-08-02 15:48:54 +02:00
thilow
5a7e77b4c5
ElementwiseProductWithExpOffDiff
2016-08-02 00:32:57 -07:00
Vadim Mazalov
c81eb6fd3f
Add Array struct, quantizer unit tests, minor fixes.
2016-08-01 12:33:03 -07:00
Vadim Mazalov
7cfc3f358e
Remove LearnableParameterQuantized and MEL command to quantize a node
2016-08-01 12:33:03 -07:00
Vadim Mazalov
90079c6fa3
Introduce Matrix<short>.
2016-08-01 12:28:11 -07:00
Vadim Mazalov
15e9cf8e94
Refine MEL command for quantization of LearnableParameter node, changes to InputAndParamNodes.
2016-08-01 12:28:11 -07:00
Vadim Mazalov
77ff661930
Quantization of learnable parameter node
2016-08-01 12:28:11 -07:00
Mark A. Hillebrand
97c0c98b39
Remove assertion that's not true for the CuDNN engine.
2016-08-01 18:46:09 +02:00
Frank Seide
0c86e36310
bug fix: BatchNormEngine::Forward() should assert saveMean/InvStdDev as a post-condition now
2016-07-29 12:42:23 -07:00
thilow
39c60b5c12
ReduceLogSum: adapted core.bs. Tests still failing
2016-07-28 23:20:38 +02:00
Thilo Will
7397854908
ReduceLogSum backward path and core.bs
2016-07-28 17:49:52 +02:00
Thilo Will
38e4b2b402
merged from master
2016-07-28 14:23:59 +02:00
Frank Seide
540cd0be04
addressed CR feedback
2016-07-27 15:11:57 -07:00
thilow
dde483fee7
Adding ReduceLogSum
2016-07-27 22:07:26 +02:00
Frank Seide
ee23bb2000
(fix for previous fix)
2016-07-26 18:03:52 -07:00
Frank Seide
8f716986ae
renamed reduction kernels that expect a hard-coded number of threads to reflect that number in their names
2016-07-26 17:48:37 -07:00
Frank Seide
9dbf806c39
merged from master
2016-07-26 16:51:19 -07:00
Frank Seide
1a80a6a1c1
undid accidental change of Shuffle()
2016-07-26 13:55:12 -07:00
Frank Seide
5e357fee8b
addressed minor feedback from Amit's CR;
...
addressed feedback from Simon Layton (NVidia) that the constants defined in GridDim are too small.
2016-07-26 13:52:21 -07:00
Thilo Will
dcc7e9b3f1
Added comments
2016-07-26 09:40:12 +02:00
Thilo Will
524c5278c7
Using double as aggregator, hoping to fix issue with tests TWRGE TWRGS, TLRGS
2016-07-25 17:56:15 +02:00
Thilo Will
52dda16053
Removed 'typedef' in partial specialisation of TensorOpReduction in the hope of fixing the Linux build.
2016-07-25 16:06:03 +02:00
Thilo Will
24c0ad1cf5
Fixed comments
2016-07-25 15:35:14 +02:00
Thilo Will
2115db661e
Renamed variable to reductionOp
2016-07-25 15:17:13 +02:00
Thilo Will
5dbb7254fa
factored aggregation op into a lambda
2016-07-25 15:00:11 +02:00
U-FAREAST\fseide
a5d15f3078
Merge branch 'master' of https://github.com/Microsoft/CNTK into fseide/clonebs
2016-07-22 10:04:08 -07:00
Frank Seide
27ff6f7177
(typo)
2016-07-22 08:53:31 -07:00
Frank Seide
07a6fa25f9
BatchNorm: moved allocation of saveMean() to where they are produced, and allocating them empty when they are not produced at all
2016-07-22 08:46:02 -07:00
Frank Seide
02700105a6
added new interface IFreezable to tell a node to freeze itself, in order to allow BatchNormalization to honor CloneFunction (..., parameters="constant")
2016-07-22 08:24:56 -07:00
Thilo Will
fd954772ea
Converted AggregationOp to pure function template
2016-07-22 17:02:33 +02:00
Thilo Will
bd776dc849
Changed "NeutralValue" to function templates
2016-07-22 16:49:54 +02:00
Thilo Will
80fdb8f53d
using function overloading for neutral
2016-07-22 16:28:53 +02:00
Thilo Will
fec05bffe8
Improved formatting and comments
2016-07-22 15:51:15 +02:00
Frank Seide
ce350dda68
(trying around with saveMean)
2016-07-21 19:47:45 -07:00
Frank Seide
3d70ff34e0
heavily commented batch-normalization code, including several bugs;
...
new interface IParameterNode for identifying LearnableParameters;
first implementation of CloneFunctionConfigLambda (except for returning the result)
2016-07-21 17:37:44 -07:00
Thilo Will
6dce931c19
merged from master
2016-07-21 10:28:22 +02:00
Amit Agarwal
f3dec438d6
a) Made CUDA sync mode execution of kernels a runtime config option instead of a build flavor b) Added perf instrumentation to show accurate per MB read, compute and parameter update time
2016-07-20 17:19:00 -07:00
Frank Seide
39a9175097
merged from master
2016-07-19 16:40:51 -07:00
Jasha Droppo
3c8e63f1d5
Fix Bug Introduced in Merge
2016-07-18 11:57:31 -07:00
Jasha Droppo
a4e42744c2
Merge branch 'master' into jdroppo/cudnn-rnn-lstm
...
Conflicts:
Makefile
Source/CNTK/BrainScript/CNTKCoreLib/CNTK.core.bs
Source/Math/CuDnnBatchNormalization.cu
Source/Math/CuDnnConvolutionEngine.cu
Source/Math/Math.vcxproj
Source/SGDLib/SGD.cpp
2016-07-18 11:11:58 -07:00
Ivan Rodriguez
be64a3958d
Using again shared_ptr
2016-07-18 13:43:37 +02:00
Ivan Rodriguez
7d8657b1a8
Change code according to review
2016-07-18 13:43:37 +02:00
Ivan Rodriguez
dac4aca396
refactor tensor tests
2016-07-18 13:40:15 +02:00
Frank Seide
e3b1b66aba
added tensor test(s) to MathPerformanceTests
2016-07-18 13:32:14 +02:00
Ivan Rodriguez
64ecb3c659
Change code according to review
2016-07-18 13:30:00 +02:00
Ivan Rodriguez
64f6978ad9
remove unused forward declared struct
2016-07-18 13:30:00 +02:00
Ivan Rodriguez
934dd082a0
Fix crash when running BiasGradient test. Remove the original test code.
2016-07-18 13:28:11 +02:00
Ivan Rodriguez
936b736c1f
refactor tensor tests
2016-07-18 13:28:11 +02:00
Ivan Rodriguez
af6ddf9c04
Move math performance tests to MathTests
2016-07-18 13:28:11 +02:00
Frank Seide
a172c89111
added tensor test(s) to MathPerformanceTests
2016-07-18 13:24:56 +02:00
Zhou Wang
e3927bb717
Add math unit tests and adapt them for Linux
...
This is a combination of 7 commits.
minor format changes
adapt makefile and math tests
enable sse4.1 support
adapt to linux
fix shadow param, and adjust order of functions
network tests need .cu
move constant definition into a .cpp file, instead of .h
2016-07-13 16:03:06 +02:00
Thilo Will
873d988115
Improved formatting and comments
2016-07-13 11:50:18 +02:00
Thilo Will
4bcc0d1b85
Trying to avoid: template instantiation depth exceeds maximum
2016-07-12 17:47:16 +02:00
Thilo Will
9c0cf7123a
Factored out the reduction operations
2016-07-12 16:20:58 +02:00
Thilo Will
248753faa6
Factored out neutral element of binary ops.
2016-07-12 14:32:47 +02:00
Thilo Will
204aca8563
ReduceOp passed through on all paths
2016-07-12 10:22:32 +02:00
Thilo Will
2cc578ef0e
passed reduceOp through to the end on some paths; still missing in TensorOpReduce
2016-07-12 10:01:22 +02:00
Thilo Will
8493e614e2
passing reductionop to _launchTensorOpWithReduction
2016-07-11 17:05:04 +02:00
Thilo Will
beb797b0db
implemented min/max reduction inside TensorOpElement
2016-07-11 16:56:32 +02:00
Thilo Will
98f9e8ac39
Passing reduction op further down
2016-07-11 15:49:13 +02:00
Thilo Will
73d1e32d3a
Revert "passing through reductionOp. Not yet compiling."
...
This reverts commit 84c9b0caa0.
2016-07-11 14:38:29 +02:00
Thilo Will
84c9b0caa0
passing through reductionOp. Not yet compiling.
2016-07-11 11:45:08 +02:00
Thilo Will
f4c3821302
passing the reduction op down the call hierarchy for reduction on GPU into LaunchTensorOpWithReduction. In ReduceElementsNode renaming m_op to m_reductionOP
2016-07-08 16:26:58 +02:00
Thilo Will
94bd96eaba
merge with master
2016-07-08 13:57:01 +02:00
anthonyaue
3f41c0c9c5
Add a bunch of new tests to exercise block multiplier.
...
Change spacing of comments.
Reset omp threads in d'tor.
2016-06-30 08:23:13 -07:00
anthonyaue
03e504e7ab
Allow block multiplier to support arbitrary number of rows in A
2016-06-29 13:19:35 -07:00
Jasha Droppo
e3dd352d20
RNN Debug info on object creation/deletion
2016-06-29 11:28:53 -07:00
Thilo Will
693b9a6c45
merged with master
2016-06-27 11:51:20 +02:00
Project Philly
d39410d2fc
Integrate anthaue/addblockmultiplier into master
2016-06-24 16:50:41 -07:00
anthonyaue
d2bf769c83
Implement code review feedback from clemensm
2016-06-24 08:40:48 -07:00
Thilo Will
681c805cf9
Added some code that should force ReduceMax to run on CPU
2016-06-24 17:28:52 +02:00
Thilo Will
723441ea5c
ReduceMin/Max work (cpu only). Missing: move to cpu.
2016-06-24 16:20:37 +02:00
Thilo Will
20c9444176
ReduceMax/Min builds
2016-06-24 15:40:25 +02:00
anthonyaue
a2fda5f7f5
Put AVX support under SUPPORT_AVX2 so compilation on Linux will work
2016-06-21 13:45:37 -07:00
anthonyaue
1e6fb55338
Add Stdafx.h headers to fix release win build breaks; add -mavx2 flag to
...
fix linux build break
2016-06-21 10:43:29 -07:00
anthonyaue
96e40865b8
Fix some capitalization issues
2016-06-21 08:45:07 -07:00
Eldar Akchurin
3bf21c13e3
Fixing invalid address
2016-06-21 17:38:54 +02:00
Eldar Akchurin
3e61e9fa84
Removing unused dependencies
2016-06-21 17:38:54 +02:00
Eldar Akchurin
39a44a58de
Inlining checks
2016-06-21 17:38:54 +02:00
Eldar Akchurin
d564ed7cab
Adding CUDA runtime API
2016-06-21 17:38:54 +02:00
Eldar Akchurin
5b0a3aa55a
Fixing comments
2016-06-21 17:38:54 +02:00
Eldar Akchurin
f75a32301e
Fixing Cuda return code checks
2016-06-21 17:38:54 +02:00
Frank Seide
4a79ef3d8c
added tensor test(s) to MathPerformanceTests
2016-06-17 12:10:37 -07:00
Anthony Aue
12140c993b
Makefile and *.vcxproj changes
2016-06-17 08:57:55 -07:00
Anthony Aue
567a7ff421
Implement comments from code review. Have not tried to compile.
2016-06-16 16:17:21 -07:00