Extend sequence and frame packers to support sparse input.
Create a tightly filled layout and base packing off of it.
Add a number of unit tests for sequence packing (both sparse and
dense, no randomization, using CNTK text format for input).
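
To illustrate what a "tightly filled layout" could look like, here is a minimal sketch under stated assumptions, not the packer from this change. It assumes the layout depends only on sequence lengths and that each sequence is placed greedily into the parallel slot that currently ends earliest. Placement, PackTightly and LayoutWidth are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record: where one sequence lands in the packed layout.
struct Placement
{
    std::size_t sequence; // index into the input list
    std::size_t slot;     // parallel row the sequence was assigned to
    std::size_t begin;    // first time step (inclusive)
    std::size_t end;      // last time step (exclusive)
};

// Assumption: greedy packing. Each sequence goes to the slot that currently
// ends earliest and starts right where that slot ends, so no gaps are left
// inside a slot.
std::vector<Placement> PackTightly(const std::vector<std::size_t>& lengths,
                                   std::size_t numSlots)
{
    std::vector<Placement> layout;
    if (numSlots == 0)
        return layout;

    std::vector<std::size_t> used(numSlots, 0); // time steps filled per slot
    layout.reserve(lengths.size());
    for (std::size_t i = 0; i < lengths.size(); ++i)
    {
        auto it = std::min_element(used.begin(), used.end());
        std::size_t slot = static_cast<std::size_t>(it - used.begin());
        layout.push_back({i, slot, *it, *it + lengths[i]});
        *it += lengths[i];
    }
    return layout;
}

// Number of time steps the packed layout spans.
std::size_t LayoutWidth(const std::vector<Placement>& layout)
{
    std::size_t width = 0;
    for (const auto& p : layout)
        width = std::max(width, p.end);
    return width;
}
```

Because the sketch computes placements from lengths alone, the same layout could then drive copying of either dense columns or sparse non-zero entries into their target positions.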
new operation Trace() (TraceNode) as a debugging aid;
WriteMinibatchWithFormatting() can now log by FrameRange;
WriteFormattingOptions now in ComputationNode.h
Renames SynchronousNodeEvaluator to NDLNodeEvaluatorImpl
Merges SynchronousExecutionEngine.h into NDLNetworkBuilder.h
Renames SynchronousExecutionEngine.cpp into NDLNetworkBuilder.cpp
Moves DebugUtil functions (PrintCallStack, GetCallStack) to ExceptionWithCallStack
Refactors the PrintCallStack and GetCallStack functions in ExceptionWithCallStack to reuse the common functionality.
Make gcc happy
* Add 'openblas' as mathlib option in configure. Not added to auto-search so
must be specified using --with-openblas
* The configure script now also searches an empty tail so that libraries located at default_path_list
roots (i.e. /usr/local/ + include/openblas_config.h) are found
* Treat ACML as the odd library out in ifdefs since it doesn't conform to the typical
BLAS standard. Other libraries like ATLAS should be able to share the
OpenBLAS/MKL variants. Add a default USE_ACML define in VS projects to match
* Fix 'max' macro define colliding with C++ std::max once openblas headers are included
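
The 'max' collision in the last item is a common one when a C header defines a function-like macro. Below is a minimal sketch of two usual workarounds, not necessarily the fix applied in this change. The macro definition is a stand-in written for illustration (the actual header contents are not reproduced here), and LargerDim and LargerDimParenthesized are hypothetical names.

```cpp
#include <algorithm>

// Stand-in for a C header that defines a function-like 'max' macro
// (assumption: the real definition lives in a third-party header).
#define max(a, b) (((a) > (b)) ? (a) : (b))

// Workaround 1: drop the macro right after the offending include.
#ifdef max
#undef max
#endif

// With the macro gone, std::max is an ordinary function template call.
inline int LargerDim(int rows, int cols)
{
    return std::max(rows, cols);
}

// Workaround 2: parenthesize the name. A function-like macro only expands
// when its name is directly followed by '(', so this form is safe even
// where the macro is still defined.
inline int LargerDimParenthesized(int rows, int cols)
{
    return (std::max)(rows, cols);
}
```

The first form keeps call sites unchanged but has to follow every include of the offending header; the second touches each call site but does not depend on include order.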
Usage Notes:
* For best performance, build OpenBLAS with USE_OPENMP=1. When running CNTK, set the
OPENBLAS_NUM_THREADS environment variable or the numCPUThreads CNTK config variable to the
physical core count; otherwise performance will suffer
* Tested on Linux with OpenBLAS 2.16 (git HEAD) and GCC 4.8.4, and on Windows with
OpenBLAS 2.15 (pre-built binary release + MinGW 64-bit support DLLs)
* For Windows, in Math.vcxproj, replace libacml_mp_dll.lib with libopenblas.dll.a and change
USE_ACML define to USE_OPENBLAS. Change ACML_PATH environment variable to your OpenBLAS path.
Modify openblas_config.h as per https://github.com/xianyi/OpenBLAS/issues/708
* On current generation Intel processors, OpenBLAS measures a little faster than
AMD ACML and slower than Intel MKL on MNIST and other examples