* Make python BuildConfig serializable/deserializable to/from string
* Make C++ BuildConfig serializable/deserializable to/from string
* Revert "Make python BuildConfig serializable/deserializable to/from string"
This reverts commit a5e1fb3ff63a161cc0d63475d2a32816cc4c3666.
* Revert "Make C++ BuildConfig serializable/deserializable to/from string"
This reverts commit ec0c2c54543050fe6f264d06eebff33dee70370b.
* Converted BuildConfig to use TVM node system
* Fix lint
* Fix lint
* Added code to set node attributes through the C API
* Fixed bug in build_config()
* Fix lint
* Fix lint
* Fix test errors
* Reduced scope of node __setattr__ to apply only to BuildConfig
* Fix lint
* Fix lint
* Changed python BuildConfig to be immutable, with values set once on construction.
* Fix lint
* Fix C++ test
* Fixed BuildConfig setting python-side args
* Fix lint
* Removed dependency on reflection.cc to construct BuildConfig (allow use in runtime library)
* Fix lint
* Revert "Fix lint"
This reverts commit 16ed6d7a1ca5e551b035bad46e8361ea487cd45b.
* Revert "Removed dependency on reflection.cc to construct BuildConfig (allow use in runtime library)"
This reverts commit 43817c97a2ee045791e0c031d962fa97636ce8f6.
* Avoid accessing BuildConfig when using runtime lib
* Fix missing import
* Fix error running under cython (root cause: node handle is not valid until after __init__ has returned, so cannot call __dir__ during __init__)
* Fix error where BuildConfig._node_defaults was not copied in build_config()
* Fix lint
* Fix lint
* Fix lint
* Fix lint
* Add comments to python BuildConfig
* Ported injective schedules to C++. Added some elementwise ops.
* Fix lint errors
* Added reduction ops and schedules
* Fix lint errors
* Fix lint errors
* Fix lint errors
* Added transform ops
* Fix lint errors
* Fix lint errors
* Added softmax, log_softmax, leaky_relu and flatten ops.
Fixed issue where TVM_DECLARE_INTRIN_UNARY used the PureExtern flag
instead of PureIntrinsic.
Added softmax CUDA schedule.
* Fix lint
* Fix lint
* Added binary_dense, batch_norm_inference, dense, dilate, scale_shift_*,
global_pool and pool ops.
Extended pad to allow specifying pad_value.
Fixed issue where pad would throw if padding was zero in all dimensions.
* Fix lint
* Fix lint
* Added CUDA schedules for dense, pool and global_pool
* Added extern schedules for generic and CUDA
* Fix lint
* Added x86 binary schedules
* Fix lint
* Added rocm dense schedule. Added rocBLAS and cuBLAS support to dense ops
* Added pow ops. Added x86 default and injective schedules
* Fix lint
* Fix lint
* Fix lint
* Fix lint
* Fix lint
* Fix indent
* Removed schedules directory
* Changed left_shift, right_shift to operators. Changed pad_value in pad() to remove pointer usage
* Fixed usage of pad in nn/pooling.h. Fixed declaration of operator>>
* Fixed comments for shift operators
* Added comments to utility functions
* Added TOPI C++ library, exporting broadcast_add op
* Fix lint
* Share libinfo.py with TVM
* Fix lint
* Add other broadcast ops
* Fix lint
* Fix imports in topi
* Fix lib names
* Fixed build issue where windows builds don't apply correct definitions
* Removed TVM_EXPORTS from topi library
* Attempted CI build fix
* Add topi lib to tvm_multilib
* Fix Jenkinsfile
* Added TOPI build target to Makefile
* Fix nn op namespaces.
* Fix lint
* Renamed TOPI lib to libtvm_topi
* Removed _ffi/base.py
* Remove _ffi from topi, now shared with tvm.
* Make libtvm_topi loading optional
* Fix compiler warnings
* Fix lint
* Fix lint
* Fix lint
* Fix build error by making new libs argument to Target optional
* Added C++ Target type interop. Added registration of remaining C++ ops and schedules. Added test of broadcast ops
* Fix lint
* Fix lint
* Fix compile error
* Fix compiler warnings
* Fix compiler warnings
* Fixed int vector interop. Fixed argmin incorrectly invoking argmax. Fixed corner case in default schedules of attempting to fuse 0 length axes. Added tests for reduce ops.
* Refactored reduce builders
* Fixed typos in topi.cc. Added basic test.
* Fixed padding size error. Added dense, dilate, pooling tests
* Fixed issue where clip would output a different dtype to the input. Added split_sections op to cover the other mode of the python split op. Added tests.
* Changed extension type numbers to avoid clash with NNVM
* Fix lint
* Fix compiler warnings
* Removed use of std::vector from the public TOPI API
* Fix lint
* Add TOPI C++ tests to CI
* Fixed detail namespacing. Improved comments.
* Use the body for initialization when there is no intrin func (issue 714).
* Refine code per review comments, and add a test case.
* Fix lint issues.
* Re-organize the tensorize test cases, and add a new case for non-reset
mode.
* Fix a typo.
* Delete the unit test case because it was already merged into test_schedule_tensorize.py.
* always use new tensor in its stage when rewriting for cache read
* revert previous changes to sync up with master
* support using the ptr with an original offset
* update test case and fix CI error
* [SCHEDULE]enable partition const loop with build flag (#719)
* enable partition loop with build flag
* add a testcase, and modify LoopPartition related cases
* add documentation for split_const_loop
* [IRbuild]Support automatically Name Loop Variable in IRBuilder (#719)
* add idx_num in class
* using typical index [i, j, k] first, then i_suffix
* keep inputs names
* fix lint
* improve comment of name
* fix lint
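A minimal sketch of the naming rule described in the IRBuilder entries above, assuming the suffix is a running number (`i_0`, `i_1`, ...). `LoopNamer`, `next_name` and the exact suffix format are hypothetical names chosen for illustration, not the IRBuilder API.

```python
class LoopNamer:
    """Hypothetical helper that hands out loop variable names."""

    def __init__(self):
        # Number of automatically named loop variables so far.
        self.idx_num = 0

    def next_name(self, name="i"):
        # A name supplied by the caller is kept unchanged.
        if name != "i":
            return name
        n = self.idx_num
        self.idx_num += 1
        # Use the typical indices i, j, k first.
        if n < 3:
            return "ijk"[n]
        # Afterwards fall back to "i" plus a numeric suffix (assumed format).
        return "i_%d" % (n - 3)
```

Under these assumptions, successive default calls would yield `i`, `j`, `k`, then suffixed names, while an explicit name such as `next_name("row")` is returned as given.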
* added math function support
* bug fix extern func call in llvm based codegen
lint fix
fix build
* moved rocm bitcodes detection to python
use `object.__eq__` (default object identity comparison) as the default
implementation of same_as. This should be OK since `EqualOp` and
`NotEqualOp` are pure Python objects, so `object.__eq__` is sufficient.
* Add same_as to NodeBase
1. Most classes inheriting from NodeBase (Schedule, Stage, etc.) still have
the convenience of using '==' for object identity, which is the right
behavior for non-Expr classes.
2. Subclasses of ExprOp now create an EQ expression when '==' is used.
`__nonzero__` and `__bool__` in EQ and NE are a compromise, since in some cases
object identity semantics are still useful, as in unit tests. For instance:
````
assert a == b
````
"a == b" will create EQ expression, assert then calls `__nonzero__` of the
result expression. `Expr.__nonzero__` throws an exception, since evaluating an
IR expression is prohibited.
A more complex case:
````
assert a in b # b is dict
````
This calls `__eq__` on a and the keys of b, then `__bool__` on the result
expression. This could not easily be done with same_as.
* Retain __hash__ from NodeBase in Python3
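The `==` semantics described above can be sketched in plain Python. The classes below are simplified, hypothetical stand-ins (an integer `handle` replaces the real node handle) and are not the actual TVM implementation. They only illustrate how `==` can build an expression object while `assert`, `in`, and hashing keep identity semantics.

```python
class NodeBase:
    """Hypothetical stand-in for a node; `handle` mimics the underlying object."""

    def __init__(self, handle):
        self.handle = handle

    def same_as(self, other):
        # Object identity: both wrappers refer to the same underlying handle.
        return isinstance(other, NodeBase) and self.handle == other.handle

    def __eq__(self, other):
        # Non-Expr classes keep '==' as identity comparison.
        return self.same_as(other)

    def __ne__(self, other):
        return not self.same_as(other)

    def __hash__(self):
        return hash(self.handle)


class EqualOp:
    """Pure Python object produced by `a == b` on expressions."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def __bool__(self):
        # Truth value falls back to identity, so `assert a == b` still works.
        return self.a.same_as(self.b)

    __nonzero__ = __bool__  # Python 2 spelling


class NotEqualOp(EqualOp):
    """Pure Python object produced by `a != b` on expressions."""

    def __bool__(self):
        return not self.a.same_as(self.b)

    __nonzero__ = __bool__


class Expr(NodeBase):
    def __eq__(self, other):
        # Build an expression instead of returning a bool.
        return EqualOp(self, other)

    def __ne__(self, other):
        return NotEqualOp(self, other)

    # Python 3 sets __hash__ to None when __eq__ is overridden; retain it.
    __hash__ = NodeBase.__hash__
```

With these stand-ins, `Expr(1) == Expr(2)` returns an `EqualOp` whose truth value is false, and an `Expr` can still be used as a dict key because `__hash__` is retained.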
* added initial llvm codegen for amdgpu
* fixed whitespace
* fixed hsaco gen from ir
* fixed targetmachine for rocm and added GetSource for rocm
* fixed whitespace issues
* changed statement to use less than 100 lines
* added intrinsics for workgroup - rocm
* whitespace - newline error fix
* fixed error msg for workitem-workgroup intrinsics
* added llvm ir dump for rocm codegen
* [ROCM] changed codegen to emit proper amdgpu kernel header
* fixed whitespace error
* fixed whitespace error- 2
* fixed AddFunction to not use an extra arg
1. Changed AddFunctionInternal to not take an extra arg for target type
2. Use Target from CodeGenLLVM to check for AMDGPU target
* fixed whitespaces
* fixed whitespaces 2
* fixed codegen for AMDGPU - now generating valid IR
* fixed codegen based on code review
* reviewed alignment for amd devices
* added code to dump code object to file
* fixed cpplint errors
* print out IR after pass manager
* added code to dump asm, obj to file and std string
* fixed whitespaces
* Update codegen_amdgpu.cc
* used registry for amdgpu llvm
* Fixed whitespaces
* added code for calling linker
* fixed formatting errors
* added rocm link python interface
* fixed pylint issues and added more body to the function
* added doc string
* added doc string for module
* fixed python code after review, fixed llvm object codegen
* fixed linker to generate code object
* removed dumping to output file and debugging log out
* fixed lint for python code
* added fault check after running linker
* removed print statement in rocm.py
* changed rocm lld linker to raise RuntimeError rather than emitting error log to stderr
* changed the way linker command line is passed to subprocess.Popen
* removed redundant code and reuse tvm utils
* removed commented out code
* removed cloning of unused modules, and put IR into string
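The linker-related entries above (fault check after linking, raising RuntimeError, passing the command line to subprocess.Popen) can be sketched as follows. `run_linker` is a hypothetical helper name, and the `ld.lld` command line mentioned in the docstring is an assumption rather than something stated in this log.

```python
import subprocess


def run_linker(args):
    """Run a linker command given as an argument list and check the result.

    `args` might look like ["ld.lld", "-shared", in_file, "-o", out_file];
    that exact command line is an assumption for illustration.
    """
    # Passing a list avoids shell quoting problems with file names.
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    # Raise instead of only logging to stderr, so callers can handle failure.
    if proc.returncode != 0:
        raise RuntimeError("Linking error:\n" + out.decode("utf-8", errors="replace"))
    return out
```

Any executable can be substituted for the linker when trying this out; a non-zero exit status surfaces as a `RuntimeError` carrying the tool's combined output.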