* added initial llvm codegen for amdgpu
* fixed whitespace
* fixed hsaco gen from ir
* fixed targetmachine for rocm and added GetSource for rocm
* fixed whitespace issues
* changed statement to use less than 100 lines
* added intrinsics for workgroup - rocm
* whitespace - newline error fix
* fixed error msg for workitem-workgroup intrinsics
* added llvm ir dump for rocm codegen
* [ROCM] changed codegen to emit proper amdgpu kernel header
* fixed whitespace error
* fixed whitespace error - 2
* fixed AddFunction to not use extra arg
1. Changed AddFunctionInternal to not take extra arg for target type
2. Use Target from CodeGenLLVM to check for AMDGPU target
* fixed whitespaces
* fixed whitespaces 2
* fixed codegen for AMDGPU - now generating valid IR
* fixed codegen based on code review
* reviewed alignment for amd devices
* added code to dump code object to file
* fixed cpplint errors
* print out IR after pass manager
* added code to dump asm, obj to file and std string
* fixed whitespaces
* Update codegen_amdgpu.cc
* used registry for amdgpu llvm
* Fixed whitespaces
* added code for calling linker
* fixed formatting errors
* added rocm link python interface
* fixed pylint issues and added more body to the function
* added doc string
* added doc string for module
* fixed python code after review, fixed llvm object codegen
* fixed linker to generate code object
* removed dumping to output file and debugging log out
* fixed lint for python code
* added fault check after running linker
* removed print statement in rocm.py
* changed rocm lld linker to raise RuntimeError rather than emitting an error log to stderr
* changed the way the linker command line is passed to subprocess.Popen
* removed redundant code and reuse tvm utils
* removed commented out code
* removed cloning of unused modules, and put IR into string
* rename the nchw and pass the unit test; going to do it for nhwc depthwise
* bug with fusion
* nchw works fine; nhwc float32 problem remains
* still cannot bind them together
* fusion works
* syntax fix
* all bugs fixed; test cases pass
* minor fix on nn.h
* back wrt input
* backward wrt input nhwc; only test case in recipe
* test case for depthwise back wrt input
* test case for depthwise backward wrt weight
* tags
* minor fixes
* pylint test; add arch=3.7
* modify scheduler
* better backward depthwise w.r.t weight scheduler
* updated scheduler
* test_topi_depthwise_conv2d_back_input.py and test_topi_depthwise_conv2d_back_weight.py success
* all test cases wrt input pass
* update
* new test cases and scheduler
* not working 1 and 2
* good wrt weight, bad wrt input
* test cases added
* remove tf lines
* minor fix
* compute arch changed
* remove compile hook
* minor change
* pylint
* fix the float for python case
* fix cases for python3 case
* except for memoize
* fix most; memoize still wrong
* memoize added
* unexpected layout cases added for scheduler
* added error message for layouts other than NHWC
* improve padding
* fix as pr requests
* remove dilate in backward wrt weight
* [DOCS] Add prerequisites about zlib1g-dev
Add prerequisites about zlib1g-dev. Without zlib1g-dev, linking fails with `/usr/bin/ld: cannot find -lz`.
* Add prerequisites about python-setuptools
Add prerequisites about python-setuptools. Otherwise, the `python setup.py install --user` command fails.
* [DOCS] Add prerequisites about python-dev
Add installation prerequisites about python-dev. Otherwise, `python setup.py install --user` fails with `SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel.`