* Support cblas library in dense
* start to add support for generic batch_matmul compute
* Add x86 override for batch_matmul
* Fix linting
* reset file
* Fix typos
* dummy change to re-trigger CI
* [Relay][VM] VM debugger
* Report mean/min/max for op duration
* Typos
* Lint
* Lint
* Lint
* Support build debug VM in CMake
* Lint
* Enable VM debug in unit test
* Disable debug vm test until new docker image is built
* Add device sync code
* Fix qnn unit test
* Disable vm debug by default
* Rename files
* Rename classes
* Fix comment
* Fix comment
* transpose implementation for tflite.py
* add TRANSPOSE to convert_map
* Fix "Unexpected keyword argument 'axis'" in function call
* add test for transpose operator
* Add handling for the 'axes' parameter
* add test for transpose operator
* Resolve conflict in CONTRIBUTORS.md
* Improve the if condition for empty tuple
* Add one unit test to cover empty tuple
* Resolve conflict in CONTRIBUTORS.md
* QNN quantize and dequantize operators.
* addressing review comments.
* addressing review comments.
* Adding new line at the end of the file.
* Adhering to styling guidelines.
* Adding name to contributors.
* Fixing lint issue.
* Fixing file name.
* Removing unnecessary code.
* Support BatchMatMul with shapes of rank greater than 3
* Fixes
* Add tests
* Remove dependency on Python3
* Clean up
* Merge with master
* Resolve comments
* [Relay] [Quantization] WIP - Common files for the quantization work.
* [Relay] [Quantization] WIP - Prototyping requantize op.
* Requantize operator implementation.
Requantize converts one quantized tensor representation to another quantized
representation. The PR has the following implementation features
- Requantize operator defined in the qnn namespace - relay.qnn.requantize
- Lowering of requantize to existing Relay operators
- Integer fixed point implementation of requantize (see the sketch below)
- Two rounding modes - FE_UPWARDS (round towards positive infinity) and
FE_AWAY_FROM_ZERO (std::round behavior)
- A floating point implementation as well, which can act as a reference or be
used on devices where fixed point computation is not required
- Unit test cases
Relevant Issue - https://github.com/dmlc/tvm/issues/2351
Credit to TFLite and GemmLowp for providing reference implementations.
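The log above mentions fixed point lowering, a GetFixedPointMultiplierShift
API, and two rounding modes. Below is a minimal NumPy sketch of that math, not
the actual TVM implementation; it assumes FE_UPWARDS breaks ties towards +inf,
FE_AWAY_FROM_ZERO matches std::round, and the scale/zero-point names are
illustrative.
```python
import numpy as np

def round_upward(v):
    # FE_UPWARDS (assumed): ties round towards positive infinity.
    return np.floor(v + 0.5)

def round_away_from_zero(v):
    # FE_AWAY_FROM_ZERO: std::round behavior, ties go away from zero.
    return np.sign(v) * np.floor(np.abs(v) + 0.5)

def requantize_fp32(q_in, in_scale, in_zero, out_scale, out_zero,
                    rounding=round_away_from_zero):
    """Floating point reference: dequantize to real values, re-quantize."""
    real = in_scale * (q_in.astype(np.float64) - in_zero)
    q_out = rounding(real / out_scale) + out_zero
    return np.clip(q_out, -128, 127).astype(np.int8)

def get_fixed_point_multiplier_shift(real_multiplier):
    """Decompose in_scale/out_scale into a Q31 significand and a 2^n shift,
    the idea behind the GetFixedPointMultiplierShift API mentioned above."""
    significand, shift = np.frexp(real_multiplier)
    return np.int64(np.round(significand * (1 << 31))), shift

q_in = np.array([-100, -1, 0, 1, 100], dtype=np.int8)
print(requantize_fp32(q_in, in_scale=0.5, in_zero=0,
                      out_scale=0.25, out_zero=0))
print(get_fixed_point_multiplier_shift(0.5 / 0.25))
```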
* Typo and lint fixes.
* Doc fix.
* Uncommenting the lint script (fixing a mistake).
* Modifying the unit tests.
* Moving C++ files into src/relay/qnn
* Moving python files to python/tvm/relay/qnn. Some minor fixes.
* Moving the attrs.h inside the include directory.
* Pushing files that I forgot earlier. Changing util location.
* Incorporating comments. API change. Lint fixes.
* Modifying the GetFixedPointMultiplierShift API as per comments.
* Forgot the dialect change.
* Changing rewrite to qnn_lower.
* Renaming Quantize to Qnn for clarity.
* Remove use_int_domain.
* Incorporating review comments.
* Adding API doc for QNN dialect.
* Move the qnn_lower pass to transform namespace.
* Moving from expr to module. Adding namespace in C++.
* Minor sentence rewrites. Added qnn namespace.
* Added the API doc.
* Changing default out_dtype to int8. Adding a test with in/out_dtype as uint8.
* Style fixes. Better error messages.
* Adding documentation.
* More documentation fixes.
* Adding out dtype check for requantize.
* Adding corner case for FP32 to fixed point conversion.
* Adding extra line.
* Documentation fix.
* Adding static inline.
* Incorporating jackwish's comment. Removed idtype from requantize lowering.
* Removing Quantize/Dequantize code. Restricting Requantize to (u)int8/int32.
* Style fixes.
* Fix the docs.
* Move to Legalize API.
* [Relay] Rewrite pass.
This pass transforms one expression into another expression.
This pass has many use cases
* Replace an expr with another expr, if the latter has better performance.
* For ASICs, we might want to modify the inputs to adapt to the HW support.
* Alter op layout can work in conjunction with this pass.
The motivating use case is the Intel i8 x i8 conv. Intel HW supports u8 x i8
conv natively. Using this pass, we can replace an i8 x i8 conv with a sequence
of operators where one of the operators is the u8 x i8 conv (see the sketch
below). This will also help automatic quantization performance.
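The replacement works because shifting the int8 data by 128 produces uint8
data, and the resulting constant offset can be subtracted back out. A minimal
NumPy sketch of that identity, shown on a matmul rather than a conv for
brevity; all names and shapes here are illustrative, and this is not the pass
itself.
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(-128, 128, size=(4, 8), dtype=np.int64)  # stands in for i8 data
w = rng.integers(-128, 128, size=(8, 3), dtype=np.int64)  # stands in for i8 weights

direct = x @ w                             # i8 x i8 compute (not HW friendly)
shifted = (x + 128) @ w                    # u8 x i8 compute (HW friendly)
corrected = shifted - 128 * w.sum(axis=0)  # fold out the constant offset term
assert np.array_equal(direct, corrected)
```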
* Better API name.
* Removing the conv2d legalization for x86. Will send a separate PR.
* Test name changes.
* Registering one function to register FTVMLegalize.
* Better comments.