TOPI: TVM Operator Inventory
TOPI is the operator collection library for TVM, intended to share the effort of crafting and optimizing TVM-generated kernels. The goals:
- Provide syntactic sugar for operator declaration.
- Provide common primitives for fused op creation.
- Provide commonly used schedules for each architecture.
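For a taste of how these pieces fit together, here is a minimal sketch. The API names (`tvm.placeholder`, `topi.nn.relu`, `topi.cuda.schedule_injective`) follow an early TVM/TOPI snapshot and may differ in your checkout.

```python
import tvm
import topi

# Declaration: pure dataflow, no scheduling decisions yet.
n = tvm.var("n")
m = tvm.var("m")
A = tvm.placeholder((n, m), name="A")
B = topi.nn.relu(A)

# Scheduling: pick an architecture-specific schedule for the declared op.
s = topi.cuda.schedule_injective(B)
f = tvm.build(s, [A, B], "cuda")
```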
Organization
- `include`: C++ library, header only
- `python`: Python library
- `recipe`: recipe collections containing useful operator examples
Guidelines
- Use numpy-style naming convention for known ops.
- Separate operator declaration from schedule when possible.
  - This can be inconvenient but enables more general scheduling across ops.
  - We can always recover the input tensors from an op's outputs by traversing the dataflow tree.
- Deliberately assert the requirements.
  - Some kernels have requirements on shape and data layout; assert them explicitly.
- Be data-layout aware: if the layout is not specified in an argument or in the function name, assume NCHW by default (see the sketch after this list).
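The following is a hedged sketch of a declaration that follows these guidelines; the function name and its assertion message are hypothetical, not part of the library:

```python
import tvm

# Hypothetical declaration: numpy-style name, requirements asserted
# deliberately, and NCHW assumed since the layout is not encoded in
# the name or arguments.
def relu4d(data):
    assert len(data.shape) == 4, "relu4d expects 4-D NCHW input"
    return tvm.compute(
        data.shape,
        lambda n, c, h, w: tvm.max(data[n, c, h, w], tvm.const(0, data.dtype)),
        name="relu4d",
    )
```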
Testcases
- Add testcases to test out the schedule and dataflow in the TOPI workflow.
- Only do correctness testing, without attaching compiler flags, and only run each test once. A minimal test might look like the sketch below.
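This sketch rests on the same API assumptions as above; `topi.nn.relu` is from the library, while the test body simply mirrors the usual TVM build-and-compare pattern:

```python
import numpy as np
import tvm
import topi

def test_relu():
    # Declare, schedule on CPU, and build once: correctness only.
    A = tvm.placeholder((1, 3, 8, 8), name="A")
    B = topi.nn.relu(A)
    s = tvm.create_schedule(B.op)
    f = tvm.build(s, [A, B], "llvm")
    # Compare against the numpy reference.
    a_np = np.random.uniform(-1, 1, size=(1, 3, 8, 8)).astype(A.dtype)
    a = tvm.nd.array(a_np)
    b = tvm.nd.array(np.zeros(a_np.shape, dtype=B.dtype))
    f(a, b)
    np.testing.assert_allclose(b.asnumpy(), np.maximum(a_np, 0))
```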
Performance Tuning Workflow
Since TVM is a work in progress, some optimizations might not be perfect. One quick approach that I find useful is to do codegen plus manual modification. The workflow is:
- Generate the GPU kernels and write them into a file, say `perf/matexp_generated.cu`.
- Copy the generated file into another one, say `perf/matexp_manual.cu`, and modify it according to your intuition.
- Set the `use_manual` flag in the script to continue the codegen workflow as normal, but piggy-back the manually written code instead (see the sketch after this list).
- Observe the performance difference.
- If the performance improves, note the manual changes and think of an optimization pass that can generate the desired target code.
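A hedged sketch of the piggy-back step: `TASK` and `USE_MANUAL` are assumed names, and `tvm_callback_cuda_postproc` is the TVM hook that lets a script intercept generated CUDA source before compilation.

```python
import os
import tvm

TASK = "matexp"     # assumed kernel name, matching the file names above
USE_MANUAL = False  # flip to True to piggy-back the hand-edited kernel

@tvm.register_func
def tvm_callback_cuda_postproc(code):
    """Dump the generated CUDA source; optionally substitute the manual one."""
    if not os.path.exists("perf"):
        os.mkdir("perf")
    with open("perf/%s_generated.cu" % TASK, "w") as f:
        f.write(code)  # keep what TVM generated, for inspection and diffing
    if USE_MANUAL:
        with open("perf/%s_manual.cu" % TASK) as f:
            code = f.read()  # continue the build with the manual kernel
    return code
```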