cffb4fba03
* [HEADER] ASF header dir=include * [HEADER] ASF Header dir=src * [HEADER] ASF Header -dir=python * [HEADER] ASF header dir=topi * [HEADER] ASF Header dir=nnvm * [HEADER] ASF Header -dir=tutorials * [HEADER] ASF Header dir=tests * [HEADER] ASF Header -dir=docker * fix whitespace * [HEADER] ASF Header -dir=jvm * [HEADER] ASF Header -dir=web * [HEADER] ASF Header --dir=apps * [HEADER] ASF Header --dir=vta * [HEADER] ASF Header -dir=go * temp * [HEADER] ASF Header --dir=rust * [HEADER] Add ASF Header --dir=cmake * [HEADER] ASF Header --dir=docs * [HEADER] Header for Jenkinsfile * [HEADER] ASF Header to toml and md * [HEADER] ASF Header to gradle * Finalize rat cleanup * Fix permission * Fix java test * temporary remove nnvm onnx test |
||
---|---|---|
.. | ||
include/topi | ||
python | ||
recipe | ||
src | ||
tests/python | ||
README.md |
README.md
TOPI: TVM Operator Inventory
TOPI is the operator collection library for TVM intended at sharing the effort of crafting and optimizing tvm generated kernels. The goal:
- Provide sugars for operator declaration
- Give common primitives for fused op creation.
- Provide commonly used schedules under each architectures
Organization
- include C++ library, header only
- python python library
- recipe Recipe collections containing useful operator examples.
Guidelines
- Use numpy-style naming convention for known ops
- Seperate operator declaration from schedule when possible.
- This can be inconvenient but enables more general scheduling across ops.
- We can always recover the tensors from its outputs by traversing the tree.
- Deliberately assert the requirements
- Some kernels have requirements on shape and data layout, assert them
- Data layout aware, if not specified in argument or in function, assume NCHW by default.
Testcase
- Add testcases to testout the schedule and dataflow in the TOPI workflow
- Only do correctness testing without attaching compiler flags and only run it once.
Performance Tuning Workflow
Since TVM is work in progress, some optimization might not be perfect. One quick way I find useful is to do codegen plus manual modification. The workflow is:
- Generate the GPU kernels, write them into a file, say
perf/matexp_generated.cu
- Copy the generated file into another one, say
perf/matexp_manual.cu
, do modifications according to your intuition. - Set use_manual flag in the script to continue the codegen workflow as normal, but piggy back the manual written code instead.
- Observe the performance difference.
- If the performance improves, mark the manual code and think of optimization pass to generate the desired target code.