TOPI: TVM Operator Inventory

TOPI is the operator collection library for TVM, intended to share the effort of crafting and optimizing TVM-generated kernels. The goals:

  • Provide syntactic sugar for operator declaration (see the sketch after this list).
  • Give common primitives for fused op creation.
  • Provide commonly used schedules for each architecture.
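As an illustration of the declaration sugar, below is a minimal sketch, assuming the 0.x-era TVM Python API (tvm.placeholder/tvm.compute; newer TVM versions move these under tvm.te). The relu mirrors the style TOPI uses for elementwise ops.

```python
import tvm

# Minimal sketch of TOPI-style declaration sugar, assuming the 0.x-era
# TVM Python API (newer TVM moves placeholder/compute under tvm.te).
n = tvm.var("n")
A = tvm.placeholder((n,), name="A")

# An elementwise relu declared in one tvm.compute call: the declaration
# states what to compute and leaves all scheduling to the caller.
B = tvm.compute(A.shape, lambda i: tvm.max(A[i], tvm.const(0, A.dtype)),
                name="relu")
```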

Organization

  • include: header-only C++ library
  • python: Python library
  • recipe: recipe collection containing useful operator examples

Guidelines

  • Use numpy-style naming conventions for well-known ops.
  • Separate operator declaration from schedule when possible (see the sketch after this list).
    • This can be inconvenient but enables more general scheduling across ops.
    • We can always recover the input tensors from the outputs by traversing the dataflow tree.
  • Deliberately assert the requirements.
    • Some kernels have requirements on shape and data layout; assert them.
  • Be data-layout aware: if the layout is not specified in an argument or in the function name, assume NCHW by default.
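A minimal sketch of the declaration/schedule split, again assuming the 0.x-era API; the names scale_shift_decl and schedule_scale_shift are hypothetical, chosen only for illustration:

```python
import tvm

def scale_shift_decl(x, scale, shift):
    """Declaration only: state the computation and assert requirements,
    making no scheduling decisions. (Hypothetical name, for illustration.)"""
    assert len(x.shape) == 4, "requires a 4-D input; NCHW layout assumed"
    return tvm.compute(
        x.shape,
        lambda n, c, h, w: x[n, c, h, w] * scale[c] + shift[c],
        name="scale_shift")

def schedule_scale_shift(out):
    """Schedule built separately, here for a GPU target. Only the output
    op's axes are touched; h stays a serial loop inside the kernel."""
    s = tvm.create_schedule(out.op)
    n, c, h, w = s[out].op.axis
    s[out].bind(n, tvm.thread_axis("blockIdx.y"))
    s[out].bind(c, tvm.thread_axis("blockIdx.x"))
    s[out].bind(w, tvm.thread_axis("threadIdx.x"))
    return s
```

Because the schedule is derived entirely from the output op, any 4-D elementwise declaration with the same output structure could reuse it unchanged.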

Testcase

  • Add testcases that exercise the schedules and dataflow in the TOPI workflow (a sketch follows this list).
  • Only do correctness testing, without attaching extra compiler flags, and only run each test once.
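A sketch of such a test, assuming the 0.x-era API and using the same scale-shift computation as the Guidelines sketch: build with the default schedule and no extra flags, run once, and compare against a numpy reference.

```python
import numpy as np
import tvm

def test_scale_shift():
    """Build once with the default schedule and no extra compiler flags,
    run once, and check numerical correctness against numpy."""
    shape = (1, 3, 8, 8)
    x = tvm.placeholder(shape, name="x")
    scale = tvm.placeholder((3,), name="scale")
    shift = tvm.placeholder((3,), name="shift")
    out = tvm.compute(
        shape,
        lambda n, c, h, w: x[n, c, h, w] * scale[c] + shift[c],
        name="scale_shift")
    s = tvm.create_schedule(out.op)          # default schedule, no tuning
    f = tvm.build(s, [x, scale, shift, out], "llvm")

    ctx = tvm.cpu(0)
    x_np = np.random.uniform(size=shape).astype("float32")
    scale_np = np.random.uniform(size=(3,)).astype("float32")
    shift_np = np.random.uniform(size=(3,)).astype("float32")
    ref = x_np * scale_np.reshape(1, 3, 1, 1) + shift_np.reshape(1, 3, 1, 1)

    x_nd = tvm.nd.array(x_np, ctx)
    scale_nd = tvm.nd.array(scale_np, ctx)
    shift_nd = tvm.nd.array(shift_np, ctx)
    out_nd = tvm.nd.array(np.zeros(shape, dtype="float32"), ctx)
    f(x_nd, scale_nd, shift_nd, out_nd)      # run it once
    np.testing.assert_allclose(out_nd.asnumpy(), ref, rtol=1e-5)
```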

Performance Tuning Workflow

Since TVM is a work in progress, some optimizations might not be perfect. One quick approach I find useful is codegen plus manual modification. The workflow is:

  • Generate the GPU kernels and write them into a file, say perf/matexp_generated.cu.
  • Copy the generated file to another one, say perf/matexp_manual.cu, and modify it according to your intuition.
  • Set the use_manual flag in the script to continue the codegen workflow as normal, but piggyback the manually written code instead (see the sketch after this list).
  • Observe the performance difference.
  • If the performance improves, mark the manual code and think of an optimization pass that generates the desired target code.
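The swap-in step can be implemented with TVM's CUDA post-processing hook, tvm_callback_cuda_postproc, which receives the generated source and compiles whatever string is returned. A sketch, reusing the perf/matexp_* file names and use_manual flag from the workflow above:

```python
import os
import tvm

use_manual = False  # flip to True to piggyback the hand-edited kernel

@tvm.register_func("tvm_callback_cuda_postproc")
def postproc(code):
    """Called by TVM's CUDA codegen with the generated source; the string
    returned here is what actually gets compiled."""
    if not os.path.exists("perf"):
        os.mkdir("perf")
    with open("perf/matexp_generated.cu", "w") as f:
        f.write(code)                   # keep a copy of the generated kernel
    if use_manual:
        with open("perf/matexp_manual.cu") as f:
            code = f.read()             # substitute the manually tuned kernel
    return code
```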