* adding support for graphpack over multiply op
* increasing resnet model coverage
* fix indentation
* lint
* moving recursion limit fix into graphpack pass
* moving recursion limit to relay init
* pooling on NCHWnc format
* adding more models
* deploy_resnet_on_vta.py
* trailing line
* generalizing to vision models
* merge conflicts
* fix, apply quantization to VTA only
* improving comments
* trimming models that have runtime issues for the moment
* lint
* lint
* lint
* fix in IR pass to support padding on 6-d tensors
* support for both N>1 and N==1 for padding
* batch size > 1 tuning and base config
* output formatting
* batch conv2d
* print all category results
* revert to single-batch config
* pick record best
* fix conv test
* improving reporting
* address batching bug in fast simulator
* fix
* hardware refactor for increased FPGA coverage, small optimizations
* fix header
* cleaning up parameters that won't be needed for now
* streamlining makefile, and simplifying tcl scripts
* moving parameter derivation into pkg_config.py, keeping tcl scripts lightweight
* refactoring tcl script to avoid global variables
* deriving AXI signals in pkg_config.py
* unifying address map definition for hardware and software drivers
* single channel design for ultra96 to simplify build
* enable alu by default, no mul opcode for now
* hardware fix
* new bitstream; vta version
* avoid error when env variable is not set
* ultra96 cleanup
* further cleaning up tcl script for bitstream generation
* preliminary rpc server support on ultra96
* rpc server tracker scripts
* ultra96 ldflag
* ultra96 support
* ultra96 support
* cleanup line
* cmake support for ultra96
* simplify memory instantiation
* cleaning up IP parameter initialization
* fix queue instantiation
* 2019.1 transition
* fix macro def
* removing bus width from config
* cleanup
* fix
* turning off testing for now
* cleanup ultra96 ps instantiation
* minor refactor
* adding comments
* upgrading to tophub v0.6
* model used in TVM target now refers to a specific version of VTA for better autoTVM scheduling
* revert change due to bug
* rename driver files to be for zynq-type devices
* streamlining address mapping
* unifying register map offset values between driver and hardware generator
* rely on cma library for cache flush/invalidation
* coherence management
* do not make buffer packing depend on data types that can be wider than 64 bits
* refactor config derivation to minimize free parameters
* fix environment/pkg config interaction
* adding cfg dump property to pkgconfig
* fix rpc reconfig
* fix spacing
* cleanup
* fix spacing
* long line fix
* fix spacing and lint
* fix line length
* cmake fix
* environment fix
* renaming after pynq since the driver stack relies on the pynq library - see pynq.io
* update doc
* adding parameterization to name
* space
* removing reg width
* vta RPC
* update doc on how to edit vta_config.json
* fix path
* fix path
* add tsim init function
* add sim device
* test wait and resume
* launch simulation thread from DPILoader
* add VTASimDPI module to handle all simulation related stuff
* test tsim init
* move exit to simdpi module
* update vta driver
* add chisel DPI module
* restore simshell
* update vta to support dpi sim
* update unittests
* add tsim to integration-conv2d test
* run resnet on tsim
* remove max-cycles
* match tsim counters with sim counters
* use env in simulator to switch between sim and tsim
* update unittest
* rollback conv2d test
* update resnet
* add stats to matrix multiply
* add stats
* print stats after assert
* update other tests
* add stats to gemm
* add return and remove unused libs
* add missing arg
* return lib
* update comments for linter
* add more comments to VTASimDPI module
* remove trailing spaces
* remove trailing spaces