* [VTA] Support TLPP in function simulator.
Issue:
Currently the VTA function simulator only executes instructions serially,
so the dependency logic of the runtime ISA, which is used for task-level
pipeline parallelism (TLPP), cannot be verified by the function simulator.
Solution:
Make the simulator driver multi-threaded and support TLPP.
Benefit:
TLPP support in the VTA function simulator makes testing, debugging,
and changing VTA logic easier.
replace the Boost lock-free queue.
add a configuration option to enable or disable simulator TLPP.
change code style to Google style.
Wrap queue read/write and sync logic to make function calls simpler.
Add some comments.
Remove multi-threading logic and switch to single-thread mode.
address review comments.
change code style to match Google code style and add comments.
add CMake macro to enable/disable simulator TLPP logic.
submodule update.
correct file name mentioned in comments.
* remove USE_VTA_FSIM_TLPP.
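The dependency handshake that TLPP relies on can be checked without real threads: each module (load, compute, store) runs its next instruction only when every dependency token it must pop is available, and otherwise yields to the other modules. The sketch below is a simplified Python model under that assumption, not the simulator's actual C++ code; the Insn class, its flag names, and run_tlpp are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

# Pipeline order assumed by this model: load -> compute -> store.
MODULES = ("load", "compute", "store")


@dataclass
class Insn:
    """Hypothetical instruction that carries only dependency flags."""
    name: str
    pop_prev: bool = False   # wait for a token from the previous module
    pop_next: bool = False   # wait for a token from the next module
    push_prev: bool = False  # send a token to the previous module
    push_next: bool = False  # send a token to the next module


def _neighbors(i):
    prev = MODULES[i - 1] if i > 0 else None
    nxt = MODULES[i + 1] if i + 1 < len(MODULES) else None
    return prev, nxt


def run_tlpp(programs):
    """Interleave per-module instruction streams in a single thread.

    Returns the execution order as "module:name" strings, or raises
    RuntimeError when no module can make progress (a deadlock).
    """
    queues = {m: deque(programs.get(m, [])) for m in MODULES}
    # tokens[(src, dst)] counts dependency tokens pending from src to dst.
    tokens = {(a, b): 0 for a in MODULES for b in MODULES if a != b}
    trace = []
    while any(queues.values()):
        progressed = False
        for i, mod in enumerate(MODULES):
            if not queues[mod]:
                continue
            insn = queues[mod][0]
            prev, nxt = _neighbors(i)
            if (insn.pop_prev or insn.push_prev) and prev is None:
                raise ValueError(f"{mod} has no previous module")
            if (insn.pop_next or insn.push_next) and nxt is None:
                raise ValueError(f"{mod} has no next module")
            # Stay blocked (consume nothing) until every required token exists.
            if insn.pop_prev and tokens[(prev, mod)] == 0:
                continue
            if insn.pop_next and tokens[(nxt, mod)] == 0:
                continue
            if insn.pop_prev:
                tokens[(prev, mod)] -= 1
            if insn.pop_next:
                tokens[(nxt, mod)] -= 1
            queues[mod].popleft()
            trace.append(f"{mod}:{insn.name}")
            if insn.push_prev:
                tokens[(mod, prev)] += 1
            if insn.push_next:
                tokens[(mod, nxt)] += 1
            progressed = True
        if not progressed:
            stuck = {m: q[0].name for m, q in queues.items() if q}
            raise RuntimeError(f"deadlock, blocked instructions: {stuck}")
    return trace
```

Because a blocked module simply yields, execution is no longer plain program order, so a missing or extra push/pop shows up as a reported deadlock instead of going unnoticed.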
* adding support for graphpack over multiply op
* increasing resnet model coverage
* fix indentation
* lint
* moving recursion limit fix into graphpack pass
* moving recursionlimit to relay init
* pooling on NCHWnc format
* adding more models
* deploy_resnet_on_vta.py
* trailing line
* generalizing to vision models
* merge conflicts
* fix, apply quantization to VTA only
* improving comments
* trimming models that have runtime issues for the moment
* lint
* lint
* lint
* [VTA][Chisel] add scalafmt and format existing scala codebase
* change column width to 100
* add scalafmt conf file as a valid file type
* add ASF header to scalafmt conf file and rerun formatter
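The graphpack changes earlier in this log (including pooling on NCHWnc) operate on activations whose batch and channel axes are split into blocks. Assuming block factors bn and bc that evenly divide N and C, a minimal NumPy sketch of that packing is a reshape followed by a transpose; the function names are hypothetical and this is not the Relay pass itself.

```python
import numpy as np


def pack_nchw_to_nchwnc(data, bn, bc):
    """Pack an (N, C, H, W) array into (N//bn, C//bc, H, W, bn, bc)."""
    n, c, h, w = data.shape
    if n % bn or c % bc:
        raise ValueError("N and C must be divisible by the block factors")
    # Split N into (N//bn, bn) and C into (C//bc, bc).
    blocked = data.reshape(n // bn, bn, c // bc, bc, h, w)
    # Move the inner block axes to the end.
    return blocked.transpose(0, 2, 4, 5, 1, 3)


def unpack_nchwnc_to_nchw(packed):
    """Inverse of pack_nchw_to_nchwnc."""
    no, co, h, w, bn, bc = packed.shape
    return packed.transpose(0, 4, 1, 5, 2, 3).reshape(no * bn, co * bc, h, w)
```

Element [n, c, h, w] of the original lands at [n // bn, c // bc, h, w, n % bn, c % bc], so spatial operators such as pooling still index H and W on axes 2 and 3.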
Issue:
RewriteForceSerial is a debug function that forces instructions to
execute serially instead of in parallel. This makes it possible to
isolate parallelism-related problems or to compare performance
between parallel and serial execution. However, once it is enabled
by setting the debug flag, VTA hangs when running on a PYNQ board.
Analysis:
Once RewriteForceSerial is enabled, the dependency logic differs
from the default, but the same logic is still used to generate the
FINISH instruction and other dependencies, which causes a deadlock.
Solution:
Use different dependency settings when RewriteForceSerial is enabled.
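The deadlock can be illustrated with a simplified model: in forced-serial mode instructions run strictly in program order, so an instruction that pops a dependency token that no earlier instruction pushed can never be unblocked. The Python sketch below replays such a sequence; the queue names (l2g, g2s, s2g), the tuple format, and replay_serial are assumptions for illustration, not VTA's actual RewriteForceSerial code.

```python
from collections import Counter


def replay_serial(insns):
    """Replay (name, pops, pushes) tuples strictly in program order.

    pops and pushes are iterables of dependency-queue names. Because nothing
    else runs while an instruction waits, popping an empty queue is a
    deadlock. Returns the tokens left over once every instruction has run.
    """
    tokens = Counter()
    for name, pops, pushes in insns:
        for q in pops:
            if tokens[q] == 0:
                raise RuntimeError(f"deadlock at {name}: queue {q!r} is empty")
            tokens[q] -= 1
        for q in pushes:
            tokens[q] += 1
    return {q: n for q, n in tokens.items() if n}


# Hypothetical flags where FINISH waits for a store-to-compute token
# that nothing in this serial stream ever pushes.
inconsistent = [
    ("load", [], ["l2g"]),
    ("gemm", ["l2g"], ["g2s"]),
    ("store", ["g2s"], []),
    ("finish", ["s2g"], []),
]

# Serial-mode settings: every pop is matched by an earlier push.
consistent = [
    ("load", [], ["l2g"]),
    ("gemm", ["l2g"], ["g2s"]),
    ("store", ["g2s"], ["s2g"]),
    ("finish", ["s2g"], []),
]
```

In this model, giving serial mode its own dependency settings amounts to making sure every pop is preceded by a matching push in program order.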
* initial virtual memory;
* initial integration;
* include the header file in cmake;
* implement allocation with virtual to logical address mapping;
* virtual memory for tsim_driver;
* implement the missing memory release function;
* readability improvement;
* readability improvement;
* address review comments;
* improved robustness in virtual memory allocation;
* remove VTA_TSIM_USE_VIRTUAL_MEMORY macro and use virtual memory for tsim by default;
* link tvm against vta library;
* merge with master
* build virtual memory system without linking tvm against vta;
* minor change;
* reuse VTA_PAGE_BYTES;
* using DRAM class from sim_driver as VirtualMemoryManager;
* satisfy linter;
* add comments in code;
* undo changes to Makefile
* undo changes to Makefile
* retrigger ci;
* retrigger ci;
* directly call into VirtualMemoryManager::Global()
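The virtual-to-logical address mapping can be sketched as a page-granular allocator: each allocation reserves whole pages in a simulated device address space and records which host buffer backs them, so any device address can be translated back to a buffer and offset. This is a hypothetical Python model, not the C++ VirtualMemoryManager API; PAGE_BYTES = 4096 is an assumed stand-in for VTA_PAGE_BYTES, whose real value comes from the VTA headers.

```python
PAGE_BYTES = 4096  # assumed stand-in for VTA_PAGE_BYTES


class PagedMemoryModel:
    """Hypothetical model of a simulated device address space."""

    def __init__(self):
        # Start at one page so that address 0 can keep meaning "null".
        self._next_addr = PAGE_BYTES
        self._regions = {}  # base address -> bytearray backing that region

    def alloc(self, size):
        """Reserve whole pages for `size` bytes and return the base address."""
        if size <= 0:
            raise ValueError("size must be positive")
        npages = -(-size // PAGE_BYTES)  # ceiling division
        base = self._next_addr
        self._next_addr += npages * PAGE_BYTES
        self._regions[base] = bytearray(npages * PAGE_BYTES)
        return base

    def free(self, base):
        """Release an allocation; raises KeyError on an unknown or freed base."""
        del self._regions[base]

    def translate(self, addr):
        """Map a device address to (backing buffer, offset)."""
        for base, buf in self._regions.items():
            if base <= addr < base + len(buf):
                return buf, addr - base
        raise KeyError(f"unmapped address {addr:#x}")

    def write(self, addr, data):
        buf, off = self.translate(addr)
        if off + len(data) > len(buf):
            raise ValueError("write crosses the end of the allocation")
        buf[off:off + len(data)] = data

    def read(self, addr, nbytes):
        buf, off = self.translate(addr)
        if off + nbytes > len(buf):
            raise ValueError("read crosses the end of the allocation")
        return bytes(buf[off:off + nbytes])
```

This is a bump allocator that never reuses freed ranges, which keeps the sketch short and makes accesses through stale addresses fail loudly; a real manager would also recycle pages.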
* building a TSIM-specific library alongside the fast simulator to quickly switch between DLLs
* cmake controlled TSIM libraries
* always build tsim driver in either simulation mode
* build DLLs based on CMAKE flags
* updating the Jenkinsfile
* small restructuring
* reducing the cmake flags
* update instructions
* reverting to 3 flags
* update Jenkinsfile
* adding new line
* enabling TSIM unit and integration tests
* fix description
* temporarily disabling task_python_vta tests in CPU Build stage
* move CPU tests into unit test stage
* stage reorg
* better make
* disabling TSIM tests for now
* reverting some restructuring
* fix
* added alutest
* fix indent
* name change for cycle
* improved data gen and infra
* added alutest
* fix indent
* name change for cycle
* improved data gen and infra
* fix space
* fix indent
* fixes
* aluRef
* fix randomarary
* add
* Revert "add"
This reverts commit 87077daebbe055dee11f80e37da3a6291138e0f0.
* Revert "fix randomarary"
This reverts commit df386c1e660eb6ebcff1a1f905610573676f1589.
* Revert "aluRef"
This reverts commit 8665f0d4a7b12b796b2cb1ca6bf9cfe5613ee389.
* should fix dmlc-core
* initial compilation script for chisel-vta;
* replace tabs with spaces;
* compile script for de10-nano;
* remove generated verilog source code;
* remove `altsource_probe`, `debounce`, `edge_detect` ip;
* replace quartus project files with a single tcl script;
* Update install.md
* improved makefile-based compilation script;
* complete makefile-based compilation of chisel-vta for de10-nano;
* install quartus;
* conversion to .rbf file;
* document chisel-vta compilation process for de10-nano;
* rename generated bitstream file;
* download and extract custom ip for de10-nano;
* minor change
* minor change
* fix indentation;
* bug fix;
* improved robustness in makefile;
* clean up;
* add `.sdc .ipx .qsys` allowance in Jenkins;
* add ASF header;
* add ASF header;
* remove IntelShell.scala, update vta_hw.tcl, clean up Makefile & soc_system.qsys;
* add ASF header;
* keep sources compact;
* keep sources compact;
* it's not necessary now
* AXI4LiteClient -> AXI3Client for IntelShell
* remove connection to fpga_only_master;
* a few important bug fixes: wire reset pin, and set host_r_last to high
* remove intel specific interface definition;
* add NO_DSP option in Makefile;
* AXI4Lite is not used in IntelShell;
* minor fix: disable dsp and use logic instead;
* quartus version change: 18.0 -> 18.1
* remove altera related statement;
* compose compile_design.tcl
* initial tcl script for soc_system generation;
* remove .qsys file;
* remove unused;
* .qsys can be generated by tcl script;
* remove hps_io and shrink size of soc_system;
* integrate into makefile;
* version change: 18.0 -> 18.1
* add sample config file for de10-nano;
* parameterize DEVICE and PROJECT_NAME
* remove extra lines;
* brief description on flashing SD card image for de10-nano
* docs on building additional components
* parameterize DEVICE and DEVICE_FAMILY
* parameterize DEVICE and DEVICE_FAMILY
* parameterize DEVICE and DEVICE_FAMILY
* de10-nano -> de10nano
* minor change
* add code comments and documentation to address review comments;
* fix in IR pass to support padding on 6-d tensors
* support for both N>1 and N==1 for padding
* batch size > 1 tuning and base config
* output formatting
* batch conv2d
* print all category results
* revert to single-batch config
* pick record best
* fix conv test
* improving reporting
* address batching bug in fast simulator
* fix
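Padding a packed activation only touches the spatial axes. Assuming the packed layout is (N/n, C/c, H, W, n, c), a minimal NumPy sketch pads axes 2 and 3 and leaves the batch and channel block axes alone, which is why the same logic covers both N == 1 and N > 1. The function name pad_packed is hypothetical; this is not the IR pass itself.

```python
import numpy as np


def pad_packed(data, pad_h, pad_w, value=0):
    """Pad the H and W axes of a packed (N/n, C/c, H, W, n, c) tensor."""
    if data.ndim != 6:
        raise ValueError("expected a 6-D packed tensor")
    # Only axes 2 (H) and 3 (W) are padded; the batch and channel block
    # axes are untouched, so the batch block size does not matter.
    pad_width = ((0, 0), (0, 0), (pad_h, pad_h), (pad_w, pad_w), (0, 0), (0, 0))
    return np.pad(data, pad_width, mode="constant", constant_values=value)
```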
* hardware refactor for increased FPGA coverage, small optimizations
* fix header
* cleaning up parameters that won't be needed for now
* streamlining makefile, and simplifying tcl scripts
* moving parameter derivation into pkg_config.py, keeping tcl scripts lightweight
* refactoring tcl script to avoid global variables
* deriving AXI signals in pkg_config.py
* unifying address map definition for hardware and software drivers
* single channel design for ultra96 to simplify build
* enable alu by default, no mul opcode for now
* hardware fix
* new bitstream; vta version
* avoid error when env variable is not set
* ultra96 cleanup
* further cleaning up tcl script for bitstream generation
* preliminary rpc server support on ultra96
* rpc server tracker scripts
* ultra96 ldflag
* ultra96 support
* ultra96 support
* cleanup line
* cmake support for ultra96
* simplify memory instantiation
* cleaning up IP parameter initialization
* fix queue instantiation
* 2019.1 transition
* fix macro def
* removing bus width from config
* cleanup
* fix
* turning off testing for now
* cleanup ultra96 ps instantiation
* minor refactor
* adding comments
* upgrading to tophub v0.6
* model used in TVM target now refers to a specific version of VTA for better autoTVM scheduling
* revert change due to bug
* rename driver files to be for zynq-type devices
* streamlining address mapping
* unifying register map offset values between driver and hardware generator
* rely on cma library for cache flush/invalidation
* coherence management
* do not make buffer packing depend on data types that can be wider than 64 bits
* refactor config derivation to minimize free parameters
* fix environment/pkg config interaction
* adding cfg dump property to pkgconfig
* fix rpc reconfig
* fix spacing
* cleanup
* fix spacing
* long line fix
* fix spacing and lint
* fix line length
* cmake fix
* environment fix
* renaming after pynq since the driver stack relies on the pynq library - see pynq.io
* update doc
* adding parameterization to name
* space
* removing reg width
* vta RPC
* update doc on how to edit vta_config.json
* fix path
* fix path
* support for different inp/wgt bits, rewrote dot for clarity
* [VTA] [Chisel] support for different inp/wgt bits, rewrote DotProduct for clarity
* [VTA] [Chisel] support for different inp/wgt bits, rewrote DotProduct for clarity
* change back to sim
* fix index
* fix index
* fix indent
* fix indent
* fix indent
* fix trailing spaces
* fix trailing spaces
* change to more descriptive name
* matric->matrix
* fix spacing
* fix spacing & added generic name for dot
* better parameter flow
* spacing
* spacing
* spacing
* update requirement (tested) for dot, spacing
* function call convention
* small edit
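Supporting different input and weight widths in the dot product mostly comes down to sign-extending each operand from its own bit width before multiplying; a signed product then fits in inp_bits + wgt_bits bits. A hypothetical Python reference model, assuming two's-complement operands and an accumulator that wraps at acc_bits (32 by default, an assumption), might look like this:

```python
def to_signed(raw, bits):
    """Interpret the low `bits` bits of raw as a two's-complement integer."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


def dot(inp_raw, wgt_raw, inp_bits, wgt_bits, acc_bits=32):
    """Dot product of raw input/weight words with independent bit widths."""
    if len(inp_raw) != len(wgt_raw):
        raise ValueError("input and weight vectors must have the same length")
    acc = sum(
        to_signed(a, inp_bits) * to_signed(w, wgt_bits)
        for a, w in zip(inp_raw, wgt_raw)
    )
    # Wrap to the accumulator width, as a fixed-width register would.
    return to_signed(acc, acc_bits)
```

A model like this can serve as a golden reference for hardware tests that sweep inp/wgt bit combinations.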
This error appears when linting with the Docker scripts. I'm not sure
why it isn't failing in the standard TVM CI; it might be that the
Docker images in the CI system haven't been updated.
python3 -m pylint vta/python/vta --rcfile=/workspace/tests/lint/pylintrc
Using config file /workspace/tests/lint/pylintrc
************* Module vta.top.graphpack
C:131, 4: Missing method docstring (missing-docstring)
* add tsim init function
* add sim device
* test wait and resume
* launch simulation thread from DPILoader
* add VTASimDPI module to handle all simulation related stuff
* test tsim init
* move exit to simdpi module
* update vta driver
* add chisel DPI module
* get back simshell
* update vta to support dpi sim
* update unittests
* add tsim to integration-conv2d test
* run resnet on tsim
* remove max-cycles
* match tsim counters with sim counters
* use env in simulator to switch between sim and tsim
* update unittest
* rollback conv2d test
* update resnet
* add stats to matrix multiply
* add stats
* print stats after assert
* update other tests
* add stats to gemm
* add return and remove unused libs
* add missing arg
* return lib
* update comments for linter
* add more comments to VTASimDPI module
* remove trailing spaces
* remove trailing spaces
[Symptom]
After following the TSIM example README and installing Verilator with
'sudo apt-get install verilator', enabling 'debug' or manually adding
'printf' logic in a Chisel module makes Verilator report the following
error:
'syntax error, unexpected INTEGER NUMBER, expecting IDENTIFIER'
[Fix]
Upgrading Verilator to 4.012 fixes the issue.
[Solution]
Point the Verilator install steps in README.md to the installation
instructions on the Verilator website.
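Because the fix depends on the installed Verilator version, a quick check before building TSIM can save a confusing syntax error. This hypothetical Python helper parses the text printed by `verilator --version`, assuming it starts with "Verilator X.YYY" (for example "Verilator 4.012 ..."), and compares it against 4.012, the version reported above to fix the issue.

```python
import re

MIN_VERILATOR = (4, 12)  # 4.012, the version reported to fix the issue


def parse_verilator_version(text):
    """Extract (major, minor) from text such as 'Verilator 4.012 2019-03-23'."""
    match = re.search(r"Verilator\s+(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no Verilator version found")
    return int(match.group(1)), int(match.group(2))


def is_supported(text):
    """True if the reported version is at least MIN_VERILATOR."""
    return parse_verilator_version(text) >= MIN_VERILATOR
```

The version text can be obtained with subprocess.run(["verilator", "--version"], capture_output=True, text=True).stdout and passed to is_supported.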
* [VTA] Add VTA PYNQ metal_test bitstream programming logic and fix a couple of compile issues.
Issue:
VTAProgram does not exist, which causes a compile error.
There is no logic to program the bitstream into the FPGA.
metal_test still uses the PYNQ 2.1 library, which is not supported
on the latest PYNQ 2.4.
Solution:
Remove the old VTAProgram.
When the target is pynq, program the bitstream during compilation.
Change the DMA link library to libcma.
* Address review comments.
Issue:
When compiling vta.cc in Vivado with the top function 'vta', Vivado
reports a deadlock error like '...with default size is used in a
non-dataflow region, which may result in deadlock. Please consider to
resize the stream using the directive 'set_directive_stream' or the
'HLS stream' pragma.'
Solution:
Give the queue a default size of 8.
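Why the stream depth matters outside a dataflow region can be seen in a small model: the producer loop runs to completion before the consumer starts, so every item it writes must fit in the FIFO at once. This Python sketch is an illustrative assumption about that behavior, not Vivado HLS semantics, and run_sequential is hypothetical.

```python
from collections import deque


def run_sequential(depth, n_items):
    """Model a non-dataflow region: the producer finishes before the consumer.

    Returns the consumed items, or raises RuntimeError if the producer would
    block on a full FIFO, which never drains because the consumer has not
    started yet.
    """
    fifo = deque()
    for i in range(n_items):  # producer phase
        if len(fifo) >= depth:
            raise RuntimeError(f"deadlock: FIFO of depth {depth} full after {i} writes")
        fifo.append(i)
    return [fifo.popleft() for _ in range(n_items)]  # consumer phase
```

Under this model a depth of 8 is safe only while no producer gets more than 8 items ahead of its consumer.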