* support arbitrary pointer levels for declared types
* handle logical_type=element
* arg values
* refactor (func name, arg_info) to FunctionInfo
* fix circular imports
* .
* support dylib
* update verify_hat_package
* verify -> verify_args
* fixes
* fix verify_args
* formatting
* add verify hat test
* refactor
* handle ndarrays as arguments
* cleanup
* runtime_array verify test
* print output
* basic test passing
* Update test_create_simple_hat_file.py
* Update test_create_simple_hat_file.py
* comments
* comments
* TODOs
* nfc
* [test] moved creation to workdir
* rename
* Print output dimension references and clarify HAT schema (#66)
* infer shapes from size, add shape order requirement
* merged
* pretty print using cross references
Co-authored-by: Lisa Ong <onglisa@microsoft.com>
* revert formatting changes
* revert more formatting only changes
* simplify
* Add input and input/output runtime_array support (#67)
* wip
* scaffold
* scaffold and initial support for input elements and input/output runtime_arrays
* .
* fixups
* don't swallow exceptions
* support cargs for non-pointer args
* cleanup
* refactor
* support integer-like types when checking constant shapes
* nfc
Co-authored-by: Lisa Ong <onglisa@microsoft.com>
* test coverage for usage type Input and InputOutput
* [test] Support windows in verify_hat tests (#69)
* wip
* build for windows (#68)
Co-authored-by: Lisa Ong <onglisa@microsoft.com>
* windows tomlkit expects lists
* manual CI trigger
Co-authored-by: Lisa Ong <onglisa@microsoft.com>
* fix windows test
* fix logic and add comment
* .
* verify_args -> verify
* args -> arguments
* PR feedback
Co-authored-by: Lisa Ong <onglisa@microsoft.com>
- Instantiate more default objects for simple cases
- Enable modifying the list of functions in the hat file
- Make hat file functions and function_map wrap the same underlying
object
We had been multiplying the product of the strides by the major
dimension. Instead, we should multiply only the largest stride by the
major dimension, because the largest stride already factors in the
other strides in the array shape.
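
A minimal sketch of the difference described above, not the actual code from this change. The helper names (`major_index`, `element_count`, `element_count_old`) are hypothetical, and the sketch assumes strides are given in elements rather than bytes and that `shape` and `strides` are non-empty and the same length.

```python
from functools import reduce
from operator import mul


def major_index(strides):
    """Index of the dimension with the largest stride (assumed to be the major dimension)."""
    return max(range(len(strides)), key=lambda i: strides[i])


def element_count(shape, strides):
    """Largest stride multiplied by the extent of the major dimension."""
    major = major_index(strides)
    return strides[major] * shape[major]


def element_count_old(shape, strides):
    """Earlier approach: product of all strides multiplied by the major dimension."""
    return reduce(mul, strides, 1) * shape[major_index(strides)]


# Row-major 2 x 3 x 4 array, strides in elements (hypothetical example values).
shape, strides = (2, 3, 4), (12, 4, 1)
print(element_count(shape, strides))      # 24, equal to 2 * 3 * 4
print(element_count_old(shape, strides))  # 96, overcounts
```

For two-dimensional row-major arrays with a unit minor stride the two formulas agree, so the overcount only shows up with three or more dimensions or with a non-unit minor stride.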
* avoid printing null bytes in hiprtcGetProgramLog
* check value in the temp fix
* remove redundant log printing
Co-authored-by: Lisa Ong <onglisa@microsoft.com>
* nfc
* shadowed variable time in python 3.7
* invoke perf_counter
* nfc
* add more checks now that exceptions are swallowed
* more stringent checks for benchmark tests
Co-authored-by: Lisa Ong <onglisa@microsoft.com>
* Removes the usage of streams for kernel launches.
* Adds cleanup code to make sure any allocated memory is freed in case of failures
* CUDA and ROCm versions of `CallableFunc` now reset device state better during runtime init