* Fix CodeGenTokenizer issues and the related code refactoring.
* refactor the trie-tree
* temp check-ins
* code complete
* correctness fixing
* Update _hf_cvt.py
* more test cases fixing
* more refinement
* linux crash fixing
* Update test_autotokenizer.py
* Refactor String and Audio operators with status-return prototype.
* complete the whole text domain
---------
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
* Add TrieTokenizer for RWKV-like LLM models
* add more tests
* fix the windows build
* download the vocab file instead of checking it in
* a small bug fixing
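A rough sketch of how a trie can drive tokenization, using greedy longest-match lookup. This is one common approach and not necessarily what the TrieTokenizer in this change does; the TrieVocab class, its method names, and the unk_id fallback are hypothetical and used only for illustration.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical byte-level trie vocabulary (illustration only).
class TrieVocab {
 public:
  // Insert a token and the id it maps to.
  void Add(const std::string& token, int id) {
    Node* node = &root_;
    for (unsigned char ch : token) {
      std::unique_ptr<Node>& child = node->children[ch];
      if (!child) child = std::make_unique<Node>();
      node = child.get();
    }
    node->id = id;
  }

  // Greedy longest-match: at each position take the longest token found in
  // the trie. A byte that starts no token is emitted as unk_id (an assumption).
  std::vector<int> Tokenize(const std::string& text, int unk_id) const {
    std::vector<int> ids;
    std::size_t pos = 0;
    while (pos < text.size()) {
      const Node* node = &root_;
      int best_id = unk_id;
      std::size_t best_len = 1;  // always advance by at least one byte
      for (std::size_t i = pos; i < text.size(); ++i) {
        auto it = node->children.find(static_cast<unsigned char>(text[i]));
        if (it == node->children.end()) break;
        node = it->second.get();
        if (node->id >= 0) {  // a token ends here; remember the longest so far
          best_id = node->id;
          best_len = i - pos + 1;
        }
      }
      ids.push_back(best_id);
      pos += best_len;
    }
    return ids;
  }

 private:
  struct Node {
    int id = -1;  // -1 marks "no token ends at this node"
    std::map<unsigned char, std::unique_ptr<Node>> children;
  };
  Node root_;
};
```

With the tokens "a", "ab", "abc" and "c" added, the input "abcabxc" is split as "abc", "ab", an unknown byte, and "c".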
* Nodes can be called concurrently and Compute needs to be stateless due to that.
Update the kernels to make Compute const.
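As a rough illustration of the stateless Compute requirement, a kernel can keep only read-only configuration as members and mark Compute const, so concurrent calls share no mutable state. The UpperCaseKernel name and its signature are hypothetical and do not reflect the actual kernel interface in this repository.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical kernel: every member is set once in the constructor.
struct UpperCaseKernel {
  explicit UpperCaseKernel(bool ascii_only) : ascii_only_(ascii_only) {}

  // const: the compiler rejects writes to members, so concurrent calls on the
  // same kernel instance cannot race on shared state.
  std::vector<std::string> Compute(const std::vector<std::string>& input) const {
    std::vector<std::string> output;  // per-call scratch lives on the stack
    output.reserve(input.size());
    for (const std::string& s : input) {
      std::string t = s;
      for (char& c : t) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (!ascii_only_ || u < 0x80) c = static_cast<char>(std::toupper(u));
      }
      output.push_back(std::move(t));
    }
    return output;
  }

 private:
  const bool ascii_only_;  // read-only configuration
};
```

Any buffer that used to be cached in a member would instead be a local variable inside Compute, as `output` is here.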
* Fix test that uses ustring.h.
Would be better to not have duplicate declarations for GetTensorMutableDataString and FillTensorDataString in ustring.h and string_tensor.h.
* add perf changes for CLIP and Roberta
* add perf improvement for BERT
* remove global var
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* use lite custom op api for math
* add vision ops
* add cx2 ops
* remove useless code
* support register custom kernel struct
* add string tensor support
* add more text kernels
* fix issue with std string as scalar
* migrate all text ops
* initial tokenizer change
* migrate all tokenizers
* Resolve conflict with main (#433)
* resolve conflict
* resolve conflict
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Update custom-op-lite PR (#440)
* add the onnxruntime 1.14 release into the CI pipeline (#387)
* add the onnxruntime 1.14 release into the CI pipeline
* torch 2.0 crashed on Linux
* Fix size_t overflow issue for RobertaTokenizer (#388)
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* Pre and Post processing example for openAI Whisper model (#380)
* add a stft-norm custom op for log-mel spectrum.
* undo the debug change
* Support ONNX standard STFT op signature.
* Add a unit test onnx STFT compatible mode.
* add whisper pre-/post- processing example
* Update dlib.cmake
* undo test code changes
* Update setup.cfg
* update the end2end example with STFT op
* Added optional outputs for GPT2, CLIP and Roberta Tokenizers (#389)
* Initial optional i/o for roberta
* Small fix
* Added working optional output functionality to RobertaTokenizer with tests
* Added optional outputs to CLIPTokenizer
* Added optional outputs to GPT2Tokenizer
* Use ternary operators
---------
Authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* ignore the unknown token id on bpe decoder (#391)
* Use dependency name 'nlohmann_json' which is the same name that ORT uses. (#393)
* Add an audio decoder custom op for whisper end-to-end processing (#385)
* evaluate the audio decoder library
* MP3 Decoder
* rename it to test_audio_codec
* add the audio decoder to whisper model
* whisper end-to-end draft
* fix the mp3 decoder
* Running with ONNX models
* Add support for more audio formats
* refine the end-to-end script
* Update operators/audio/audio_decoder.hpp
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Update operators/audio/audio_decoder.hpp
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Update operators/audio/audio_decoder.hpp
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* fix comments and add more test cases.
* changes for review comments.
* Update audio_decoder.hpp
* Update audio_decoder.hpp
* code refinement
* Update operators/audio/audio_decoder.hpp
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
---------
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* make tensorflow be optional for unittest (#394)
* make tensorflow be optional for unittest.
* typo
* built-in bounding box op (#382)
* built-in bounding box op
* update boundary check
* assert policy
* more boundary test and check
* XYXY--> X horizon
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
* a quick nuget package impl. (#396)
* Update wheels_linux.yml: change the linux machine pool name (#398)
* Add a merge step in whisper end-to-end script and fixed some issues (#399)
* add merged models in whisper model
* verify the final model
* support batch > 1 in BpeDecoder (#400)
* support batch > 1 in BpeDecoder
* update the shape in helper function
* [object detection ppp] YoLo as example (#397)
* object detection
* Unit test
add e2e fastestdet model test
---------
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
* some fixing for python package (#401)
* more code fixing related whisper models (#403)
* Added windows nuget work temporarily for testing (#402)
* Added windows nuget work temporarily for testing
* Cleanup
* Add back onnxruntime.lib in props file for possible future ORT need
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
* Remove unnecessary nupkg file and update nuspec (#405)
* Add nuget pack to build.bat and small nuget changes for demo
* Temporarily adding nuget.exe to build package until we can add to CI machine
* Switch back from Release to RelWithDebInfo
* Remove unnecessary changes
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* Add initial NuGet pipeline for Windows x64 build (#406)
* initial nuget pipeline
* Update nuget.yml for Azure Pipelines
* update nuget.yml for extensions specific packaging
TODO: add certain template yml files
* added component governance template yaml
* change template yaml path
* remove RoslynAnalyzers
* Add packDestination to nuget pack task (change from default)
* fix nuspec path
* Update nuget.yml for Azure Pipelines
* Update nuget.yml for Azure Pipelines
* Update nuget.yml for Azure Pipelines
* Update 2 nuget.yml for Azure Pipelines
* Update NativeNuget.nuspec
* Update nuget.yml for Azure Pipelines
* update nuspec
* Update 3 nuget.yml for Azure Pipelines
* Update 4 nuget.yml for Azure Pipelines
* Update 7 nuget.yml for Azure Pipelines
* Update 8 nuget.yml for Azure Pipelines
* Update 9 nuget.yml for Azure Pipelines
* add DLL signing
* Update nuget.yml for Azure Pipelines
* fix indentation
* Update 11 nuget.yml for Azure Pipelines
* Update 12 nuget.yml for Azure Pipelines
* Update 12 nuget.yml for Azure Pipelines
* Revert some unnecessary changes on nuget.yml
* clean up nuget.yml and update nuspec release notes
* small changes
* update commit id and release notes
---------
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* Compatible with onnxruntime-gpu package (#410)
* be compatible without onnxruntime-gpu version
* some fixing
* Add nuget README and remove ort lib references from props (#409)
* Add nuget README and remove ort lib references from props
* replace commit id in nuspec dynamically
* remove $ sign for commit id token
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
* Add a C# demo project for NuGet package (#407)
* Add a nuget test app
* remove unused file
* turn it as a .net demo project
---------
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
* Make Whisper E2E script more portable (#412)
This PR makes the Whisper E2E script more portable for other environments.
* Update macos wheel timeout to 180 min (#390)
* Update ci timeout to 120 min
* Only update WindowsPython job timeout
* Update ci timeout to 90 min
* update macos wheel timeout to 180 min
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* Fix OneBranch PR pipeline CodeQL issue (#413)
* test codeql 3000
* switch codeql from compiled to python
* switch back to compiled
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* Adding down-sampling and stereo mixing features for AudioDecoder (#420)
* initial draft
* second
* third
* polishing
* fix the M_PI name in LINUX platform
* fix bessel function issue
* add a unit test case
* fix the unit test name
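A minimal sketch of the two audio features named in this change, stereo mixing and down-sampling. The function names are hypothetical, and the down-sampler uses plain group averaging as a stand-in for a proper low-pass filter, so it should not be read as the AudioDecoder implementation.

```cpp
#include <cstddef>
#include <vector>

// Mix interleaved stereo samples (L0, R0, L1, R1, ...) down to mono by
// averaging each left/right pair. A trailing unpaired sample is dropped.
inline std::vector<float> MixStereoToMono(const std::vector<float>& interleaved) {
  std::vector<float> mono;
  mono.reserve(interleaved.size() / 2);
  for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
    mono.push_back(0.5f * (interleaved[i] + interleaved[i + 1]));
  }
  return mono;
}

// Reduce the sample rate by an integer factor by averaging each group of
// `factor` samples. A trailing partial group is dropped.
inline std::vector<float> DownSample(const std::vector<float>& samples,
                                     std::size_t factor) {
  std::vector<float> out;
  if (factor == 0) return out;  // nothing sensible to do; return empty
  out.reserve(samples.size() / factor);
  for (std::size_t i = 0; i + factor <= samples.size(); i += factor) {
    float sum = 0.0f;
    for (std::size_t j = 0; j < factor; ++j) sum += samples[i + j];
    out.push_back(sum / static_cast<float>(factor));
  }
  return out;
}
```

Mixing before down-sampling halves the number of samples the second step has to process.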
* Fix Secure Supply Chain Analysis Warning in PR pipeline (#414)
* remove package sources
* remove NuGet.config
* add .sscignore for cfs0011
* change sscignore
* add CFS0013 to sscignore
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* fix onnx version to 1.13.1 (#422)
* [NuGet] All platform package pipeline (#408)
* nuget ci package
* disable macos arm64 build for err
* Get the iOS xcframework build working with the split build/pack approach. (#416)
* refine build_xcframework.py
Cleanup/clarify various things
- naming of parameters and files
- consistency
Make handling of additional build args more generic
Update the artifact download dir/extract dir to more intuitive names
Update scripts
- make usage from CI pipeline clearer (e.g. don't hide directory names inside script)
- keep comments in nuspec
- remove unused args
- make additional arg handling more
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Add new required pre/post processing ops to Android and iOS packages. (#415)
* Revert "Pin onnx version to 1.13.1" (#423)
* Revert "fix onnx version to 1.13.1 (#422)"
This reverts commit eb29d225a7.
* Update requirements.txt
* PyOp attribute supports int and float data type (#425)
* Fix Android AAR in nuget package. Requires libortextensions.so. (#429)
* build for mac M1 (#430)
* Fix the unit test failure with ONNX 1.14 package. (#428)
* Fix the unit test failure with ONNX 1.14 package.
* more tests
* Update whisper_e2e.py
* Add nuget.org publish version option (#426)
* Add nuget.org publish version option
* typo
* small fix
* typo
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* resolve conflict
* resolve conflict
* minor fix
* rename from TensorT to Tensor
* fix string tensor
* Add OrtLiteCustomOp
* switch to string view
* fix regex ops
---------
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: JiCheng <247153481@qq.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Fix a build err (#442)
* resolve conflict
* resolve conflict
* minor fix
* rename from TensorT to Tensor
* fix string tensor
* Add OrtLiteCustomOp
* switch to string view
* fix regex ops
* fix build
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Fix build err on ort 141 (#444)
* resolve conflict
* resolve conflict
* minor fix
* rename from TensorT to Tensor
* fix string tensor
* Add OrtLiteCustomOp
* switch to string view
* fix regex ops
* fix build
* fix a build err
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Remove shape from span (#445)
* resolve conflict
* resolve conflict
* minor fix
* rename from TensorT to Tensor
* fix string tensor
* Add OrtLiteCustomOp
* switch to string view
* fix regex ops
* fix build
* fix a build err
* remove shape
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Fix python tests (#446)
* resolve conflict
* resolve conflict
* minor fix
* rename from TensorT to Tensor
* fix string tensor
* Add OrtLiteCustomOp
* switch to string view
* fix regex ops
* fix build
* fix a build err
* remove shape
* fix python tests
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Fix mac build (#449)
* resolve conflict
* resolve conflict
* minor fix
* rename from TensorT to Tensor
* fix string tensor
* Add OrtLiteCustomOp
* switch to string view
* fix regex ops
* fix build
* fix a build err
* remove shape
* fix python tests
* fix packaging err
* fix mac build
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Fix comments (#452)
* resolve conflict
* resolve conflict
* minor fix
* rename from TensorT to Tensor
* fix string tensor
* Add OrtLiteCustomOp
* switch to string view
* fix regex ops
* fix build
* fix a build err
* remove shape
* fix python tests
* fix packaging err
* fix mac build
* fixing the universal2 python package for macOS (#448)
* Remove onnx<1.14 from requirements.txt (#447)
* remove onnx<1.14 from requirements.txt
* downgrade protobuf
* move protobuf req to requirements-dev.txt
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
* fix comments
* comment version macro
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* Fix build err (#453)
* resolve conflict
* resolve conflict
* minor fix
* rename from TensorT to Tensor
* fix string tensor
* Add OrtLiteCustomOp
* switch to string view
* fix regex ops
* fix build
* fix a build err
* remove shape
* fix python tests
* fix packaging err
* fix mac build
* fix comments
* comment version macro
* define Compute for StftNormal
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Merge latest main (#461)
* resolve conflict
* resolve conflict
* minor fix
* rename from TensorT to Tensor
* fix string tensor
* Add OrtLiteCustomOp
* switch to string view
* fix regex ops
* fix build
* fix a build err
* remove shape
* fix python tests
* fix packaging err
* fix mac build
* fix comments
* comment version macro
* define Compute for StftNormal
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* revert unwanted changes in test
* revert unwanted changes
* add string_strip op
---------
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: JiCheng <247153481@qq.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
* initial PR
* add the attributes for op
* cmake update
* add the missing symbol
* add a unit test case
* fix the unit test
* fix some corner case.
* format Python code with autopep8
* fix the break in release pipeline
* code cleanup and the warnings fixing.
* Update ci.yml for Azure Pipelines
* Update ci.yml for Azure Pipelines
* fix linux build
* one more fixing
* again?
* fixing for macOS
* Add ability to prevent exception propagation with top-level try/catch handler macros.
If a combined build with ORT has exceptions disabled in ORT but ort-ext has an operator that requires exceptions, we enable exceptions in ort-ext but prevent them from propagating up via try/catch in the entry points that ORT can call:
- RegisterCustomOps
- CustomOpBase constructor and Compute
Removed some places in CustomOpApi that threw if OpKernelInfo* was nullptr by standardizing all kernels to store the OpKernelInfo provided in the ctor.
Added unit tests
- need to validate on more platforms and add CI for build where we don't want to allow exceptions to propagate
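The top-level try/catch idea can be sketched as follows. The macro names, the Status type, and the RegisterOps function are hypothetical stand-ins, not the macros or entry points added by this change; the point is only that every exception is converted to an error value before control returns to a host built without exception support.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical status value returned across the API boundary.
struct Status {
  bool ok;
  std::string message;
};

// Hypothetical macros: wrap an entry-point body so no exception escapes it.
#define API_TRY try {
#define API_CATCH                              \
  }                                            \
  catch (const std::exception& ex) {           \
    return Status{false, ex.what()};           \
  }                                            \
  catch (...) {                                \
    return Status{false, "unknown exception"}; \
  }

// Internal code remains free to throw.
inline int ParseAxis(int axis) {
  if (axis < 0) throw std::invalid_argument("axis must be non-negative");
  return axis;
}

// Entry point: failures are reported as a Status instead of an exception.
inline Status RegisterOps(int axis) {
  API_TRY
  ParseAxis(axis);
  return Status{true, ""};
  API_CATCH
}
```

Keeping the try/catch in macros lets a build that does allow propagation define them as empty, so the entry points themselves do not change.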
* Update pyop
* Update CMakeLists.txt
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Update includes/exceptions.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Update includes/exceptions.h
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Update includes/onnxruntime_customop.hpp
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Merge with main and update
Address PR comments
Fix some issues.
* Delete local file
* Fix pyop update
* Add CI
Address PR comments
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
* Initial CLIP tokenizer implementation
* Moved common code from CLIP and GPT2 tokenizers into separate file
* add the new file into cmake file list.
* Fix ustring reference issue
* merge changes from main branch
* more merge actions
* Minor changes
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
* Using the header files from the ONNXRuntime package
* Update includes/onnxruntime_customop.hpp
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* fix the build break.
* one more fixing
* wired top project
* ort 1.9.0 used
* switch to 1.10.0 package.
* change the vmimage to latest
* URL issue
* cmake policy
* ignore onnxruntime.dll native scan
* update the Onebranch exclusedPaths
* fixing some build tool issues
* update again
* typo
* undo of ORT dll removal
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Implemented a new version of Kernel and the CustomOp to support
output that matches the HuggingFace model's input without the need
for intermediate python logic.
* Implemented an e2e tutorial for exporting and inferencing using the
HuggingFace's QuestionAnswering model.
Known Issue: Python side doesn't have an implementation of Bert Decoder
and so the augmented model is only half-complete. At the time of
inferencing the HuggingFace tokenizer is used to decode the result back
to string.
* Moved a few variables from Kernel implementation to BertTokenizer so
each version of the Kernel doesn't have to deal with them.
* Other decorative and code standardization changes.
not restricted to specific token. Sentencepiece.Encode itself doesn't
clear the input vector before populating the result for the input
token.
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
* Add test cases and fix empty string error in BlingFire sentence breaker.
* Throw error if input text to join is empty array.
* Fix scalar support and access violation.
* Resolve comments.
* Resolve comments.
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
* add a native unit test for regex_split op
* fix the case of shape [1, 0]
* Update mshost.yaml
* downgrade the test model version.
* upgrade torch version on Windows CI
* disable windows python 3.7 pipeline.
* new CI configuration
* Set up CI with Azure Pipelines
[skip ci]
* install numpy in cibuildwheel
* add pyproject.toml
* upgrade vmImage
* update the build python versions
* remove the pytest
* move the wheel build files
* enable sdist setup.py as well.
* use git command line
* Update wheels.yml for Azure Pipelines
* disable the pypy package for macos
* fix the external repo code tag
* fix the ctest problem
* fix the unicode 8217.
* fix the locale base test