* add "-allow-unsupported-compiler" flags to Windows CUDA flags
inspired by this PR: https://github.com/microsoft/onnxruntime/pull/21004
* switch to cmake command line
* handle the issues caused by the latest MSVC release
* correct the typo
* correct the parameter
* try one dash again
* use the installed cmake
* use the standalone cmake installation first
* use the standalone cmake in win32 python too
* fix it more
* one more try
* fix the macOS pipeline issue
* fix the pip command line
* first draft
* clang
* Draft for ScatterNDOfShape
* fix build
* disable test when cuda is missing
* fix implementation
* update test
* add MaskedScatterNdOfShape
* fix merge conflicts
* first draft for NegXPlus1
* complete
* fix unit test
* rename one test
* remove test if not cuda
* switch to OrtxStatus
---------
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
* first draft for NegXPlus1
* complete
* fix unit test
* rename one test
* remove test if not cuda
---------
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
- update protobuf version being used by sentencepiece and the java tests
- ignore unused language bindings from protobuf and triton
- specify the CG config file with ignored directories where required
Fix cgmanifest.json
- 'git' entries require a commit hash not version
- use 'other' for opencv third party code that is included directly in the opencv repo
- the path isn't a valid repositoryUrl value to be provided as a 'git' entry
- update version numbers/commit hashes to match the latest code
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
* only keep the image decoder from opencv
* initial build
* refine the code
* Add clear functions
* Update CMakeLists.txt
* Update opencv.cmake
* change the output type to float
* get the result
* align image-process with original Python
* move the LoadRawImages into library
* fix the calculation error
* fix the pipeline build issue
* fix the build breaks in ci pipeline
* support json configuration file and refactor the code.
* Ignore all streaming output of invalid utf-8 string
* Update bpe_streaming.hpp
* add the phi-3 tokenizer test
* add a streaming test for phi-3 model
* fix the utf-8 validation
* fix the utf-8 validation 2
* fix the utf-8 validation 3
* fix the utf-8 validation 4
* initial checkins
* fix the selectedops build failures
* add the tokenization implementation
* update the windows DEF file for c abi in cmake file
* fix the build on linux
* fix some warnings and remove the unused code
* initial import of unit tests from tfmtok
* add streaming API support
* fix the merges loading issues
* complete export from tfmtok - needs input id fixing
* fix the unit test failures.
* fix all unit test failures
* refactor streaming code
* remove the unused code
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* add UT for neg_pos_cuda in eager mode and fix build break in Windows
* fix Linux build break
* adjust argument and path
* remove old cudaContext
* add ort cuda test back
* fix cuda tests
* undo debug code
* undo useless change
---------
Co-authored-by: jslhcl <jslhcl@gmail.com>
Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
* refactor the header file in include folder
* fix the basic-token eager unit test case
* a more flexible way to handle string tensor shape.
* fix the unit test path issue
* remove the multiple inheritance to avoid issues during pointer casting
* add api cmake build support
* undo some temporary changes
* code refinement
* fix variadic arg
* only expose the context for ort version >= 17
* fix a shape bug
* fix the cuda build issue
* change ifdef condition of GetAllocator
* finalize the ort c abi wrapper file name
* fix the iOS build break
* align gtest version with triton
* Update ext_apple_framework.cmake for iOS header files
---------
Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
This commit updates `HFTokenizerConverter` to handle cases where the `hf_tokenizer` object might not have a `vocab_file` attribute.
Changes:
* Uses `getattr` to retrieve the `vocab_file` attribute for flexibility
* Stores the retrieved value in a separate variable `vocab_file` for clarity
* Checks if `vocab_file` is `None` before checking its existence
This ensures the converter works correctly even with tokenizers that don't define a `vocab_file` attribute.
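The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the actual `HFTokenizerConverter` code: the helper name `resolve_vocab_file` is hypothetical, and `SimpleNamespace` objects stand in for real tokenizer instances.

```python
import os
from types import SimpleNamespace
from typing import Optional


def resolve_vocab_file(hf_tokenizer) -> Optional[str]:
    """Hypothetical helper: return a usable vocab file path, or None."""
    # getattr with a default avoids AttributeError when the tokenizer
    # does not define `vocab_file` at all.
    vocab_file = getattr(hf_tokenizer, "vocab_file", None)
    # Check for None first, so os.path.exists never receives None.
    if vocab_file is None or not os.path.exists(vocab_file):
        return None
    return vocab_file


# Stand-in objects (assumptions for illustration only).
without_attr = SimpleNamespace()
with_none = SimpleNamespace(vocab_file=None)

print(resolve_vocab_file(without_attr))  # None
print(resolve_vocab_file(with_none))     # None
```

Keeping the retrieved value in a local variable means the attribute is read once and the `None` check and existence check operate on the same value.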
* unpack all JARs
* fix macos job
* remove -f for macos tree
* move build files from java and linux into macos directory for combined JAR
* test with download and publish pipeline artifacts instead
* use full download task name rather than shortcut
* add job dependencies
* combine JAR packages into one
* update version from version.txt
* change relative path for version.txt
* test
* typo
* Update java_packaging.yml for Azure Pipelines
* Update java_packaging.yml for Azure Pipelines
* test without output variable
* Update java_packaging.yml for Azure Pipelines
* test with type rather than cat command
* Update java_packaging.yml for Azure Pipelines
* Update java_packaging.yml for Azure Pipelines
* Update java_packaging.yml for Azure Pipelines
* Update java_packaging.yml for Azure Pipelines
* Update java_packaging.yml for Azure Pipelines
* set version in each job
* Update java_packaging.yml for Azure Pipelines
* Update java_packaging.yml for Azure Pipelines
* add back dependencies
* final
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* Unify the spm/bpe tokenizers
* fix the build error
* fix the decoding issue
* add model name in exported onnx
* fixing the unit tests
* revert the unnecessary file format changes