* Unify the image operations in extensions library
* fix the build configuration issue
* More build fixes
* Fix the native image codec
* fix encode_image
* Add bgr/rgb conversion for encoding image
* parity check
* build break
* update PNG encoding parameters
* build break on Linux
* using MSE to compare images
* fix the discrepancy between Linux and Windows
* final code refinement
* one more change
* fix the C++ warnings
---------
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
* Add general regex support
* add case 5 support instead of replacing with s+
* add more test cases
* address comments
* add back gpt2 and llama regex methods for efficiency
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* add compatibility docs
* continue updating the doc
* updating doc 2
* support sentence-piece add_dummy_prefix for all models
* revert the flag
* initialize the add_dummy_prefix for llama model
* initial commit
* Ugm vocab loaded is good
* test passed
* fixes unit test on win32
* finish the parity check
* code refinement
* code refinement for review
* add C++ standard library regex support for GPT2 case
* reorder regex handling
* try without STL
* missing case
* add llama3 regex support
* add custom regex impl
* change regex based on model
* modify tests, add docs, and code cleanup
* add regex test and const strings
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* optimize the tokenizer for efficiency
* fix the unit test failures.
* fix the api test case failures
* removed the unused code.
* More test case fixes
* One more fix
* fix macOS build issues
* refine the test
* add more diagnosis info.
* fix unit test in CI Linux
* fix the pp_api test failure
* initial api for tokenizer
* More fixes and test data refinement
* add a simple wrapper for pre-processing APIs
* fix the test issues
* test if the tokenizer is spm based
* fix the failed test cases
* json pointer does not work
* Feature extraction C API for Whisper model
* Update the docs
* Update the docs2
* refine the code
* fix some issues
* fix the Linux build
* fix more data consistency issues
* More code refinements
* first draft
* clang
* Draft for ScatterNDOfShape
* fix build
* disable test when cuda is missing
* fix implementation
* update test
* add MaskedScatterNdOfShape
* fix merge conflicts
* first draft for NegXPlus1
* complete
* fix unit test
* rename one test
* remove test if not cuda
* switch to OrtxStatus
---------
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
* first draft for NegXPlus1
* complete
* fix unit test
* rename one test
* remove test if not cuda
---------
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
* only keep the image decoder from opencv
* initial build
* refine the code
* Add clear functions
* Update CMakeLists.txt
* Update opencv.cmake
* change the output type to float
* get the result
* align image-process with original Python
* move the LoadRawImages into library
* fix the calculation error
* fix the pipeline build issue
* fix the build breaks in ci pipeline
* support json configuration file and refactor the code.
* Ignore all streaming output of invalid utf-8 string
* Update bpe_streaming.hpp
* add the phi-3 tokenizer test
* add a streaming test for phi-3 model
* fix the utf-8 validation
* fix the utf-8 validation 2
* fix the utf-8 validation 3
* fix the utf-8 validation 4
* initial checkins
* fix the selectedops build failures
* add the tokenization implementation
* update the windows DEF file for c abi in cmake file
* fix the build on linux
* fix some warnings and remove the unused code
* initial import of unit tests from tfmtok
* add streaming API support
* fix the merges loading issues
* complete export from tfmtok - needs input id fixing
* fix the unit test failures.
* fix all unit test failures
* refactor streaming code
* remove the unused code
---------
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
* add UT for neg_pos_cuda in eager mode and fix build break in Windows
* fix Linux build break
* adjust argument and path
* remove old cudaContext
* add ort cuda test back
* fix cuda tests
* undo debug code
* undo useless change
---------
Co-authored-by: jslhcl <jslhcl@gmail.com>
Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
* refactor the header file in include folder
* fix the basic-token eager unit test case
* a more flexible way to handle string tensor shape.
* fix the unit test path issue
* remove multiple inheritance to avoid issues during pointer casting
* add api cmake build support
* undo some temporary changes
* code refinement
* fix variadic arg
* only expose the context for ort version >= 17
* fix a shape bug
* fix the cuda build issue
* change ifdef condition of GetAllocator
* finalize the ort c abi wrapper file name
* fix the iOS build break
* align gtest version with triton
* Update ext_apple_framework.cmake for iOS header files
---------
Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
* Unify the spm/bpe tokenizers
* fix the build error
* fix the decoding issue
* add model name in exported onnx
* fixing the unit tests
* revert the unnecessary file format changes
* support float16
* add ut for float16
* support bfloat16
* refactor
* fetch prop
* ifdef f16
* remove header
* ifdef cuda
* typename mapped type
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* cmake fast gelu
* bridge func and cuda kernel
* tune ut
* fix build warning
* fix format
* tune ut
* drop OCOS_ENABLE_CONTRIB
* tune cmake
---------
Co-authored-by: Randy Shuai <rashuai@microsoft.com>