ort-customops

Commit Graph

Author	SHA1	Message	Date
Wenbing Li	c3379ecb6b	fix the build for mobile packaging (#843 ) * fix the build for mobile packaging * update the cmake file as well * more fixing on dlib related ops * release the iOS cmake version constraint * upgrade cmake in Linux CUDA build * Update Dockerfile.ubuntu_cuda11_8_tensorrt8_6 for typo * Update ios_packaging.yml for Azure Pipelines * update the dlib versoin * update all cases of cmake version * update the comment for dlb cmake	2024-11-17 20:09:36 -08:00
Wenbing Li	5104bb9897	fix the win32 macro usage (#844 )	2024-11-15 11:26:37 -08:00
Wenbing Li	3da0d3c929	Load the tokenizer data from the memory (#836 )	2024-11-09 10:15:21 -08:00
Kyle	14f280adf6	Change Pipeline's Service Connection Name (#841 ) * change service connection name	2024-11-08 11:39:37 +08:00
Kyle	ece1db2dc7	Migrate Pipeline to 1ES PT - wheels_macos (#840 ) migrate pipeline.	2024-11-07 12:57:16 +08:00
Kyle	aabc4030f0	Upgrade Pipeline Python Version to 3.12 (#839 )	2024-11-05 09:24:43 -08:00
Kyle	31056e7d4f	Migrate Pipelines - Phase 1 - Five Pipelines and Templates (#838 ) migrate pipelines	2024-11-05 11:38:23 +08:00
Sayan Shaw	5b7e3d4b8b	Fix prefast issue in image transforms (#837 ) * fix prefast issue in image transforms * Update image_transforms.hpp * Update image_transforms.hpp --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com> Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>	2024-10-31 15:10:47 -07:00
Wenbing Li	be5aa773e3	Unify the image operations in extensions library (#831 ) * Unify the image operations in extensions library * fix the build configuration issue * More build fixings * Fix the native image codec * fix encode_image * Add bgr/rgb conversion for encoding image * parity check * build break * update PNG encoding parameters * build break on Linux * using MSE to compare images * fix the discrependency between Linux and Windows * final code refinement * one more change * fix the C++ warnings --------- Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>	2024-10-30 09:17:06 -07:00
Sayan Shaw	0e6bffa201	Fix regex prefast warnings (#832 ) * fix regex prefast warnings * remove try catch --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2024-10-29 22:36:59 -07:00
Sayan Shaw	f12431a211	Upgrade versions in CI matrix and fix CI issue (#835 ) * upgrade ci matrix * typo * revert python version * update python and ort range * update python range * update for macos and linux too --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2024-10-29 19:40:54 -07:00
Wenbing Li	aa2c82fa67	Add the MLlama Imaging Processing Support (#823 ) * initial checkins for mllama image process * fix some tests * some fixings * add more image * More test assertions * parity test passed * code clean up * code refinement	2024-10-22 14:24:09 -07:00
Sayan Shaw	7ab9d24cb4	Add general regex support (#822 ) * Add general regex support * add case 5 support instead of replacing with s+ * add more test cases * address comments * add back gpt2 and llama regex methods for efficiency --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2024-10-21 16:29:17 -07:00
Wenbing Li	1fb87a30f7	Validate the tokenizer class name on data loading (#830 )	2024-10-21 13:25:37 -07:00
Rony Fadel	8de0d6c8db	Change the framework bundle identifier to a valid one (#829 ) Ref: https://github.com/microsoft/onnxruntime-extensions/issues/825 "com.microsoft.onnxruntime_extensions" is not a valid identifier. Update it to "com.microsoft.onnxruntime-extensions"	2024-10-21 10:56:41 -07:00
Akshay Sonawane	944bad6036	bump version from 0.13.0 to 0.14.0 (#827 )	2024-10-17 11:55:58 -07:00
Wenbing Li	e19c0894ec	Fix CUDA CI build failures (#824 )	2024-10-11 16:08:44 -07:00
Wenbing Li	62c0a7bfda	fix the unigram detector for last HG tokenizer (#820 )	2024-10-03 14:25:53 -07:00
Stalin Sabu Thomas	f47bed4596	add(tutorials): exporting yolo world model (#803 ) * add(tutorials): exporting yolo world model This allows us to export yolo world onnx model which can be later used in mobile inference. * add(tutorial): make classes optional --------- Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-10-03 14:42:35 +10:00
Wenbing Li	12a9e8beb4	support sentence-piece add_dummy_prefix for all models (#819 ) * add compatibility docs continue updating the doc updating doc 2 * support sentence-piece add_dummy_prefix for all models * revert the flag * initialize the add_dummy_prefx for llama model	2024-10-01 09:08:59 -07:00
Wenbing Li	e710d80f71	Improve Documentation: Add Hugging Face Compatibility Docs and Refine the existing docs (#818 ) * add compatibility docs * continue updating the doc * updating doc 2 * revert the bpe changes	2024-09-30 13:04:33 -07:00
Wenbing Li	2c3e936cfc	support the merges array in tokenizer.json (#817 )	2024-09-26 11:01:13 -07:00
Chester Liu	e424838708	Added support for native image decoding (#808 ) This added support for native image decoding on Windows & Apple platforms. This helps us remove libpng & libjpeg completely on these platforms, and in the meantime support more image formats thanks to OS vendors,	2024-09-26 09:17:55 +08:00
Chester Liu	f90a04606b	Fix unused result warnings (#802 ) Fix several unused result warnings --------- Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>	2024-09-26 07:54:16 +08:00
Wenbing Li	f204a4c791	Add a decoder for Unigram tokenizer and unify some classes among tokenizers (#816 ) * rename and formalize the file names * add the decoder impl * fix a typo	2024-09-25 10:25:06 -07:00
Wenbing Li	6b94f4d7a5	Fix the Unicode code discrepency on CLIP model (#814 ) * refine the code structure * more fixing on unicode * fix the codepoint 304 * add the clip tokenizer data files abck	2024-09-23 16:49:24 -07:00
Wenbing Li	176c1d0138	Support the Unigram tokenizer kind from sentencepiece library (#811 ) * initial commit * Ugm vocab loaded is good * test passed * fixes unit test on win32 * finish the parity check * code refinement * code refinement for review	2024-09-19 15:46:13 -07:00
Sayan Shaw	0d5d19f67b	fix prefast warning (#809 ) Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2024-09-15 22:34:07 -07:00
Chester Liu	8d842d85e3	Rm zlib when linking ocos_operators (#807 )	2024-09-13 07:07:10 +08:00
Sayan Shaw	8bc8e43da1	Add C++ regex support for Llama3, Standard Library, and Custom Cases (#804 ) * add C++ standard library regex support for GPT2 case * reorder regex handling * try without STL * missing case * add llama3 regex support * add custom regex impl * change regex based on model * modify tests, add docs, and code cleanup * add regex test and const strings --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2024-09-10 23:17:49 -07:00
Scott McKay	9164f54e5d	Don't disable vision operators in a catalyst build. (#805 ) * Don't disable vision operators in a catalyst build. * Patch to exclude NSImage on Mac-catalyst as it's not supported.	2024-09-10 08:58:09 +10:00
Wenbing Li	90d8f33172	Revert "some data calc fixing" This reverts commit `dae9510dbb`.	2024-09-05 09:30:19 -07:00
Wenbing Li	dae9510dbb	some data calc fixing really split the images test with sus	2024-09-05 09:26:05 -07:00
Wenbing Li	1b80794903	Remove OpenCV dependency from C_API mode (#800 ) * Remove OpenCV dependency from C_API model * fix build on Windows * switch ci build flag * try to fix the macOS build issue * more fixing * fix the macOS build issue * list jpeg source * verified on MacOS * update the pp_api too * avoid the codecs library conflicts * Add the unit tests * move the codec test * add the missing dl lib for extensions test * refine the code * a smaller fixing for Windows Python	2024-09-04 16:50:05 -07:00
Kyle	7c3ce36af8	Add Files Signature Validation after Signed by ESRP (#801 ) * vlidate sign after ERSP * blank line * format	2024-09-02 17:17:03 +08:00
Wenbing Li	b8b2ebfb85	optimize spm tokenizer for long text (#799 ) * optimize spm tokenizer for long text * refine the split logic * re-trigger CI pipeline.	2024-08-30 14:58:40 -07:00
Prathik Rao	6f532376c9	bump (#791 ) Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>	2024-08-27 18:58:18 -07:00
Wenbing Li	2d02a687be	Optimize the tokenizer for efficiency (#797 ) * optimize the tokenizer for efficiency * fix the unit test failures. * fix the api test case failures * removed the unused code. * More test cases fixings * One more fixing * fix macOS build issues * refine the test * add more diagnosis info. * fix unit test in CI Linux * fix the pp_api test failure	2024-08-27 18:57:50 -07:00
Yi Zhang	2d044adbf9	sign with the correct key code (#796 ) Fixes incorrect dll singnature	2024-08-26 16:48:29 +08:00
Wenbing Li	8f2c35fad0	Add more tests for pre-processing C APIs (#793 ) * initial api for tokenizer * More fixings and test data refinement * add a simple wrapper for pre-processing APIs * fix the test issues * test if the tokenizer is spm based * fix the failed test cases * json pointer does not work	2024-08-21 16:48:39 -07:00
Zhipeng Han	85ffb94169	Update custom_ops.md (#795 ) add domain for SentencePiece Op	2024-08-21 09:52:54 -07:00
Wenbing Li	711a2cfa69	add a convert_token_string_to_an_id API for the prompt ids (#794 ) * add a convert token string to an id API for the prompt ids * fix the build issues on Linux	2024-08-19 16:44:07 -07:00
vraspar	6ce22f8ac4	Update nuget extraction path for iOS xcframework (#792 ) * Update nuget extraction path for iOS xcframework * Update nuget extraction path for iOS xcframework	2024-08-16 10:34:40 +10:00
vraspar	8b5354fb67	Update macosx framework packaging to follow apple guidelines (#776 ) * Update macosx framework packaging to follow apple guidelines * Test path fix * Update tools/ci_build/extract_nuget_files.ps1 ---------	2024-08-13 10:37:22 +10:00
Wenbing Li	be29e28dd7	support tokenizers build only in C API mode (#783 ) * support tokenizer build only in C API mode * fix the python build. * fix the selectedops build --------- Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>	2024-08-02 13:28:58 -07:00
Sayan Shaw	7851b51ee3	Add initial tiktoken and Phi3SmallTokenizer support (#729 ) * add initial tiktoken support * add vector hash and equal for bpe ranks map * change lambda comparator * move phi-3-small files * final changes * move tiktoken files from data2 to data * add unit test * add tokenizer module * merge json and tiktoken impl * fix tiktoken encoding problem * address comments * remove dummy tokens --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com> Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>	2024-08-02 10:24:02 -07:00
Wenbing Li	46998e96fb	Update build-package-for-windows.yml (#784 )	2024-08-01 14:45:26 -07:00
Wenbing Li	4bb63dd2aa	Upgrade ESRP signing task from v2 to v5 (#780 ) * Upgrade ESRP signing task from v2 to v5 * Upgrade ESRP signing task from v2 to v5 in win --------- Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>	2024-08-01 09:57:59 -07:00
Wenbing Li	8b002b86ab	Fix the case that bos_token is null (#781 )	2024-07-31 17:50:20 -07:00
Wenbing Li	b4ebfc9519	Fix spm converted FastTokenizer issue on non-ascii char (#778 ) * Fix spm converted tokenizer issue on non-ascii char * remove pkg_resource in python	2024-07-31 14:22:25 -07:00

591 commits, all branches
