onnxruntime-extensions

Commit graph

Author	SHA1	Message	Date
Wenbing Li	5104bb9897	fix the win32 macro usage (#844 )	2024-11-15 11:26:37 -08:00
Wenbing Li	3da0d3c929	Load the tokenizer data from the memory (#836 )	2024-11-09 10:15:21 -08:00
Wenbing Li	be5aa773e3	Unify the image operations in extensions library (#831 ) * Unify the image operations in extensions library * fix the build configuration issue * More build fixings * Fix the native image codec * fix encode_image * Add bgr/rgb conversion for encoding image * parity check * build break * update PNG encoding parameters * build break on Linux * using MSE to compare images * fix the discrependency between Linux and Windows * final code refinement * one more change * fix the C++ warnings --------- Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>	2024-10-30 09:17:06 -07:00
Wenbing Li	aa2c82fa67	Add the MLlama Imaging Processing Support (#823 ) * initial checkins for mllama image process * fix some tests * some fixings * add more image * More test assertions * parity test passed * code clean up * code refinement	2024-10-22 14:24:09 -07:00
Sayan Shaw	7ab9d24cb4	Add general regex support (#822 ) * Add general regex support * add case 5 support instead of replacing with s+ * add more test cases * address comments * add back gpt2 and llama regex methods for efficiency --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2024-10-21 16:29:17 -07:00
Wenbing Li	1fb87a30f7	Validate the tokenizer class name on data loading (#830 )	2024-10-21 13:25:37 -07:00
Chester Liu	e424838708	Added support for native image decoding (#808 ) This added support for native image decoding on Windows & Apple platforms. This helps us remove libpng & libjpeg completely on these platforms, and in the meantime support more image formats thanks to OS vendors,	2024-09-26 09:17:55 +08:00
Wenbing Li	f204a4c791	Add a decoder for Unigram tokenizer and unify some classes among tokenizers (#816 ) * rename and formalize the file names * add the decoder impl * fix a typo	2024-09-25 10:25:06 -07:00
Wenbing Li	6b94f4d7a5	Fix the Unicode code discrepency on CLIP model (#814 ) * refine the code structure * more fixing on unicode * fix the codepoint 304 * add the clip tokenizer data files abck	2024-09-23 16:49:24 -07:00
Wenbing Li	176c1d0138	Support the Unigram tokenizer kind from sentencepiece library (#811 ) * initial commit * Ugm vocab loaded is good * test passed * fixes unit test on win32 * finish the parity check * code refinement * code refinement for review	2024-09-19 15:46:13 -07:00
Sayan Shaw	8bc8e43da1	Add C++ regex support for Llama3, Standard Library, and Custom Cases (#804 ) * add C++ standard library regex support for GPT2 case * reorder regex handling * try without STL * missing case * add llama3 regex support * add custom regex impl * change regex based on model * modify tests, add docs, and code cleanup * add regex test and const strings --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2024-09-10 23:17:49 -07:00
Wenbing Li	90d8f33172	Revert "some data calc fixing" This reverts commit `dae9510dbb`.	2024-09-05 09:30:19 -07:00
Wenbing Li	dae9510dbb	some data calc fixing really split the images test with sus	2024-09-05 09:26:05 -07:00
Wenbing Li	1b80794903	Remove OpenCV dependency from C_API mode (#800 ) * Remove OpenCV dependency from C_API model * fix build on Windows * switch ci build flag * try to fix the macOS build issue * more fixing * fix the macOS build issue * list jpeg source * verified on MacOS * update the pp_api too * avoid the codecs library conflicts * Add the unit tests * move the codec test * add the missing dl lib for extensions test * refine the code * a smaller fixing for Windows Python	2024-09-04 16:50:05 -07:00
Wenbing Li	2d02a687be	Optimize the tokenizer for efficiency (#797 ) * optimize the tokenizer for efficiency * fix the unit test failures. * fix the api test case failures * removed the unused code. * More test cases fixings * One more fixing * fix macOS build issues * refine the test * add more diagnosis info. * fix unit test in CI Linux * fix the pp_api test failure	2024-08-27 18:57:50 -07:00
Wenbing Li	8f2c35fad0	Add more tests for pre-processing C APIs (#793 ) * initial api for tokenizer * More fixings and test data refinement * add a simple wrapper for pre-processing APIs * fix the test issues * test if the tokenizer is spm based * fix the failed test cases * json pointer does not work	2024-08-21 16:48:39 -07:00
Wenbing Li	711a2cfa69	add a convert_token_string_to_an_id API for the prompt ids (#794 ) * add a convert token string to an id API for the prompt ids * fix the build issues on Linux	2024-08-19 16:44:07 -07:00
Wenbing Li	be29e28dd7	support tokenizers build only in C API mode (#783 ) * support tokenizer build only in C API mode * fix the python build. * fix the selectedops build --------- Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>	2024-08-02 13:28:58 -07:00
Sayan Shaw	7851b51ee3	Add initial tiktoken and Phi3SmallTokenizer support (#729 ) * add initial tiktoken support * add vector hash and equal for bpe ranks map * change lambda comparator * move phi-3-small files * final changes * move tiktoken files from data2 to data * add unit test * add tokenizer module * merge json and tiktoken impl * fix tiktoken encoding problem * address comments * remove dummy tokens --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com> Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>	2024-08-02 10:24:02 -07:00
Wenbing Li	b4ebfc9519	Fix spm converted FastTokenizer issue on non-ascii char (#778 ) * Fix spm converted tokenizer issue on non-ascii char * remove pkg_resource in python	2024-07-31 14:22:25 -07:00
Wenbing Li	c3145b8f52	add the decoder_prompt_id for whisper tokenizer (#775 ) * add the decoder_prompt_id for whisper tokenizer * temporarily disable android prebuilt * disable the prebuilt for android * disable the prebuilt for android 2 * Add a unit test * correct test ids	2024-07-29 14:21:17 -07:00
Wenbing Li	620050fbe0	reimplement resize cpu kernel for image processing (#768 ) * reimplement resize cpu kernel for image processing * accuracy fixing and code refinement * fix the build issues * fix Linux build issue * more fixings * Fix the pipeline issue * fix the ci script * try to fix CUDA machine pool	2024-07-23 15:40:52 -07:00
Wenbing Li	38a3d85f8f	switch cmake cmp0169 flag to new (#762 ) * switch cmake cmp0169 flag to new * the missing spm code. * more refinement on cmake build targets * Update ci.yml * Update ci.yml * update the jpg files after using libjpeg instead of libjpeg-turbo * exclude cutlass too * upgrade the protobuf library to be consistent with ORT * update the protoc generated files * use the right patch name * Update cutlass.cmake	2024-07-15 23:28:49 -07:00
Wenbing Li	8153bc1a3a	Feature extraction C API for whipser model (#755 ) * Feature extraction C API for whipser model * Update the docs * Update the docs2 * refine the code * fix some issues * fix the Linux build * fix more data consistency issue * More code refinements	2024-07-11 11:20:36 -07:00
Wenbing Li	b436d09459	Fix the CI pipeline for the latest PyTorch release. (#759 )	2024-07-08 16:21:48 -07:00
Wenbing Li	cbed8fd575	Add a generic image processor and its C API (#745 ) * Add a generic image processor * add more tests * Fix the test failures * Update runner.hpp	2024-06-20 10:53:49 -07:00
Xavier Dupré	bef5f07e33	Add custom ops ReplaceZero (#739 ) * Add custom ops ReplaceZero * fix merge conflicts	2024-06-18 11:36:14 +02:00
Xavier Dupré	690bed71b6	Add operator MulSigmoid, MulMulSigmoid (#741 ) * Add operator MulSigmoid * add mul mul sigmoid * add comments * Apply suggestions from code review --------- Co-authored-by: Wei-Sheng Chin <wechi@microsoft.com>	2024-06-12 10:29:42 +02:00
Xavier Dupré	f5055466d5	Add custom kernel ScatterNDOfShape (#705 ) * first draft * clang * Draft for ScatterNFOfShape * fix build * disable test when cuda is missing * fix implementation * update test * add MaskedScatterNdOfShape * fix merge conflicts	2024-06-11 09:59:46 +02:00
Xavier Dupré	79f3b048d4	Add custom op Transpose2DCast (#737 ) * Add custom op Transpose2DCast * fix compilation issues * fix compilation issues	2024-06-06 17:44:21 +02:00
Xavier Dupré	1e8c1211a5	Add custom kernels AddSharedInput, MulSharedInput (#734 ) * Add custom kernel AddSharedInput, MulSharedInput * fix compilation * compilation issue * fix unit test	2024-06-05 10:42:22 +02:00
Wenbing Li	ca433cbea7	Refactor the unit tests and cmake build script (#726 ) * refine the build script * complete the unit tests. * remove the commented code	2024-05-30 14:16:14 -07:00
Xavier Dupré	95a49faabe	Add kernel NegXPlus1 = 1 - X (#709 ) * first draft for NegXPlus1 * complete * fix unit test * rename one test * remove test if not cuda --------- Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>	2024-05-29 15:26:44 +02:00
Wenbing Li	474540d8a5	Fix the image processing output data discrepancy (#722 ) * some data calc fixing * Update image_transforms.hpp * really split the images * Update image_transforms.hpp	2024-05-20 12:44:48 -07:00
Tang, Cheng	f0ef40d074	add move constructor and Release API for tensor (#717 ) Co-authored-by: Cheng Tang <chenta@microsoft.com@onnxruntime-a10.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>	2024-05-17 11:50:20 -07:00
Wenbing Li	4781a9d1d8	Add ci pipeline for pre-processing API testing (#718 ) * Add ci pipeline for pre-processing API testing * update cmake for testing * add test cases back * add other two pipelines * fix macos pipeline	2024-05-16 15:39:52 -07:00
Wenbing Li	311dd35401	Add ImageProcessor for Multimodel model Pre-processing (#715 ) * only keep the image decoder from opencv * initial build * refine the code * Add clear functions * Update CMakeLists.txt * Update opencv.cmake * change the output type to float * get the result * align image-process with original Python * move the LoadRawImages into library * fix the calculation error * fix the pipeline build issue * fix the build breaks in ci pipeline * support json configuration file and refactor the code.	2024-05-15 14:35:14 -07:00
Wenbing Li	c58c930739	Ignore all streaming output of invalid utf-8 string (#704 ) * Ignore all streaming output of invalid utf-8 string * Update bpe_streaming.hpp * add the phi-3 tokenizer test * add a streaming test for phi-3 model * fix the utf-8 validation * fix the utf-8 validation 2 * fix the utf-8 validation 3 * fix the utf-8 validation 4	2024-05-06 16:46:55 -07:00
cao lei	dfdf52e759	refactor cuda ops, remove contrib folder (#707 ) Co-authored-by: Lei Cao <leca@microsoft.com@onnxruntime-a10.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>	2024-05-03 12:18:59 -07:00
Tang, Cheng	3b889fc42f	update custom op v2 struct to be able to invoke from eager mode (#700 ) Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net> Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>	2024-04-30 13:53:39 -07:00
Wenbing Li	a8bce4328b	Add the tokenizer C ABI (#693 ) * initial checkins * fix the selectedops build failures * add the tokenization implementation * update the windows DEF file for c abi in cmake file * fix the build on linux * fix some warnings and remove the unused code * initial import of unit tests from tfmtok * add streaming API support * fix the merges loading issues * complete export from tfmtok - needs input id fixing * fix the unit test failures. * fix all unit test failure * refactor streaming code * remove the unused code --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2024-04-29 16:45:49 -07:00
Tang, Cheng	1f31d33ed4	Eager mode: cuda kernel support (#694 ) * add UT for neg_pos_cuda in eager mode and fix build break in Windows * fix Linux build break * adjust argument and path * remove old cudaContext * add ort cuda test back * fix cuda tests * undo debug code * undo useless change --------- Co-authored-by: jslhcl <jslhcl@gmail.com> Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net> Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>	2024-04-24 12:49:00 -07:00
Wenbing Li	f9290e8bac	Add a status class for future tokenizer API implementation (#690 ) * Add a status class for future API implementation * Update bpe_kernels.cc * fix the ios package pipeline * update mistral test model name	2024-04-18 21:12:14 -07:00
Wenbing Li	646462790b	Refactor the header file directory and integrate the eager tensor implementation (#689 ) * refactor the header file in include folder * fix the basic-token eager unit test case * a more flexible way to handle string tensor shape. * fix the unit test path issue * remove the multi-inherits to avoid issue during pointer casting * add api cmake build support * undo some temporary changes * code refinement * fix variadic arg * only expose the context for ort version >= 17 * fix a shape bug * fix the cuda build issue * change ifdef condition of GetAllocator * finalize the ort c abi wrapper file name * fix the iOS build break * align gtest version with triton * Update ext_apple_framework.cmake for iOS header files --------- Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>	2024-04-17 12:58:19 -07:00
Wenbing Li	6ac6fb6fbd	using the huggingface whisper config instead of fixed numbers (#667 ) * using the huggingface whisper config instead of fixed numbers * refactor a little bit	2024-03-06 14:29:49 -08:00
Wenbing Li	61369fb970	Unify the spm/bpe tokenizers (#666 ) * Unify the spm/bpe tokenizers * fix the build error * fix the decoding issue * add model name in exported onnx * fixing the unit tests * revert the unneccesary file format changes	2024-03-06 10:07:05 -08:00
Wenbing Li	69a08ffb1d	Remove numpy dependency from its Python binary build (#657 )	2024-02-21 09:54:17 -08:00
Sayan Shaw	a03eded71e	Add initial CUDA native UT (#625 ) * Add initial CUDA native UT * fix the build issue * fix other build error * add 30 mins to android packaging pipeline timeout due to early timing out * undo android pipeline timeout change - move to other PR * revert ifdef for testing ci * add if def for cuda * update ci ORT linux package name * update the package extraction path * Update ci.yml * Update ci.yml --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com> Co-authored-by: Wenbing Li <wenbingl@outlook.com> Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>	2024-01-13 15:34:16 -08:00
Wenbing Li	a32b932547	add a gen_processing_model option to cast token-id for int64 (#632 ) * add a gen_processing_model option to cast token-id for int64 * Update util.py test pipeline trigger	2024-01-12 10:15:18 -08:00
Rachel Guo	fcee38ff68	Add macos platform suppport to onnxruntime-extensions-c pod (#622 ) * Squashed commit of the following: commit 0bd8a9bd49b2bddae3aa0e6c61406e3fb20e011d Author: rachguo <rachguo@rachguos-Mac-mini.local> Date: Thu Dec 14 16:55:29 2023 -0800 remove #Preview commit ac2ecdc696d06d579594834a0ffcc01613bd3422 Author: rachguo <rachguo@rachguos-Mac-mini.local> Date: Thu Dec 14 15:29:36 2023 -0800 fix podfile commit 24bb619fb311f64e28fe3bc94c44912d261ec0bc Author: rachguo <rachguo@rachguos-Mac-mini.local> Date: Thu Dec 14 15:27:57 2023 -0800 use pre-release version pod now commit 9e227da06fe29ba01aef1d39a40712fd5dfd9dfc Author: rachguo <rachguo@rachguos-Mac-mini.local> Date: Thu Dec 14 14:09:41 2023 -0800 update sed commit 6b9651d4d540845af441bc6cf1d45e1561ec967e Author: rachguo <rachguo@rachguos-Mac-mini.local> Date: Thu Dec 14 13:14:46 2023 -0800 minor fix commit 26472d072e2147cd5d92fd6e02dfa182722e109f Author: rachguo <rachguo@rachguos-Mac-mini.local> Date: Thu Dec 14 12:08:42 2023 -0800 fix pod arch path commit ba0237e3dd83bed706060969f4bd206ede68fecf Author: rachguo <rachguo@rachguos-Mac-mini.local> Date: Thu Dec 14 11:13:51 2023 -0800 update yml files commit 1d91e17743594c28d3030089afae2578daaff848 Author: rachguo <rachguo@rachguos-Mac-mini.local> Date: Thu Dec 14 10:25:24 2023 -0800 add script to substitute podspec file source commit 248effa32e08cf08c6268ba8ac81ce9bec2b940d Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Thu Dec 14 07:33:21 2023 -0800 fix pod and update artifacts path commit 7dfed33706f9e8772126eb78f551cc2110011e64 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Thu Dec 14 01:07:43 2023 -0800 update commit 834b03fa69faebc2c7cd948287f870a3f83304a6 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Thu Dec 14 00:07:04 2023 -0800 update directory name commit ac46342bb65d4b670b90c4685d6b0d47273edeb5 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Wed Dec 13 23:17:28 2023 -0800 format commit 1a10611b28e16cf05e9c91b19eff600e401dde84 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Wed Dec 13 23:16:24 2023 -0800 copyrights comments and fix .yml format commit 431682ef154ab93e68a6d099e0604d3a0d7fd804 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Wed Dec 13 23:05:39 2023 -0800 add macos testing target in the app and testing ci updates commit dcd0f302b3f0101584a16b91ef5a81559b22cb5a Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Wed Dec 13 14:17:28 2023 -0800 update opencv.cmake again commit 28b083c5d39fa743101b30e513f28bef7a82f24b Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Wed Dec 13 11:59:59 2023 -0800 minor fix commit d80acdad8583217ec06013f270732df2d8db62b5 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Wed Dec 13 11:26:49 2023 -0800 add zlib to build from source option and minor update commit dfd37effec13806ce30ddc5ed76dacefdfbc13f2 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Tue Dec 12 19:48:40 2023 -0800 update podspec.template file commit b227c2c196216aef6a05ba58254dedd1bcdcac60 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Tue Dec 12 19:26:40 2023 -0800 comment out lint pod for now commit d4bd488006e9d0b25ee01cb7c8447ec2bda620bd Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Tue Dec 12 18:46:38 2023 -0800 fix podspec.template commit a477470a3e63b5dd1966d8a943696df887859dd8 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Tue Dec 12 15:45:04 2023 -0800 minor update commit a07299decdfc10e7c2e96bd77ee49f97afbe5bd4 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Tue Dec 12 11:28:51 2023 -0800 clean commit a83642fbe309bd3c23f93c7aa00801f37cd8a0a3 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Tue Dec 12 11:26:50 2023 -0800 fix merging framework_info.json process commit 02980feff9a28a3099906c66a11df1f1d1ecf071 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Tue Dec 12 10:22:42 2023 -0800 add step for checking the framework_info.json file contents commit ee224e9e5948a6484dff8378697a71cae07e0801 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Tue Dec 12 09:35:32 2023 -0800 update to xcframework_info.json commit 96e13627c2f3c9802d90dd5afe4d96b69d76e012 Author: rachguo <rachguo@rachguos-Mini.attlocal.net> Date: Tue Dec 12 01:06:36 2023 -0800 add changes for macosx build for extensions pod * address pr comments * add back supported archs * update build.py * reorganize source code avoid duplicates * add minor note * exclude macos for ci.yml * update ci.yml * address pr comments * update * update * Update tools/ios/assemble_pod_package.py Co-authored-by: Scott McKay <skottmckay@gmail.com> --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-12-19 18:26:12 -08:00


227 commits