onnxruntime-extensions

Commit graph

Author	SHA1	Message	Date
Wenbing Li	396044310e	Add more HF tokenizer supports in gen_processing_models (#531 )	2023-08-18 17:09:22 -07:00
Wenbing Li	ee14fbe48e	correct CLIP tokenizer name (#526 )	2023-08-16 12:51:17 -07:00
Sayan Shaw	9ba649e134	Fix HF Fast Tokenizer cvt issue for AutoTokenizer imp (#520 ) * Fix GPT2 and Falcon tokenizer cvt for AutoTokenizer imp * fix fast tokenizer issue * small fix * use slow tokenizer in test script --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2023-08-11 13:17:56 -07:00
Wenbing Li	978ada6d60	Add TrieTokenizer for RWKV-like LLM models (#509 ) * Add TrieTokenizer for RWKV-like LLM models * add more tests * fix the windows build * downloading file instead of check in the vocab file * a small bug fixing	2023-08-08 16:47:38 -07:00
Sayan Shaw	997e9ee007	Add Falcon-7b and Falcon-40b tokenizer support (#510 ) * Add Falcon-7b and Falcon-40b tokenizer support * fix alignment and add tokenizer file in test/data to speed up compute --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2023-08-07 14:37:57 -07:00
Wenbing Li	922b7cc387	Add Bert tokenizer in the supported model list and code refinement (#503 ) * Add Bert tokenizer in the supported model list and the related code refinement * utest fix	2023-08-02 14:01:36 -07:00
Wenbing Li	b8bac85ecd	Add Llama and Llama 2 tokenization supports (#499 )	2023-07-26 10:22:00 -07:00
Wenbing Li	62d8598b6b	Update whisper model test cases and e2e example (#496 ) * Update whisper model test cases and e2e example * fix unit test on windows * more refinement * utest fix	2023-07-21 15:27:02 -07:00
Wenbing Li	981cb049ff	Add a new API for building data processing graph from Huggingface transformers processor/tokenizer (#482 ) * initial checkins * test pass * basic impl * first unit test pass * merge error * refine a little bit * add more unit test * fix unit test * Fix the unit test. * add one more whisper audiodecoder test case * update the docs * More updates	2023-07-17 16:50:58 -07:00
JiCheng	5d480a8c5d	clip_image_processor (#478 ) * clip_image_processor separate clip ppp --------- Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-07-12 17:52:17 +08:00
Sayan Shaw	d876f7ff82	Initial BertTokenizer offset mapping implementation (#477 ) * Initial BertTokenizer offset mapping implementation * minor change --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2023-07-03 15:17:23 -07:00
Wenbing Li	93f239c143	Unit test being compatible with ONNXRuntime-GPU package, and some clean-ups. (#457 )	2023-05-30 11:01:30 -07:00
Scott McKay	64f20828ce	Handle ONNX 1.14 in test scripts (#435 ) * Calculate and specify ir_version so we use the oldest possible for maximum compatibility * Don't use `ignore_unknown` in call to `find_min_ir_version_for` as it's only supported in the most recent ONNX release.	2023-05-12 07:13:37 +10:00
Vishal Jain	03b96c822c	Fix ReadMe : Example usage of the PrePostProcessor.md (#436 ) - Small typo fix in "Add post-processing steps"	2023-05-11 18:36:14 +10:00
Wenbing Li	43994eb34a	Fix the unit test failure with ONNX 1.14 package. (#428 ) * Fix the unit test failure with ONNX 1.14 package. * more tests * Update whisper_e2e.py	2023-05-08 11:37:54 -07:00
Wenbing Li	46efcb9051	PyOp attribute supports int and float data type (#425 )	2023-05-05 19:35:59 -07:00
Wenbing Li	2fa0b710ea	Adding down-sampling and stereo mixing features for AudioDecoder (#420 ) * initial draft * second * third * polishing * fix the M_PI name in LINUX platform * fix bessel function issue * add a unit test case * fix the unit test name	2023-05-04 13:30:10 -07:00
Wenbing Li	0f45fef2d9	Compatible with onnxruntime-gpu package (#410 ) * be compatible without onnxruntime-gpu version * some fixing	2023-04-26 17:17:23 -07:00
Wenbing Li	997fa892c2	more code fixing related whisper models (#403 )	2023-04-21 09:26:44 -07:00
JiCheng	db87dc416d	[object detection ppp] YoLo as example (#397 ) * object detection * Unit test add e2e fastestdet model test --------- Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-04-20 13:34:11 +08:00
Wenbing Li	adb8efd62b	support batch > 1 in BpeDecoder (#400 ) * support batch > 1 in BpeDecoder * update the shape in helper function	2023-04-19 14:28:56 -07:00
Wenbing Li	711774db6b	Add a merge step in whisper end-to-end script and fixed some issues (#399 ) * add merged models in whisper model * verify the final model	2023-04-17 16:37:06 -07:00
JiCheng	154ead35a3	built-in bounding box op (#382 ) * built-in bounding box op * update boundary check * assert policy * more boundary test and check * XYXY--> X horizon --------- Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-04-12 19:35:53 +08:00
Wenbing Li	b5dce955f0	Add an audio decoder custom op for whisper end-to-end processing (#385 ) * evaluate the audio decoder library * MP3 Decoder * rename it to test_audio_codec * add the audio decoder to whisper model * whisper end-to-end draft * fix the mp3 decoder * Running with ONNX models * Add more audio format supports * refine the end-to-end script * Update operators/audio/audio_decoder.hpp Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> * Update operators/audio/audio_decoder.hpp Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> * Update operators/audio/audio_decoder.hpp Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> * some fixings of comments and more test cases. * changes for review comments. * Update audio_decoder.hpp * Update audio_decoder.hpp * code refinement * Update operators/audio/audio_decoder.hpp Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> --------- Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-04-11 14:47:10 -07:00
Wenbing Li	9cd1284da8	Pre and Post processing example for openAI Whisper model (#380 ) * add a stft-norm custom op for log-mel spectrum. * undo the debug change * Support ONNX standard STFT op signature. * Add a unit test onnx STFT compatible mode. * add whisper pre-/post- processing example * Update dlib.cmake * undo test code changes * Update setup.cfg * update the end2end example with STFT op	2023-03-30 13:44:50 -07:00
Sayan Shaw	8b2af20b46	Update CLIPTokenizer cvt for added offset mapping output (#384 ) Authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2023-03-23 23:52:58 -07:00
Sayan Shaw	b3420f9ca3	Added CLIPTokenizer to _cuops.py and corresponding cvt func and test (#379 ) Authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2023-03-15 15:45:59 -07:00
Sayan Shaw	29f55ce400	Added cvt function for RobertaTokenizer (#378 ) * Added roberta converter * Added roberta to _cuops and added cvt test --------- Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>	2023-03-14 15:45:31 -07:00
Wenbing Li	3b0bd66e9e	Add a bbpe tokenizer decoder for Whisper model (#376 ) * initial PR * add the attributes for op * cmake update * add the missing symbol * add a unit test case * fix the unit test * fix some corner case. * format Python code with autopep8	2023-03-08 15:00:01 -08:00
JiCheng	b375cb57e6	support mobilebert_ppp (#354 ) * support mobilebert_ppp * renaming IOEntryValuePreserver * generalize argmax step --------- Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>	2023-02-27 18:53:37 +08:00
Scott McKay	91d75f460b	Update tutorial and example usage to provide info on installing the nightly (#364 )	2023-02-17 19:27:00 -08:00
Scott McKay	f3654e5bac	Fix opset 18 issues and bug due to ORT Resize issue (#362 ) * - Fix Split(18) requiring num_outputs. - Calculate `sizes` in Resize instead of using the simpler `scales` - ORT implementation does not round correctly when applying scales - Update center crop to use float so we are more accurate in choosing the crop area. - Fix minor issue with Debug step by only adding values that are altered to the renaming graph inputs. - Update unit tests expected output due to the change in Resize using sizes instead of scales. - Crop e2e example input so before/after image covers same area. * Simplify. CenteredCrop doesn't need to use float as it's dividing by 2 (so using float + floor gives the same result). Remove Resize impl using scales - we most likely will never go back to it. Address PR comments Update doc	2023-02-18 06:37:05 +10:00
Scott McKay	cd5ea11aaa	Move the pre/post processing scripts into the python module. (#349 ) * Move the pre/post processing scripts into the python module. Update usage/examples. * Use better version parsing. * Update tests, docs, * Address PR comments. Remove global Settings and pass onnx opset around directly where needed. Make PrePostProcessor the owner of the checker context.	2023-01-26 08:30:21 +10:00
Wenbing Li	67c77d9fbc	align python package version with version.txt (#345 ) * align python package version with version.txt * Update setup.py Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> * remove a line Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-01-12 14:28:32 -08:00
Wenbing Li	fec9af97aa	a naive decoder for sentencepiece tokenization (#314 ) * a naive decoder for sentencepiece tokenization * typo fixing * add a unit test for the decoder	2022-11-21 11:10:30 -08:00
Sayan Shaw	4683276158	Fixed attribute not found issues for hf_bert_tokenizer (#311 ) * Fixed do_lower_case attribute not found issue * Added check for strip_accents * Fixed typo * Changed strip_accents handling Co-authored-by: sayanshaw <sayanshaw@microsoft.com>	2022-11-01 13:42:10 -07:00
Wenbing Li	08659eae90	Initial Java API for the JAR package. (#292 ) * more C++ code fixing and polish for release * fixing for android build * build flags for android release * add missing exporting function * imint * first versoin * more C++ code fixing and polish for release (#275) * more C++ code fixing and polish for release * fixing for android build * build flags for android release * add missing exporting function * support build_id on Python package building (#281) * support buildid in package building * undo the change on build.sh * build.sh issue on macos * Add `$schema` to `cgmanifest.json` (#284) Co-authored-by: Jamie Magee <jamie.magee@microsoft.com> * test package with a simple java app * demo app * some fixing for windows platform * refine the example app * fix the missing symobls issue for Linux build * fix the package package build issue * typo * a missing change * fix PythonOp * fix Android test issue * one more Android change * replace build flags in ci pipeline * android AAR package build * refine the code for android package Co-authored-by: Jamie Magee <jamie.magee@gmail.com> Co-authored-by: Jamie Magee <jamie.magee@microsoft.com>	2022-10-04 16:22:28 -07:00
shaahji	78d8dd5705	OpenCV Image Decoder & SuperResolution CustomOps	2022-09-30 12:08:38 -07:00
Wenbing Li	134f882e64	more C++ code fixing and polish for release (#275 ) * more C++ code fixing and polish for release * fixing for android build * build flags for android release * add missing exporting function	2022-08-04 10:13:17 -07:00
Wenbing Li	5320af1eea	Fix the code security issue and 0.5 C++ release preparation. (#274 ) * Fix the code security issue and 0.5 C++ release preparation. * more fixings * vswhere	2022-08-02 10:09:35 -07:00
shaahji	0616039115	Issue #226 : Functional e2e NLP example * Implemented a new version of Kernel and the CustomOp to support output that matches the HuggingFace model's input without the need for intermediate python logic. * Implemented a e2e tutorial for exporting and inferencing using the HuggingFace's QuestionAnsering model. Known Issue: Python side doesn't have an implementation of Bert Decoder and so the augmented model is only half-complete. At the time of inferencing the HuggingFace tokenizer is used to decode the result back to string.	2022-07-22 13:56:40 -07:00
shaahji	3b2409d880	Issue #230 : Fix argument handling in BertTokenizer Fixed argument handling in pnp where the arguments weren't being passed down to the tokenizer as expected.	2022-07-21 00:00:09 -07:00
shaahji	8c3713194b	Issue #243 : Cannot rename input and output names of generated model When the input is a string, the logic takes a different route where the input model is split into two and joined again. The user provided input/output names were not respected on this code path. Fixed the issue by renaming the input/output post join operation.	2022-07-19 12:11:33 -07:00
Wenbing Li	e0952e7f2b	update the ci pipeline due to ONNX package upgrading (#256 ) * update the ci pipeline due to ONNX package upgrading * no 3.10 onnxruntime package	2022-06-27 15:04:27 -07:00
Wenbing Li	292a0297b4	reformat test code and verify the pipeline (#251 ) * reformat test code and verify the pipeline * upgrade googletest version * fix the merge issue * more formating	2022-06-20 12:38:06 -07:00
Wenbing Li	1a04abdf3e	Add two opencv operators as ONNX custom ops. (#249 ) * Add two opencv operators as ONNX custom ops. * update the git apply command line * adjust the difference threshold * do not break the build on binskim issue * Make ImageReader be optional * try to fix some potential build break * undo the debug flag in setup.cfg	2022-06-15 23:22:10 -07:00
Wenbing Li	da4784a2cc	update the bert end to end example with hftok (#236 )	2022-06-01 10:41:42 -07:00
shaahji	49548f843d	Issue #230 : Add HuggingFace vocab format to Bert tokenizer HuggingFace vocab format is newline separated (unlike GPT which is json). Newline separated is likely to be faster and doesn't require an external library to parse it. Instead of introducing a json based format, added support for native HuggingFace newline separated token format.	2022-05-26 14:17:20 -07:00
Wenbing Li	909acb7ce4	build and packaging script improvement for release (#218 ) * integrate opencv * small fixing * Add the opencv includes and libs * refine a little bit * standardize the output folder. * fix ctest on Linux * fix setup.py on output folder change. * more fixings for CI pipeline * more fixing 1 * more fixing 2 * more fixing 3 * ci pipeline fixing 1 * ci pipeline fixing 2 * a silly typo... * ci pipeline fixing 3 * fixing the file copy issue. * last fixing. * re-test the fullpath in build_ext. * One more try * extent timeout * mshost.yml indent * Update mshost.yaml for Azure Pipelines * cibuild build python versions * Update wheels.yml * only build python 3.8/3.9 * Update wheels.yml for Azure Pipelines * seperate the ci pipeline	2022-05-11 16:51:59 -07:00
Wenbing Li	bfbfa5a304	An end-to-end BERT model with pre-/post- processing. (#224 ) * bert demo * add some comments * support multiple outputs in ONNX model * code polishing * encoding issue on Windows platform.	2022-04-20 16:14:46 -07:00


85 commits