Commit Graph

227 Commits

Author SHA1 Message Date
Wenbing Li 5104bb9897
fix the win32 macro usage (#844) 2024-11-15 11:26:37 -08:00
Wenbing Li 3da0d3c929
Load the tokenizer data from the memory (#836) 2024-11-09 10:15:21 -08:00
Wenbing Li be5aa773e3
Unify the image operations in extensions library (#831)
* Unify the image operations in extensions library

* fix the build configuration issue

* More build fixings

* Fix the native image codec

* fix encode_image

* Add bgr/rgb conversion for encoding image

* parity check

* build break

* update PNG encoding parameters

* build break on Linux

* using MSE to compare images

* fix the discrepancy between Linux and Windows

* final code refinement

* one more change

* fix the C++ warnings

---------

Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
2024-10-30 09:17:06 -07:00
Wenbing Li aa2c82fa67
Add the MLlama Imaging Processing Support (#823)
* initial checkins for mllama image process

* fix some tests

* some fixings

* add more image

* More test assertions

* parity test passed

* code clean up

* code refinement
2024-10-22 14:24:09 -07:00
Sayan Shaw 7ab9d24cb4
Add general regex support (#822)
* Add general regex support

* add case 5 support instead of replacing with s+

* add more test cases

* address comments

* add back gpt2 and llama regex methods for efficiency

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
2024-10-21 16:29:17 -07:00
Wenbing Li 1fb87a30f7
Validate the tokenizer class name on data loading (#830) 2024-10-21 13:25:37 -07:00
Chester Liu e424838708
Added support for native image decoding (#808)
This added support for native image decoding on Windows & Apple platforms.
This helps us remove libpng & libjpeg completely on these platforms, and
in the meantime support more image formats thanks to OS vendors,
2024-09-26 09:17:55 +08:00
Wenbing Li f204a4c791
Add a decoder for Unigram tokenizer and unify some classes among tokenizers (#816)
* rename and formalize the file names

* add the decoder impl

* fix a typo
2024-09-25 10:25:06 -07:00
Wenbing Li 6b94f4d7a5
Fix the Unicode code discrepancy on CLIP model (#814)
* refine the code structure

* more fixing on unicode

* fix the codepoint 304

* add the clip tokenizer data files back
2024-09-23 16:49:24 -07:00
Wenbing Li 176c1d0138
Support the Unigram tokenizer kind from sentencepiece library (#811)
* initial commit

* Ugm vocab loaded is good

* test passed

* fixes unit test on win32

* finish the parity check

* code refinement

* code refinement for review
2024-09-19 15:46:13 -07:00
Sayan Shaw 8bc8e43da1
Add C++ regex support for Llama3, Standard Library, and Custom Cases (#804)
* add C++ standard library regex support for GPT2 case

* reorder regex handling

* try without STL

* missing case

* add llama3 regex support

* add custom regex impl

* change regex based on model

* modify tests, add docs, and code cleanup

* add regex test and const strings

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
2024-09-10 23:17:49 -07:00
Wenbing Li 90d8f33172 Revert "some data calc fixing"
This reverts commit dae9510dbb.
2024-09-05 09:30:19 -07:00
Wenbing Li dae9510dbb some data calc fixing
really split the images

test with sus
2024-09-05 09:26:05 -07:00
Wenbing Li 1b80794903
Remove OpenCV dependency from C_API mode (#800)
* Remove OpenCV dependency from C_API model

* fix build on Windows

* switch ci build flag

* try to fix the macOS build issue

* more fixing

* fix the macOS build issue

* list jpeg source

* verified on MacOS

* update the pp_api too

* avoid the codecs library conflicts

* Add the unit tests

* move the codec test

* add the missing dl lib for extensions test

* refine the code

* a smaller fixing for Windows Python
2024-09-04 16:50:05 -07:00
Wenbing Li 2d02a687be
Optimize the tokenizer for efficiency (#797)
* optimize the tokenizer for efficiency

* fix the unit test failures.

* fix the api test case failures

* removed the unused code.

* More test cases fixings

* One more fixing

* fix macOS build issues

* refine the test

* add more diagnosis info.

* fix unit test in CI Linux

* fix the pp_api test failure
2024-08-27 18:57:50 -07:00
Wenbing Li 8f2c35fad0
Add more tests for pre-processing C APIs (#793)
* initial api for tokenizer

* More fixings and test data refinement

* add a simple wrapper for pre-processing APIs

* fix the test issues

* test if the tokenizer is spm based

* fix the failed test cases

* json pointer does not work
2024-08-21 16:48:39 -07:00
Wenbing Li 711a2cfa69
add a convert_token_string_to_an_id API for the prompt ids (#794)
* add a convert token string to an id API for the prompt ids

* fix the build issues on Linux
2024-08-19 16:44:07 -07:00
Wenbing Li be29e28dd7
support tokenizers build only in C API mode (#783)
* support tokenizer build only in C API mode

* fix the python build.

* fix the selectedops build

---------

Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
2024-08-02 13:28:58 -07:00
Sayan Shaw 7851b51ee3
Add initial tiktoken and Phi3SmallTokenizer support (#729)
* add initial tiktoken support

* add vector hash and equal for bpe ranks map

* change lambda comparator

* move phi-3-small files

* final changes

* move tiktoken files from data2 to data

* add unit test

* add tokenizer module

* merge json and tiktoken impl

* fix tiktoken encoding problem

* address comments

* remove dummy tokens

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2024-08-02 10:24:02 -07:00
Wenbing Li b4ebfc9519
Fix spm converted FastTokenizer issue on non-ascii char (#778)
* Fix spm converted tokenizer issue on non-ascii char

* remove pkg_resource in python
2024-07-31 14:22:25 -07:00
Wenbing Li c3145b8f52
add the decoder_prompt_id for whisper tokenizer (#775)
* add the decoder_prompt_id for whisper tokenizer

* temporarily disable android prebuilt

* disable the prebuilt for android

* disable the prebuilt for android 2

* Add a unit test

* correct test ids
2024-07-29 14:21:17 -07:00
Wenbing Li 620050fbe0
reimplement resize cpu kernel for image processing (#768)
* reimplement resize cpu kernel for image processing

* accuracy fixing and code refinement

* fix the build issues

* fix Linux build issue

* more fixings

* Fix the pipeline issue

* fix the ci script

* try to fix CUDA machine pool
2024-07-23 15:40:52 -07:00
Wenbing Li 38a3d85f8f
switch cmake cmp0169 flag to new (#762)
* switch cmake cmp0169 flag to new

* the missing spm code.

* more refinement on cmake build targets

* Update ci.yml

* Update ci.yml

* update the jpg files after using libjpeg instead of libjpeg-turbo

* exclude cutlass too

* upgrade the protobuf library to be consistent with ORT

* update the protoc generated files

* use the right patch name

* Update cutlass.cmake
2024-07-15 23:28:49 -07:00
Wenbing Li 8153bc1a3a
Feature extraction C API for whisper model (#755)
* Feature extraction C API for whisper model

* Update the docs

* Update the docs2

* refine the code

* fix some issues

* fix the Linux build

* fix more data consistency issue

* More code refinements
2024-07-11 11:20:36 -07:00
Wenbing Li b436d09459
Fix the CI pipeline for the latest PyTorch release. (#759) 2024-07-08 16:21:48 -07:00
Wenbing Li cbed8fd575
Add a generic image processor and its C API (#745)
* Add a generic image processor

* add more tests

* Fix the test failures

* Update runner.hpp
2024-06-20 10:53:49 -07:00
Xavier Dupré bef5f07e33
Add custom ops ReplaceZero (#739)
* Add custom ops ReplaceZero

* fix merge conflicts
2024-06-18 11:36:14 +02:00
Xavier Dupré 690bed71b6
Add operator MulSigmoid, MulMulSigmoid (#741)
* Add operator MulSigmoid

* add mul mul sigmoid

* add comments

* Apply suggestions from code review

---------

Co-authored-by: Wei-Sheng Chin <wechi@microsoft.com>
2024-06-12 10:29:42 +02:00
Xavier Dupré f5055466d5
Add custom kernel ScatterNDOfShape (#705)
* first draft

* clang

* Draft for ScatterNFOfShape

* fix build

* disable test when cuda is missing

* fix implementation

* update test

* add MaskedScatterNdOfShape

* fix merge conflicts
2024-06-11 09:59:46 +02:00
Xavier Dupré 79f3b048d4
Add custom op Transpose2DCast (#737)
* Add custom op Transpose2DCast

* fix compilation issues

* fix compilation issues
2024-06-06 17:44:21 +02:00
Xavier Dupré 1e8c1211a5
Add custom kernels AddSharedInput, MulSharedInput (#734)
* Add custom kernel AddSharedInput, MulSharedInput

* fix compilation

* compilation issue

* fix unit test
2024-06-05 10:42:22 +02:00
Wenbing Li ca433cbea7
Refactor the unit tests and cmake build script (#726)
* refine the build script

* complete the unit tests.

* remove the commented code
2024-05-30 14:16:14 -07:00
Xavier Dupré 95a49faabe
Add kernel NegXPlus1 = 1 - X (#709)
* first draft for NegXPlus1

* complete

* fix unit test

* rename one test

* remove test if not cuda

---------

Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2024-05-29 15:26:44 +02:00
Wenbing Li 474540d8a5
Fix the image processing output data discrepancy (#722)
* some data calc fixing

* Update image_transforms.hpp

* really split the images

* Update image_transforms.hpp
2024-05-20 12:44:48 -07:00
Tang, Cheng f0ef40d074
add move constructor and Release API for tensor (#717)
Co-authored-by: Cheng Tang <chenta@microsoft.com@onnxruntime-a10.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
2024-05-17 11:50:20 -07:00
Wenbing Li 4781a9d1d8
Add ci pipeline for pre-processing API testing (#718)
* Add ci pipeline for pre-processing API testing

* update cmake for testing

* add test cases back

* add other two pipelines

* fix macos pipeline
2024-05-16 15:39:52 -07:00
Wenbing Li 311dd35401
Add ImageProcessor for Multimodel model Pre-processing (#715)
* only keep the image decoder from opencv

* initial build

* refine the code

* Add clear functions

* Update CMakeLists.txt

* Update opencv.cmake

* change the output type to float

* get the result

* align image-process with original Python

* move the LoadRawImages into library

* fix the calculation error

* fix the pipeline build issue

* fix the build breaks in ci pipeline

* support json configuration file and refactor the code.
2024-05-15 14:35:14 -07:00
Wenbing Li c58c930739
Ignore all streaming output of invalid utf-8 string (#704)
* Ignore all streaming output of invalid utf-8 string

* Update bpe_streaming.hpp

* add the phi-3 tokenizer test

* add a streaming test for phi-3 model

* fix the utf-8 validation

* fix the utf-8 validation 2

* fix the utf-8 validation 3

* fix the utf-8 validation 4
2024-05-06 16:46:55 -07:00
cao lei dfdf52e759
refactor cuda ops, remove contrib folder (#707)
Co-authored-by: Lei Cao <leca@microsoft.com@onnxruntime-a10.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
2024-05-03 12:18:59 -07:00
Tang, Cheng 3b889fc42f
update custom op v2 struct to be able to invoke from eager mode (#700)
Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2024-04-30 13:53:39 -07:00
Wenbing Li a8bce4328b
Add the tokenizer C ABI (#693)
* initial checkins

* fix the selectedops build failures

* add the tokenization implementation

* update the windows DEF file for c abi in cmake file

* fix the build on linux

* fix some warnings and remove the unused code

* initial import of unit tests from tfmtok

* add streaming API support

* fix the merges loading issues

* complete export from tfmtok - needs input id fixing

* fix the unit test failures.

* fix all unit test failure

* refactor streaming code

* remove the unused code

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
2024-04-29 16:45:49 -07:00
Tang, Cheng 1f31d33ed4
Eager mode: cuda kernel support (#694)
* add UT for neg_pos_cuda in eager mode and fix build break in Windows

* fix Linux build break

* adjust argument and path

* remove old cudaContext

* add ort cuda test back

* fix cuda tests

* undo debug code

* undo useless change

---------

Co-authored-by: jslhcl <jslhcl@gmail.com>
Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
2024-04-24 12:49:00 -07:00
Wenbing Li f9290e8bac
Add a status class for future tokenizer API implementation (#690)
* Add a status class for future API implementation

* Update bpe_kernels.cc

* fix the ios package pipeline

* update mistral test model name
2024-04-18 21:12:14 -07:00
Wenbing Li 646462790b
Refactor the header file directory and integrate the eager tensor implementation (#689)
* refactor the header file in include folder

* fix the basic-token eager unit test case

* a more flexible way to handle string tensor shape.

* fix the unit test path issue

* remove the multi-inherits to avoid issue during pointer casting

* add api cmake build support

* undo some temporary changes

* code refinement

* fix variadic arg

* only expose the context for ort version >= 17

* fix a shape bug

* fix the cuda build issue

* change ifdef condition of GetAllocator

* finalize the ort c abi wrapper file name

* fix the iOS build break

* align gtest version with triton

* Update ext_apple_framework.cmake for iOS header files

---------

Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
2024-04-17 12:58:19 -07:00
Wenbing Li 6ac6fb6fbd
using the huggingface whisper config instead of fixed numbers (#667)
* using the huggingface whisper config instead of fixed numbers

* refactor a little bit
2024-03-06 14:29:49 -08:00
Wenbing Li 61369fb970
Unify the spm/bpe tokenizers (#666)
* Unify the spm/bpe tokenizers

* fix the build error

* fix the decoding issue

* add model name in exported onnx

* fixing the unit tests

* revert the unnecessary file format changes
2024-03-06 10:07:05 -08:00
Wenbing Li 69a08ffb1d
Remove numpy dependency from its Python binary build (#657) 2024-02-21 09:54:17 -08:00
Sayan Shaw a03eded71e
Add initial CUDA native UT (#625)
* Add initial CUDA native UT

* fix the build issue

* fix other build error

* add 30 mins to android packaging pipeline timeout due to early timing out

* undo android pipeline timeout change - move to other PR

* revert ifdef for testing ci

* add if def for cuda

* update ci ORT linux package name

* update the package extraction path

* Update ci.yml

* Update ci.yml

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2024-01-13 15:34:16 -08:00
Wenbing Li a32b932547
add a gen_processing_model option to cast token-id for int64 (#632)
* add a gen_processing_model option to cast token-id for int64

* Update util.py

test pipeline trigger
2024-01-12 10:15:18 -08:00
Rachel Guo fcee38ff68
Add macos platform support to onnxruntime-extensions-c pod (#622)
* Squashed commit of the following:

commit 0bd8a9bd49b2bddae3aa0e6c61406e3fb20e011d
Author: rachguo <rachguo@rachguos-Mac-mini.local>
Date:   Thu Dec 14 16:55:29 2023 -0800

    remove #Preview

commit ac2ecdc696d06d579594834a0ffcc01613bd3422
Author: rachguo <rachguo@rachguos-Mac-mini.local>
Date:   Thu Dec 14 15:29:36 2023 -0800

    fix podfile

commit 24bb619fb311f64e28fe3bc94c44912d261ec0bc
Author: rachguo <rachguo@rachguos-Mac-mini.local>
Date:   Thu Dec 14 15:27:57 2023 -0800

    use pre-release version pod now

commit 9e227da06fe29ba01aef1d39a40712fd5dfd9dfc
Author: rachguo <rachguo@rachguos-Mac-mini.local>
Date:   Thu Dec 14 14:09:41 2023 -0800

    update sed

commit 6b9651d4d540845af441bc6cf1d45e1561ec967e
Author: rachguo <rachguo@rachguos-Mac-mini.local>
Date:   Thu Dec 14 13:14:46 2023 -0800

    minor fix

commit 26472d072e2147cd5d92fd6e02dfa182722e109f
Author: rachguo <rachguo@rachguos-Mac-mini.local>
Date:   Thu Dec 14 12:08:42 2023 -0800

    fix pod arch path

commit ba0237e3dd83bed706060969f4bd206ede68fecf
Author: rachguo <rachguo@rachguos-Mac-mini.local>
Date:   Thu Dec 14 11:13:51 2023 -0800

    update yml files

commit 1d91e17743594c28d3030089afae2578daaff848
Author: rachguo <rachguo@rachguos-Mac-mini.local>
Date:   Thu Dec 14 10:25:24 2023 -0800

    add script to substitute podspec file source

commit 248effa32e08cf08c6268ba8ac81ce9bec2b940d
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Thu Dec 14 07:33:21 2023 -0800

    fix pod and update artifacts path

commit 7dfed33706f9e8772126eb78f551cc2110011e64
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Thu Dec 14 01:07:43 2023 -0800

    update

commit 834b03fa69faebc2c7cd948287f870a3f83304a6
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Thu Dec 14 00:07:04 2023 -0800

    update directory name

commit ac46342bb65d4b670b90c4685d6b0d47273edeb5
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Wed Dec 13 23:17:28 2023 -0800

    format

commit 1a10611b28e16cf05e9c91b19eff600e401dde84
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Wed Dec 13 23:16:24 2023 -0800

    copyrights comments and fix .yml format

commit 431682ef154ab93e68a6d099e0604d3a0d7fd804
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Wed Dec 13 23:05:39 2023 -0800

    add macos testing target in the app and testing ci updates

commit dcd0f302b3f0101584a16b91ef5a81559b22cb5a
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Wed Dec 13 14:17:28 2023 -0800

    update opencv.cmake again

commit 28b083c5d39fa743101b30e513f28bef7a82f24b
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Wed Dec 13 11:59:59 2023 -0800

    minor fix

commit d80acdad8583217ec06013f270732df2d8db62b5
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Wed Dec 13 11:26:49 2023 -0800

    add zlib to build from source option and minor update

commit dfd37effec13806ce30ddc5ed76dacefdfbc13f2
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Tue Dec 12 19:48:40 2023 -0800

    update podspec.template file

commit b227c2c196216aef6a05ba58254dedd1bcdcac60
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Tue Dec 12 19:26:40 2023 -0800

    comment out lint pod for now

commit d4bd488006e9d0b25ee01cb7c8447ec2bda620bd
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Tue Dec 12 18:46:38 2023 -0800

    fix podspec.template

commit a477470a3e63b5dd1966d8a943696df887859dd8
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Tue Dec 12 15:45:04 2023 -0800

    minor update

commit a07299decdfc10e7c2e96bd77ee49f97afbe5bd4
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Tue Dec 12 11:28:51 2023 -0800

    clean

commit a83642fbe309bd3c23f93c7aa00801f37cd8a0a3
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Tue Dec 12 11:26:50 2023 -0800

    fix merging framework_info.json process

commit 02980feff9a28a3099906c66a11df1f1d1ecf071
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Tue Dec 12 10:22:42 2023 -0800

    add step for checking the framework_info.json file contents

commit ee224e9e5948a6484dff8378697a71cae07e0801
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Tue Dec 12 09:35:32 2023 -0800

    update to xcframework_info.json

commit 96e13627c2f3c9802d90dd5afe4d96b69d76e012
Author: rachguo <rachguo@rachguos-Mini.attlocal.net>
Date:   Tue Dec 12 01:06:36 2023 -0800

    add changes for macosx build for extensions pod

* address pr comments

* add back supported archs

* update build.py

* reorganize source code avoid duplicates

* add minor note

* exclude macos for ci.yml

* update ci.yml

* address pr comments

* update

* update

* Update tools/ios/assemble_pod_package.py

Co-authored-by: Scott McKay <skottmckay@gmail.com>

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-12-19 18:26:12 -08:00