
64 commits

Author SHA1 Message Date
Wenbing Li 5104bb9897
fix the win32 macro usage (#844) 2024-11-15 11:26:37 -08:00
Wenbing Li 3da0d3c929
Load the tokenizer data from the memory (#836) 2024-11-09 10:15:21 -08:00
Sayan Shaw 5b7e3d4b8b
Fix prefast issue in image transforms (#837)
* fix prefast issue in image transforms

* Update image_transforms.hpp

* Update image_transforms.hpp

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2024-10-31 15:10:47 -07:00
Wenbing Li be5aa773e3
Unify the image operations in extensions library (#831)
* Unify the image operations in extensions library

* fix the build configuration issue

* More build fixings

* Fix the native image codec

* fix encode_image

* Add bgr/rgb conversion for encoding image

* parity check

* build break

* update PNG encoding parameters

* build break on Linux

* using MSE to compare images

* fix the discrepancy between Linux and Windows

* final code refinement

* one more change

* fix the C++ warnings

---------

Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
2024-10-30 09:17:06 -07:00
Wenbing Li aa2c82fa67
Add the MLlama Image Processing Support (#823)
* initial checkins for mllama image process

* fix some tests

* some fixings

* add more image

* More test assertions

* parity test passed

* code clean up

* code refinement
2024-10-22 14:24:09 -07:00
Wenbing Li 1fb87a30f7
Validate the tokenizer class name on data loading (#830) 2024-10-21 13:25:37 -07:00
Wenbing Li 62c0a7bfda
fix the unigram detector for last HG tokenizer (#820) 2024-10-03 14:25:53 -07:00
Chester Liu e424838708
Added support for native image decoding (#808)
This added support for native image decoding on Windows & Apple platforms.
This helps us remove libpng & libjpeg completely on these platforms, and
in the meantime support more image formats thanks to OS vendors,
2024-09-26 09:17:55 +08:00
Wenbing Li f204a4c791
Add a decoder for Unigram tokenizer and unify some classes among tokenizers (#816)
* rename and formalize the file names

* add the decoder impl

* fix a typo
2024-09-25 10:25:06 -07:00
Wenbing Li 6b94f4d7a5
Fix the Unicode code discrepancy on the CLIP model (#814)
* refine the code structure

* more fixing on unicode

* fix the codepoint 304

* add the CLIP tokenizer data files back
2024-09-23 16:49:24 -07:00
Wenbing Li 176c1d0138
Support the Unigram tokenizer kind from sentencepiece library (#811)
* initial commit

* Ugm vocab loaded is good

* test passed

* fixes unit test on win32

* finish the parity check

* code refinement

* code refinement for review
2024-09-19 15:46:13 -07:00
Wenbing Li 1b80794903
Remove OpenCV dependency from C_API mode (#800)
* Remove OpenCV dependency from C_API model

* fix build on Windows

* switch ci build flag

* try to fix the macOS build issue

* more fixing

* fix the macOS build issue

* list jpeg source

* verified on MacOS

* update the pp_api too

* avoid the codecs library conflicts

* Add the unit tests

* move the codec test

* add the missing dl lib for extensions test

* refine the code

* a smaller fixing for Windows Python
2024-09-04 16:50:05 -07:00
Wenbing Li 2d02a687be
Optimize the tokenizer for efficiency (#797)
* optimize the tokenizer for efficiency

* fix the unit test failures.

* fix the api test case failures

* removed the unused code.

* More test cases fixings

* One more fixing

* fix macOS build issues

* refine the test

* add more diagnosis info.

* fix unit test in CI Linux

* fix the pp_api test failure
2024-08-27 18:57:50 -07:00
Wenbing Li 711a2cfa69
add a convert_token_string_to_an_id API for the prompt ids (#794)
* add a convert token string to an id API for the prompt ids

* fix the build issues on Linux
2024-08-19 16:44:07 -07:00
Sayan Shaw 7851b51ee3
Add initial tiktoken and Phi3SmallTokenizer support (#729)
* add initial tiktoken support

* add vector hash and equal for bpe ranks map

* change lambda comparator

* move phi-3-small files

* final changes

* move tiktoken files from data2 to data

* add unit test

* add tokenizer module

* merge json and tiktoken impl

* fix tiktoken encoding problem

* address comments

* remove dummy tokens

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2024-08-02 10:24:02 -07:00
Wenbing Li c3145b8f52
add the decoder_prompt_id for whisper tokenizer (#775)
* add the decoder_prompt_id for whisper tokenizer

* temporarily disable android prebuilt

* disable the prebuilt for android

* disable the prebuilt for android 2

* Add a unit test

* correct test ids
2024-07-29 14:21:17 -07:00
Wenbing Li 620050fbe0
reimplement resize cpu kernel for image processing (#768)
* reimplement resize cpu kernel for image processing

* accuracy fixing and code refinement

* fix the build issues

* fix Linux build issue

* more fixings

* Fix the pipeline issue

* fix the ci script

* try to fix CUDA machine pool
2024-07-23 15:40:52 -07:00
Wenbing Li 38a3d85f8f
switch cmake cmp0169 flag to new (#762)
* switch cmake cmp0169 flag to new

* the missing spm code.

* more refinement on cmake build targets

* Update ci.yml

* Update ci.yml

* update the jpg files after using libjpeg instead of libjpeg-turbo

* exclude cutlass too

* upgrade the protobuf library to be consistent with ORT

* update the protoc generated files

* use the right patch name

* Update cutlass.cmake
2024-07-15 23:28:49 -07:00
Wenbing Li 8153bc1a3a
Feature extraction C API for whisper model (#755)
* Feature extraction C API for whisper model

* Update the docs

* Update the docs2

* refine the code

* fix some issues

* fix the Linux build

* fix more data consistency issue

* More code refinements
2024-07-11 11:20:36 -07:00
Wenbing Li cbed8fd575
Add a generic image processor and its C API (#745)
* Add a generic image processor

* add more tests

* Fix the test failures

* Update runner.hpp
2024-06-20 10:53:49 -07:00
Chester Liu 58b552388f
Fix several C5038 warnings (#748) 2024-06-20 08:20:30 +08:00
Wenbing Li ca433cbea7
Refactor the unit tests and cmake build script (#726)
* refine the build script

* complete the unit tests.

* remove the commented code
2024-05-30 14:16:14 -07:00
Wenbing Li 474540d8a5
Fix the image processing output data discrepancy (#722)
* some data calc fixing

* Update image_transforms.hpp

* really split the images

* Update image_transforms.hpp
2024-05-20 12:44:48 -07:00
Wenbing Li c3c5f1cbb1
Remove C++ filesystem library dependency for compatibility with older systems (#721)
* Remove C++ filesystem library dependency for compatibility with older OSes.

* Update file_sys.h
2024-05-18 07:23:45 -07:00
Wenbing Li 97ee9eb56f
Refactor OrtxStatus to be a header-only implementation (#720) 2024-05-17 15:40:11 -07:00
Wenbing Li 4781a9d1d8
Add ci pipeline for pre-processing API testing (#718)
* Add ci pipeline for pre-processing API testing

* update cmake for testing

* add test cases back

* add other two pipelines

* fix macos pipeline
2024-05-16 15:39:52 -07:00
Wenbing Li 311dd35401
Add ImageProcessor for Multimodal model Pre-processing (#715)
* only keep the image decoder from opencv

* initial build

* refine the code

* Add clear functions

* Update CMakeLists.txt

* Update opencv.cmake

* change the output type to float

* get the result

* align image-process with original Python

* move the LoadRawImages into library

* fix the calculation error

* fix the pipeline build issue

* fix the build breaks in ci pipeline

* support json configuration file and refactor the code.
2024-05-15 14:35:14 -07:00
Wenbing Li a8bce4328b
Add the tokenizer C ABI (#693)
* initial checkins

* fix the selectedops build failures

* add the tokenization implementation

* update the windows DEF file for c abi in cmake file

* fix the build on linux

* fix some warnings and remove the unused code

* initial import of unit tests from tfmtok

* add streaming API support

* fix the merges loading issues

* complete export from tfmtok - needs input id fixing

* fix the unit test failures.

* fix all unit test failure

* refactor streaming code

* remove the unused code

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
2024-04-29 16:45:49 -07:00
Wenbing Li f9290e8bac
Add a status class for future tokenizer API implementation (#690)
* Add a status class for future API implementation

* Update bpe_kernels.cc

* fix the ios package pipeline

* update mistral test model name
2024-04-18 21:12:14 -07:00
RandySheriffH 9b2daf56de
Rashuai/bert cuda (#614)
* cmake fast gelu

* bridge func and cuda kernel

* tune ut

* fix build warning

* fix format

* tune ut

* drop OCOS_ENABLE_CONTRIB

* tune cmake

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-12-05 12:59:19 -08:00
Wenbing Li a0c2625511
Add CUDA build support and some code refinements (#581)
* the cuda kernel first example

* Update test_ortops.cc

* revert some unnecessary changes

* unix-like os build failure

* refactor header files

* fix python dll exporting error.
2023-10-30 21:06:30 -07:00
Wenbing Li 914509d524
Enable the status returnable APIs from ORT 1.16 C ABI (#558)
* Initial checkins for returnable ORT ABIs

* fix for linux build

* more fixes on Python, test...

* remove the statusmsg

* native unit tests fixing

* Python unit tests fixing

* last unit test fixing
2023-09-13 14:59:09 -07:00
Scott McKay 2bde82fce9
Refactor setup for Azure ops. Add Android support. (#507)
* Refactor setup for Azure ops to try and make common things more re-usable, and for the actual ops to simply layer in the specific input/output constraints for that type of request.

Currently builds on Linux, Windows (x64 only) and Android

Android requires a manual pre-build of openssl and curl.

Linux requires a manual pre-install of openssl.

Windows currently only works for x64. Other targets need the triplet adjusted.

* Address PR comments

* Fix a couple of Android build warnings.

* Update .gitignore to remove old path

* Fix build break from merge
2023-08-08 19:54:30 +10:00
Wenbing Li bab1989644
refine audiodecoder with new api (#489)
* refine audiodecoder with new api

* update std::optional usage for macOS
2023-07-12 13:11:58 -07:00
Wenbing Li e3d9198de8
using the latest Ort header instead of minimum compatible headers (#485)
* using the latest Ort header instead of minimum compatible headers

* Update ext_ortlib.cmake

* Update ortcustomops.def

* change the default ORT API version value
2023-07-10 16:10:11 -07:00
RandySheriffH 27132ced71
Implement azure invokers (#487)
* Implement azure invokers (#486)

* draft azure ops

* migrate triton client

* AzureAudioInvoker works

* triton client builds

* triton invoker works

* limit version

* restore setup.py

* limit ort version

* upgrade version

* pip install cmake

* add ut

* promote ort header version to 1.15.1

* register as cpu op

* limit triton invoker to 1.14 and newer

* remove test

* install rapidjson

* install dep

* sudo install

* install version script

* print err msg

* fix pipeline

* disable from web assembly

* install cmake

* Fix pipelines (#479)

* 1 through 80 (numbered intermediate commits; 47 and 62 each appear twice)

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>

* fix pipelines (#481)

* 1 through 86 (numbered intermediate commits; 47 and 62 each appear twice)

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>

* test as cpu op

* add ut

* add ut

* move cond

* tune ut

* tune pipeline

* promote to ort 141

* reset header version

* restore cmake

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>

* trim changes

* revert req txt

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-07-10 10:07:33 -07:00
Wenbing Li b5dce955f0
Add an audio decoder custom op for whisper end-to-end processing (#385)
* evaluate the audio decoder library

* MP3 Decoder

* rename it to test_audio_codec

* add the audio decoder to whisper model

* whisper end-to-end draft

* fix the mp3 decoder

* Running with ONNX models

* Add more audio format supports

* refine the end-to-end script

* Update operators/audio/audio_decoder.hpp

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* Update operators/audio/audio_decoder.hpp

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* Update operators/audio/audio_decoder.hpp

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* some fixings of comments and more test cases.

* changes for review comments.

* Update audio_decoder.hpp

* Update audio_decoder.hpp

* code refinement

* Update operators/audio/audio_decoder.hpp

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

---------

Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-04-11 14:47:10 -07:00
Scott McKay 5e44a7c3c9
Add ability to prevent exception propagation if building as part of ORT when ORT has exceptions disabled (#368)
* Add ability to prevent exception propagation with top level try/catch handler macros.

If a combined build with ORT has exceptions disabled in ORT but ort-ext has an operator that requires exceptions, we enable exceptions in ort-ext but prevent them from propagating up via try/catch in the entry points that ORT can call:
  - RegisterCustomOps
  - CustomOpBase constructor and Compute

Removed some places in CustomOpApi that threw if OpKernelInfo* was nullptr, by standardizing all kernels to store the OpKernelInfo provided in the ctor.

Added unit tests
  - need to validate on more platforms and add CI for build where we don't want to allow exceptions to propagate

* Update pyop

* Update CMakeLists.txt

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* Update includes/exceptions.h

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* Update includes/exceptions.h

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* Update includes/onnxruntime_customop.hpp

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* Merge with main and update
Address PR comments
Fix some issues.

* Delete local file

* Fix pyop update

* Add CI
Address PR comments

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2023-02-27 10:31:44 -08:00
Edward Chen ec83a138a6
Add BertTokenizer to iOS package ops config. (#347)
* Add BertTokenizer to iOS package ops config.

* Also register tokenizer ops in com.microsoft.extensions.

Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2023-01-18 19:21:03 -08:00
Wenbing Li c599b00d07
Using the header files from the ONNXRuntime package (#322)
* Using the header files from the ONNXRuntime package

* Update includes/onnxruntime_customop.hpp

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* fix the build break.

* one more fixing

* wired top project

* ort 1.9.0 used

* switch to 1.10.0 package.

* change the vmimage to latest

* URL issue

* cmake policy

* ignore onnxruntime.dll native scan

* update the Onebranch exclusedPaths

* fixing some build tool issues

* update again

* typo

* undo of ORT dll removal

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2022-12-09 14:30:24 -08:00
Scott McKay 1cab9711ff
Starter changes for supporting pre/post processing for vision models. (#312)
* Initial changes for supporting mobilenet and superresolution.
- Script to update model with pre/post processing
- custom ops for decode/encode
  - user just has to provide jpg or png bytes
  - superresolution can return the updated image in jpg or png
- models for testing

Updated cmake setup to enable building of the vision pre/post processing ops
  - opencv2 is treated as an internal dependency rather than the mechanism for selecting which operators to include.

* Add extra check in decode.
2022-11-24 07:40:56 +10:00
Wenbing Li 7fc0224410
Partner team's code security fixings (#300) 2022-10-05 16:10:34 -07:00
Wenbing Li 08659eae90
Initial Java API for the JAR package. (#292)
* more C++ code fixing and polish for release

* fixing for android build

* build flags for android release

* add missing exporting function

* imint

* first version

* more C++ code fixing and polish for release (#275)

* more C++ code fixing and polish for release

* fixing for android build

* build flags for android release

* add missing exporting function

* support build_id on Python package building (#281)

* support buildid in package building

* undo the change on build.sh

* build.sh issue on macos

* Add `$schema` to `cgmanifest.json` (#284)

Co-authored-by: Jamie Magee <jamie.magee@microsoft.com>

* test package with a simple java app

* demo app

* some fixing for windows platform

* refine the example app

* fix the missing symbols issue for Linux build

* fix the package build issue

* typo

* a missing change

* fix PythonOp

* fix Android test issue

* one more Android change

* replace build flags in ci pipeline

* android AAR package build

* refine the code for android package

Co-authored-by: Jamie Magee <jamie.magee@gmail.com>
Co-authored-by: Jamie Magee <jamie.magee@microsoft.com>
2022-10-04 16:22:28 -07:00
Wenbing Li 134f882e64
more C++ code fixing and polish for release (#275)
* more C++ code fixing and polish for release

* fixing for android build

* build flags for android release

* add missing exporting function
2022-08-04 10:13:17 -07:00
Wenbing Li 1a04abdf3e
Add two opencv operators as ONNX custom ops. (#249)
* Add two opencv operators as ONNX custom ops.

* update the git apply command line

* adjust the difference threshold

* do not break the build on binskim issue

* Make ImageReader be optional

* try to fix some potential build break

* undo the debug flag in setup.cfg
2022-06-15 23:22:10 -07:00
Lucas Gomez Jimenez 21dc637a6d
Ensure custom op domain is released (#234) 2022-05-25 16:51:49 -07:00
Wenbing Li 2842d2208e
support the non-exception compiling for the text domain. (#142)
* support the non-exception compiling for the text domain.

* fix a path error.
2021-09-02 11:19:18 -07:00
Mojimi 97ec950751
Add SegmentExtraction and BertTokenizerDecoder (#140)
* status

* update

* update

* fix bug

Co-authored-by: Ze Tao <zetao@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-08-27 13:54:02 -07:00
Wenbing Li cb47ee4d44
Standardize the public header files. (#139)
* a couple of fixing

* add a library alias
2021-08-27 12:51:02 -07:00
Mojimi aef5ef1ef1
Add BertTokenizer (#135)
* init

* update

* update

* update

* update

* update

* update

* Modify relative path of generated cmake file.

* update

* update

* fix the bug

* update

* fix bugs

Co-authored-by: Ze Tao <zetao@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-08-26 13:50:03 -07:00