Commit Graph

575 Commits

Author SHA1 Message Date
Wenbing Li e19c0894ec
Fix CUDA CI build failures (#824) 2024-10-11 16:08:44 -07:00
Wenbing Li 62c0a7bfda
fix the unigram detector for last HG tokenizer (#820) 2024-10-03 14:25:53 -07:00
Stalin Sabu Thomas f47bed4596
add(tutorials): exporting yolo world model (#803)
* add(tutorials): exporting yolo world model

This allows us to export yolo world onnx model which can be later used in mobile inference.

* add(tutorial): make classes optional

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-10-03 14:42:35 +10:00
Wenbing Li 12a9e8beb4
support sentence-piece add_dummy_prefix for all models (#819)
* add compatibility docs

continue updating the doc

updating doc 2

* support sentence-piece add_dummy_prefix for all models

* revert the flag

* initialize the add_dummy_prefix for llama model
2024-10-01 09:08:59 -07:00
Wenbing Li e710d80f71
Improve Documentation: Add Hugging Face Compatibility Docs and Refine the existing docs (#818)
* add compatibility docs

* continue updating the doc

* updating doc 2

* revert the bpe changes
2024-09-30 13:04:33 -07:00
Wenbing Li 2c3e936cfc
support the merges array in tokenizer.json (#817) 2024-09-26 11:01:13 -07:00
Chester Liu e424838708
Added support for native image decoding (#808)
This added support for native image decoding on Windows & Apple platforms.
This helps us remove libpng & libjpeg completely on these platforms, and
in the meantime support more image formats thanks to OS vendors.
2024-09-26 09:17:55 +08:00
Chester Liu f90a04606b
Fix unused result warnings (#802)
Fix several unused result warnings

---------

Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
2024-09-26 07:54:16 +08:00
Wenbing Li f204a4c791
Add a decoder for Unigram tokenizer and unify some classes among tokenizers (#816)
* rename and formalize the file names

* add the decoder impl

* fix a typo
2024-09-25 10:25:06 -07:00
Wenbing Li 6b94f4d7a5
Fix the Unicode code discrepancy on CLIP model (#814)
* refine the code structure

* more fixing on unicode

* fix the codepoint 304

* add the clip tokenizer data files back
2024-09-23 16:49:24 -07:00
Wenbing Li 176c1d0138
Support the Unigram tokenizer kind from sentencepiece library (#811)
* initial commit

* Ugm vocab loaded is good

* test passed

* fixes unit test on win32

* finish the parity check

* code refinement

* code refinement for review
2024-09-19 15:46:13 -07:00
Sayan Shaw 0d5d19f67b
fix prefast warning (#809)
Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
2024-09-15 22:34:07 -07:00
Chester Liu 8d842d85e3
Rm zlib when linking ocos_operators (#807) 2024-09-13 07:07:10 +08:00
Sayan Shaw 8bc8e43da1
Add C++ regex support for Llama3, Standard Library, and Custom Cases (#804)
* add C++ standard library regex support for GPT2 case

* reorder regex handling

* try without STL

* missing case

* add llama3 regex support

* add custom regex impl

* change regex based on model

* modify tests, add docs, and code cleanup

* add regex test and const strings

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
2024-09-10 23:17:49 -07:00
Scott McKay 9164f54e5d
Don't disable vision operators in a catalyst build. (#805)
* Don't disable vision operators in a catalyst build.

* Patch to exclude NSImage on Mac-catalyst as it's not supported.
2024-09-10 08:58:09 +10:00
Wenbing Li 90d8f33172 Revert "some data calc fixing"
This reverts commit dae9510dbb.
2024-09-05 09:30:19 -07:00
Wenbing Li dae9510dbb some data calc fixing
really split the images

test with sus
2024-09-05 09:26:05 -07:00
Wenbing Li 1b80794903
Remove OpenCV dependency from C_API mode (#800)
* Remove OpenCV dependency from C_API model

* fix build on Windows

* switch ci build flag

* try to fix the macOS build issue

* more fixing

* fix the macOS build issue

* list jpeg source

* verified on MacOS

* update the pp_api too

* avoid the codecs library conflicts

* Add the unit tests

* move the codec test

* add the missing dl lib for extensions test

* refine the code

* a smaller fixing for Windows Python
2024-09-04 16:50:05 -07:00
Kyle 7c3ce36af8
Add Files Signature Validation after Signed by ESRP (#801)
* validate sign after ESRP

* blank line

* format
2024-09-02 17:17:03 +08:00
Wenbing Li b8b2ebfb85
optimize spm tokenizer for long text (#799)
* optimize spm tokenizer for long text

* refine the split logic

* re-trigger CI pipeline.
2024-08-30 14:58:40 -07:00
Prathik Rao 6f532376c9
bump (#791)
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2024-08-27 18:58:18 -07:00
Wenbing Li 2d02a687be
Optimize the tokenizer for efficiency (#797)
* optimize the tokenizer for efficiency

* fix the unit test failures.

* fix the api test case failures

* removed the unused code.

* More test cases fixings

* One more fixing

* fix macOS build issues

* refine the test

* add more diagnosis info.

* fix unit test in CI Linux

* fix the pp_api test failure
2024-08-27 18:57:50 -07:00
Yi Zhang 2d044adbf9
sign with the correct key code (#796)
Fixes incorrect dll signature
2024-08-26 16:48:29 +08:00
Wenbing Li 8f2c35fad0
Add more tests for pre-processing C APIs (#793)
* initial api for tokenizer

* More fixings and test data refinement

* add a simple wrapper for pre-processing APIs

* fix the test issues

* test if the tokenizer is spm based

* fix the failed test cases

* json pointer does not work
2024-08-21 16:48:39 -07:00
Zhipeng Han 85ffb94169
Update custom_ops.md (#795)
add domain for SentencePiece Op
2024-08-21 09:52:54 -07:00
Wenbing Li 711a2cfa69
add a convert_token_string_to_an_id API for the prompt ids (#794)
* add a convert token string to an id API for the prompt ids

* fix the build issues on Linux
2024-08-19 16:44:07 -07:00
vraspar 6ce22f8ac4
Update nuget extraction path for iOS xcframework (#792)
* Update nuget extraction path for iOS xcframework

* Update nuget extraction path for iOS xcframework
2024-08-16 10:34:40 +10:00
vraspar 8b5354fb67
Update macosx framework packaging to follow apple guidelines (#776)
* Update macosx framework packaging to follow apple guidelines

* Test path fix

* Update tools/ci_build/extract_nuget_files.ps1

---------
2024-08-13 10:37:22 +10:00
Wenbing Li be29e28dd7
support tokenizers build only in C API mode (#783)
* support tokenizer build only in C API mode

* fix the python build.

* fix the selectedops build

---------

Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
2024-08-02 13:28:58 -07:00
Sayan Shaw 7851b51ee3
Add initial tiktoken and Phi3SmallTokenizer support (#729)
* add initial tiktoken support

* add vector hash and equal for bpe ranks map

* change lambda comparator

* move phi-3-small files

* final changes

* move tiktoken files from data2 to data

* add unit test

* add tokenizer module

* merge json and tiktoken impl

* fix tiktoken encoding problem

* address comments

* remove dummy tokens

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2024-08-02 10:24:02 -07:00
Wenbing Li 46998e96fb
Update build-package-for-windows.yml (#784) 2024-08-01 14:45:26 -07:00
Wenbing Li 4bb63dd2aa
Upgrade ESRP signing task from v2 to v5 (#780)
* Upgrade ESRP signing task from v2 to v5

* Upgrade ESRP signing task from v2 to v5 in win

---------

Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
2024-08-01 09:57:59 -07:00
Wenbing Li 8b002b86ab
Fix the case that bos_token is null (#781) 2024-07-31 17:50:20 -07:00
Wenbing Li b4ebfc9519
Fix spm converted FastTokenizer issue on non-ascii char (#778)
* Fix spm converted tokenizer issue on non-ascii char

* remove pkg_resource in python
2024-07-31 14:22:25 -07:00
Prathik Rao e113ed30f1
removed OpenAIAudioToText from config (#777) 2024-07-31 10:40:43 -07:00
Wenbing Li c9c11b4846
Fix the windows API missing issue and Linux shared library size issue for Java packaging. (#774)
* Fix the java packaging issues

* add the jar path example for Linux build with a default configuration
2024-07-29 16:03:58 -07:00
Wenbing Li c3145b8f52
add the decoder_prompt_id for whisper tokenizer (#775)
* add the decoder_prompt_id for whisper tokenizer

* temporarily disable android prebuilt

* disable the prebuilt for android

* disable the prebuilt for android 2

* Add a unit test

* correct test ids
2024-07-29 14:21:17 -07:00
Wenbing Li 620050fbe0
reimplement resize cpu kernel for image processing (#768)
* reimplement resize cpu kernel for image processing

* accuracy fixing and code refinement

* fix the build issues

* fix Linux build issue

* more fixings

* Fix the pipeline issue

* fix the ci script

* try to fix CUDA machine pool
2024-07-23 15:40:52 -07:00
Prathik Rao d79299e733
increase timeout (#773) 2024-07-22 17:48:27 -07:00
Prathik Rao 735041e59f
increase timeout (#772) 2024-07-22 14:31:44 -07:00
Wenbing Li bfcca2cb76
Fix the win32 Python packaging pipeline (#771) 2024-07-19 17:24:40 -07:00
Changming Sun e95ae84ba6
Switch aiinfra-Linux-CPU machine pool to onnxruntime-Ubuntu2204-AMD-CPU (#765) 2024-07-17 13:53:30 -07:00
Wenbing Li 38a3d85f8f
switch cmake cmp0169 flag to new (#762)
* switch cmake cmp0169 flag to new

* the missing spm code.

* more refinement on cmake build targets

* Update ci.yml

* Update ci.yml

* update the jpg files after using libjpeg instead of libjpeg-turbo

* exclude cutlass too

* upgrade the protobuf library to be consistent with ORT

* update the protoc generated files

* use the right patch name

* Update cutlass.cmake
2024-07-15 23:28:49 -07:00
Wenbing Li 8153bc1a3a
Feature extraction C API for whisper model (#755)
* Feature extraction C API for whisper model

* Update the docs

* Update the docs2

* refine the code

* fix some issues

* fix the Linux build

* fix more data consistency issue

* More code refinements
2024-07-11 11:20:36 -07:00
cao lei 95d65e4ec0
sync to flash attention kernel 2.5.9 and add document of how to write custom op (#757)
* sync to flash attention kernel 2.5.9

* support users to overload GetMayInplace and ReleaseMayInplace

* Undo the change for pybind11 dependency
2024-07-10 07:09:40 -07:00
Wenbing Li b436d09459
Fix the CI pipeline for the latest PyTorch release. (#759) 2024-07-08 16:21:48 -07:00
Wenbing Li f1abea14e8
Update CMakeLists.txt (#754) 2024-06-25 11:12:21 -07:00
Chester Liu 0f1f454867
Fix C4459 warning in custom_op_lite.h (#751)
Internal workitem: https://task.ms/aii/29719

Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
2024-06-25 10:28:27 +08:00
Wenbing Li 3b275b16bc
Upgrade pybind11 2.12 to support both numpy 1.x and 2.x (#750) 2024-06-20 15:18:17 -07:00
Wenbing Li cbed8fd575
Add a generic image processor and its C API (#745)
* Add a generic image processor

* add more tests

* Fix the test failures

* Update runner.hpp
2024-06-20 10:53:49 -07:00