Commit graph

26 commits

Author SHA1 Message Date
Wenbing Li f204a4c791
Add a decoder for Unigram tokenizer and unify some classes among tokenizers (#816)
* rename and formalize the file names

* add the decoder impl

* fix a typo
2024-09-25 10:25:06 -07:00
Wenbing Li 8f2c35fad0
Add more tests for pre-processing C APIs (#793)
* initial api for tokenizer

* More fixings and test data refinement

* add a simple wrapper for pre-processing APIs

* fix the test issues

* test if the tokenizer is spm based

* fix the failed test cases

* json pointer does not work
2024-08-21 16:48:39 -07:00
Wenbing Li be29e28dd7
support tokenizers build only in C API mode (#783)
* support tokenizer build only in C API mode

* fix the python build.

* fix the selectedops build

---------

Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
2024-08-02 13:28:58 -07:00
Wenbing Li 8153bc1a3a
Feature extraction C API for Whisper model (#755)
* Feature extraction C API for Whisper model

* Update the docs

* Update the docs2

* refine the code

* fix some issues

* fix the Linux build

* fix more data consistency issue

* More code refinements
2024-07-11 11:20:36 -07:00
Tang, Cheng f0ef40d074
add move constructor and Release API for tensor (#717)
Co-authored-by: Cheng Tang <chenta@microsoft.com@onnxruntime-a10.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
2024-05-17 11:50:20 -07:00
cao lei dfdf52e759
refactor cuda ops, remove contrib folder (#707)
Co-authored-by: Lei Cao <leca@microsoft.com@onnxruntime-a10.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
2024-05-03 12:18:59 -07:00
Tang, Cheng 3b889fc42f
update custom op v2 struct to be able to invoke from eager mode (#700)
Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2024-04-30 13:53:39 -07:00
Wenbing Li a8bce4328b
Add the tokenizer C ABI (#693)
* initial checkins

* fix the selectedops build failures

* add the tokenization implementation

* update the windows DEF file for c abi in cmake file

* fix the build on linux

* fix some warnings and remove the unused code

* initial import of unit tests from tfmtok

* add streaming API support

* fix the merges loading issues

* complete export from tfmtok - needs input id fixing

* fix the unit test failures.

* fix all unit test failure

* refactor streaming code

* remove the unused code

---------

Co-authored-by: Sayan Shaw <sayanshaw@microsoft.com>
2024-04-29 16:45:49 -07:00
Tang, Cheng 1f31d33ed4
Eager mode: cuda kernel support (#694)
* add UT for neg_pos_cuda in eager mode and fix build break in Windows

* fix Linux build break

* adjust argument and path

* remove old cudaContext

* add ort cuda test back

* fix cuda tests

* undo debug code

* undo useless change

---------

Co-authored-by: jslhcl <jslhcl@gmail.com>
Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
Co-authored-by: Sayan Shaw <52221015+sayanshaw24@users.noreply.github.com>
2024-04-24 12:49:00 -07:00
Wenbing Li 646462790b
Refactor the header file directory and integrate the eager tensor implementation (#689)
* refactor the header file in include folder

* fix the basic-token eager unit test case

* a more flexible way to handle string tensor shape.

* fix the unit test path issue

* remove the multi-inherits to avoid issue during pointer casting

* add api cmake build support

* undo some temporary changes

* code refinement

* fix variadic arg

* only expose the context for ort version >= 17

* fix a shape bug

* fix the cuda build issue

* change ifdef condition of GetAllocator

* finalize the ort c abi wrapper file name

* fix the iOS build break

* align gtest version with triton

* Update ext_apple_framework.cmake for iOS header files

---------

Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
2024-04-17 12:58:19 -07:00
Wenbing Li 367f59c6fa
Remove the deprecating std::codecvt_utf8 from code base. (#541)
* Remove the deprecating std::codecvt_utf8 from code base.

* utest fix
2023-08-24 10:26:08 -07:00
Scott McKay d9fa8ea060
Split out some miscellaneous changes from refactoring the azure ops. (#506)
- ifdef out some test code that requires RE2 if RE2 is not enabled
- add ability to plugin custom output validator for C++ unit tests
  - OpenAI responses can have different punctuation. used in the new unit tests that will be in the refactoring PR
2023-08-04 17:53:11 +10:00
Wenbing Li 507358545d
improve lowpass filter with a higher order one. (#463)
* improve lowpass filter with a higher order one.

* Update test_sampling.cc

* remove the unnecessary throw in the code
2023-06-01 14:12:05 -07:00
Wenbing Li 9f3abe20fd
Prepare for 0.4.0 release (#151)
* new CI configuration

* Set up CI with Azure Pipelines

[skip ci]

* install numpy in cibuildwheel

* add pyproject.toml

* upgrade vmImage

* update the build python versions

* remove the pytest

* move the wheel build files

* enable sdist setup.py as well.

* use git command line

* Update wheels.yml for Azure Pipelines

* disable the pypy package for macos;

* fix the external repo code tag

* fix the ctest problem

* fix the unicode 8217.

* fix the locale base test
2021-09-25 00:40:12 -07:00
Mojimi d4b2aff0c8
Improve regex (#146)
* add test

* bring back test case

* add ignore case for regex

Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-09-09 13:24:49 +08:00
Mojimi cce66310b2
Improve recent checkin operators (#144)
* update

* update

* update

* remove tokenizer space

* fix bugs

Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-09-07 13:34:47 +08:00
Wenbing Li 2842d2208e
support the non-exception compiling for the text domain. (#142)
* support the non-exception compiling for the text domain.

* fix a path error.
2021-09-02 11:19:18 -07:00
Mojimi aef5ef1ef1
Add BertTokenizer (#135)
* init

* update

* update

* update

* update

* update

* update

* Modify relative path of generated cmake file.

* update

* update

* fix the bug

* update

* fix bugs

Co-authored-by: Ze Tao <zetao@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-08-26 13:50:03 -07:00
Mojimi 00448bc78c
Replace Re2 with std::re2 (#129)
* initial commit

* update

* bring selectedoplist back

* remove unnecessary change

* update

* fix unittest

* remove test

* fix windows building

* update

* update

* undo the changes on test cases

* add the missing C++ flags

Co-authored-by: Ze Tao <zetao@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Wenbing Li <wenli@microsoft.com>
2021-08-25 16:09:35 -07:00
Wenbing Li c891e5d732
Re-organize the source code folder structure (#88)
* Reorg the code folder structure

* update the math test case

* Add a matrix inverse op.

* turn off the ctest by default.

* disable jpeg lib in dlib for Linux build issue.

* Linux build fixing

* typo

* enable dlib library on Win32 build

* rename ocos to operators

* add the missing operator folder
2021-05-04 17:12:28 -07:00
Xavier Dupré 4bc5c962b0
Add WordpieceTokenizer (#72)
* add Wordpiece tokenizer
* add RaggedTensorToDense
* update documentation
2021-03-11 19:19:49 +01:00
Mojimi 37598feff3
add unicode support by u32string (#69)
* add unicode support by ustring

Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-02-21 14:36:01 +08:00
Xavier Dupré 4d95b53804
add RegexSplit (#66)
Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2021-02-17 14:11:29 -05:00
Xavier Dupré 92f6b51106
Add StringLower operator (#64)
* add StringLower operator
* fix compilation settings
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-02-16 12:22:53 +01:00
Mojimi da41b75467
Add op: VectorToString (#57)
* add vector_to_string

* fix merge conflict

* fix building failure

* remove debug code

* fix test

* move back unicode

* fix typo

* move base64 back

* move the right place

* support only int64_t

Co-authored-by: Ze Tao <zetao@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-02-09 09:41:00 -08:00
Wenbing Li 33027d2578
Enable C++ end to end test with onnxruntime (#56)
* basic changes

build issues fixing

running on Windows Platform

deploy the ort library in CMake

update gitignore

* Add C++ shared tests

* enable ctest

* fixing the python build issue

* remove cc test

* why does macos needs openmp package?
2021-02-03 11:17:35 -08:00