
60 Commits

Author SHA1 Message Date
Xavier Dupré 4bc5c962b0
Add WordpieceTokenizer (#72)
* add Wordpiece tokenizer
* add RaggedTensorToDense
* update documentation
2021-03-11 19:19:49 +01:00
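WordPiece tokenizers usually split a word with a greedy longest-match-first search. They take the longest vocabulary entry that prefixes the remaining text and mark continuation pieces with a prefix. The sketch below shows that algorithm, assuming BERT-style conventions. The `##` prefix, the `[UNK]` token, `max_chars`, and the toy vocabulary are illustrative assumptions, not necessarily the attributes of the operator added in #72.

```python
def wordpiece(word, vocab, unk="[UNK]", prefix="##", max_chars=100):
    """Greedy longest-match-first WordPiece split of a single word (sketch)."""
    if len(word) > max_chars:
        return [unk]
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # continuation pieces carry the prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # Some part cannot be matched: the whole word becomes the unknown token.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


vocab = {"un", "##aff", "##able", "want", "##ed"}  # hypothetical toy vocabulary
print(wordpiece("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece("wanted", vocab))     # ['want', '##ed']
print(wordpiece("xyz", vocab))        # ['[UNK]']
```

Each word yields a different number of pieces. A batch therefore produces a ragged result, which is why a RaggedTensorToDense step is useful next to the tokenizer.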
Mojimi 9653f52341
Add batch query and attention mask support for GPT2Tokenizer (#75)
* add batch_mode and padding for GPT2Tokenizer

* fix text

* fix test and add doc

* fix test

* fix comments

* delete header

Co-authored-by: Ze Tao <zetao@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-03-10 13:57:07 -08:00
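Batch mode with padding generally means that sequences of different lengths are padded to a common length. An attention mask then marks which positions hold real tokens. The sketch below assumes right padding and a caller-chosen `pad_id`. The function name and the padding side are illustrative assumptions, not the GPT2Tokenizer operator's actual interface.

```python
def pad_batch(batch_ids, pad_id=0):
    """Right-pad token-id lists to a common length and build the attention mask."""
    max_len = max((len(ids) for ids in batch_ids), default=0)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(list(ids) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks a padding position the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return input_ids, attention_mask


ids, mask = pad_batch([[11, 22, 33], [44]], pad_id=-1)
print(ids)   # [[11, 22, 33], [44, -1, -1]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```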
Xavier Dupré c4f66c2822
Add onnxruntime 1.7.1 to CI (#73)
* Add onnxruntime 1.7.0 to CI

Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

* use 1.7.1

Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

* url code

Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

* ci

Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2021-03-10 08:40:34 -08:00
Mojimi 2378ca116b
add StringConcat (#70)
Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-02-25 10:44:30 +08:00
Mojimi 37598feff3
add unicode support by u32string (#69)
* add unicode support by ustring

Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-02-21 14:36:01 +08:00
Mojimi 844b9d44f1
Add StringLength op (#68)
* add string_length op
2021-02-19 14:34:01 +08:00
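Switching to `std::u32string` (#69) means that string lengths count Unicode code points rather than UTF-8 bytes, because each `char32_t` holds one code point. The sketch below contrasts the two counts, assuming StringLength reports code points. The helper name is hypothetical.

```python
def string_lengths(strings):
    """Return per-element lengths in Unicode code points and in UTF-8 bytes."""
    # len() on a Python str counts code points, like std::u32string::size() in C++.
    chars = [len(s) for s in strings]
    utf8_bytes = [len(s.encode("utf-8")) for s in strings]
    return chars, utf8_bytes


chars, nbytes = string_lengths(["abc", "\u65e5\u672c\u8a9e", "na\u00efve"])
print(chars)   # [3, 3, 5]
print(nbytes)  # [3, 9, 6]
```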
Xavier Dupré 4d95b53804
add RegexSplit (#66)
Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2021-02-17 14:11:29 -05:00
Xavier Dupré 92f6b51106
Add StringLower operator (#64)
* add StringLower operator
* fix compilation settings
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-02-16 12:22:53 +01:00
Xavier Dupré a27b2a2b17
refactor c test (#63)
Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-02-10 20:24:43 +01:00
Wenbing Li b5ba84a185
Add hook feature for the model debugging. (#62)
* Add hook feature for the model debugging.

* use the dynamic node name

* default as floating

* typo

* skip on macos

* add documents
2021-02-10 11:02:06 -08:00
Mojimi da41b75467
Add op: VectorToString (#57)
* add vector_to_string

* fix merge conflict

* fix building failure

* remove debug code

* fix test

* move back unicode

* fix typo

* move base64 back

* move to the right place

* support only int64_t

Co-authored-by: Ze Tao <zetao@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-02-09 09:41:00 -08:00
Xavier Dupré c4c598e1b0
Add more information about fetched external libraries (#60)
Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2021-02-08 09:33:34 -08:00
Xavier Dupré b3a300d7bf
Add attribute global_replace to StringRegexReplace (#55)
* Add attribute global_replace to StringRegexReplace

Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

* fix potential wrong pointer

Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

* update sep

Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>

> It seems to be working now.

I enabled a less secure option in the pipeline. Let's see how it goes.
2021-02-05 09:08:02 -08:00
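A `global_replace` attribute typically chooses between replacing every match and replacing only the first one. In RE2 this is the difference between `RE2::GlobalReplace` and `RE2::Replace`. The sketch below mimics that switch with Python's `re.sub`. The mapping of `global_replace=True` to "all matches" is an assumption about the operator's semantics.

```python
import re


def regex_replace(text, pattern, rewrite, global_replace=True):
    """Replace every match (global) or only the first match of pattern in text."""
    # re.sub with count=0 replaces all non-overlapping matches; count=1 stops after one.
    return re.sub(pattern, rewrite, text, count=0 if global_replace else 1)


print(regex_replace("a1b22c333", r"\d+", "#"))                        # a#b#c#
print(regex_replace("a1b22c333", r"\d+", "#", global_replace=False))  # a#b22c333
```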
Wenbing Li e4f462c2a7
Enable C++ test on Azure Pipeline (#59)
* enable c++ test on Windows

* add linux/macos platform

* no extractfile task on unix-like platform

* fixing on unix-like platform

* try a fixing on macos
2021-02-04 10:37:11 -08:00
Xavier Dupré a07694636d
Support string parameters in python operators (#53)
* support string parameter in python operators

Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

* better error messages

Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-02-04 10:05:59 -08:00
Wenbing Li 33027d2578
Enable C++ end to end test with onnxruntime (#56)
* basic changes

build issues fixing

running on Windows platform

deploy the ort library in CMake

update gitignore

* Add C++ shared tests

* enable ctest

* fixing the python build issue

* remove cc test

* why does macOS need the OpenMP package?
2021-02-03 11:17:35 -08:00
Wenbing Li ddf9b873ad
Update mshost.yaml for Azure Pipelines (#46)
* Update mshost.yaml for Azure Pipelines

* Update mshost.yaml for Azure Pipelines

* Update mshost.yaml

* Update mshost.yaml for Azure Pipelines
2021-01-29 15:23:05 -08:00
Xavier Dupré 4c201e7800
Change the type of nbtest_size to int64 (#54)
Signed-off-by: xavier dupré <xavier.dupre@gmail.com>

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2021-01-29 14:21:50 -05:00
Xavier Dupré a32f9bc28c
Documentation for SentencepieceTokenizer (#52)
2021-01-28 19:07:48 +01:00
Mojimi a9a498501c
Improve GPT2 (#48)
* test attribute

* finish improvement

Co-authored-by: Ze Tao <zetao@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-01-27 15:26:37 -08:00
Xavier Dupré a98c29f6d2
Implement custom operators for SentencePiece (#41)
* implement SentencepieceTokenizer
* add ragged to sparse
* move one input to attribute
2021-01-27 23:55:50 +01:00
Xavier Dupré d1c657486d
remove _GetApi (#49)
2021-01-27 18:51:17 +01:00
Xavier Dupré d48d825a66
Run python unit tests on every platform + fix all string operators (#44)
* Use appropriate API for strings
* Modify all string operators
* Enable ci on linux and MacOs
2021-01-21 19:49:16 +01:00
Mojimi 4a0f892949
Operator Schemas of text processing ops (#42)
* add a schema doc for new Op
Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-01-21 15:07:32 +08:00
Wenbing Li 9ec6951516
Test the GPT2 Tokenizer in both modes, PyOP and Native (#40)
* add a flag to enable pyop

* test passed

* a little polish.
2021-01-12 16:11:02 -08:00
Wenbing Li 4e0af5c582
A more formal build process and fixes for Unix-like environments. (#39)
* enable directly pip package build.

* some link symbols

* fixing on Windows platform

* update the build instruction

* update the ci pipeline

* Fix the Linux and MacOS build.

* Update mshost.yaml

* update the CI Python version

* update the pipeline

* simplify the instruction.

* update according to the comments.

Co-authored-by: Wenbing Li <wenli@MacM1.local>
2021-01-11 13:44:17 -08:00
TomWildenhain-Microsoft 55e9c4965e
Renamed tf2onnx tutorial (#37)
* Renamed tf2onnx tutorial

* Added file for C++ tutorial
2021-01-08 10:58:29 -08:00
Wenbing Li 2dfd95e64b
rename the package name to onnxruntime-customops (#36)
* normalize the root package name in Python

* fixing build on Linux

* update the tutorial as well
2020-12-22 20:16:30 -08:00
Wenbing Li c7b2f864c6
Add Huggingface GPT2Tokenizer Support (#35)
* initialize a bbpe tokenizer

* add the json library.

* gpt2 tokenizer cpp implementation.

* Tom/add tutorial (#32)

* Added getting started instructions for Windows

Signed-off-by: Tom Wildenhain <tomwi@microsoft.com>

* Created a tutorial for converting models with custom ops. WIP

* Removed long outputs

* Changed to keras syntax and added setup instructions

Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>

* rename gpt2 test case file

* polish the symbol names in the sources

* polish it again.

* fix the build issue on macos

* another fixing

* another fixing 3

Co-authored-by: TomWildenhain-Microsoft <67606533+TomWildenhain-Microsoft@users.noreply.github.com>
2020-12-21 17:12:32 -08:00
TomWildenhain-Microsoft a7b4ff310d
Tom/add tutorial (#32)
* Added getting started instructions for Windows

Signed-off-by: Tom Wildenhain <tomwi@microsoft.com>

* Created a tutorial for converting models with custom ops. WIP

* Removed long outputs

* Changed to keras syntax and added setup instructions

Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2020-12-10 16:35:28 -05:00
Wenbing Li e8dcc41497
sync onnxruntime header file with 1.6 rc id: a046ef133aa18bda7b7ec9eeedfec4800f452d45 (#34)
2020-12-04 23:37:53 -08:00
Wenbing Li 01152dfd9d
add the MacOS ci pipeline (#33)
* add the MacOS ci pipeline

* update the doc

* add the missing pool

* Update mshost.yaml for Azure Pipelines
2020-12-02 19:32:12 -08:00
Wenbing Li 8002d94deb
Update README.md
2020-12-02 14:11:28 -08:00
Wenbing Li e65d98479b
Update README.md (#30)
* Update README.md

* Update README.md

* Update README.md
2020-12-01 10:51:50 -08:00
Xavier Dupré 559015501e
Add operator SegmentSum (#29)
Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2020-11-25 21:15:51 -05:00
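SegmentSum usually follows TensorFlow's `segment_sum`. Rows of `data` that share a segment id are summed. The ids must be sorted, and `output[i]` holds the sum for id `i`, with zeros for ids that never occur. The pure-Python sketch below assumes those semantics for 2-D data. It is an illustration, not the C++ kernel added in #29.

```python
def segment_sum(data, segment_ids):
    """Sum the rows of data sharing a segment id (ids sorted, starting at 0)."""
    if len(data) != len(segment_ids):
        raise ValueError("data and segment_ids must have the same length")
    if any(a > b for a, b in zip(segment_ids, segment_ids[1:])):
        raise ValueError("segment_ids must be sorted")
    num_segments = segment_ids[-1] + 1 if segment_ids else 0
    width = len(data[0]) if data else 0
    # Segments that receive no rows stay filled with zeros.
    out = [[0] * width for _ in range(num_segments)]
    for row, seg in zip(data, segment_ids):
        for j, value in enumerate(row):
            out[seg][j] += value
    return out


print(segment_sum([[1, 2], [3, 4], [5, 6]], [0, 0, 1]))  # [[4, 6], [5, 6]]
print(segment_sum([[1], [2]], [0, 2]))                   # [[1], [0], [2]]
```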
Faith Xu bb8132d9f8
Update intro text (#28)
2020-11-24 20:45:58 -08:00
Xavier Dupré d31b364297
Fix StringReplace: use GlobalReplace instead of Replace in C++ (#26)
2020-11-17 01:27:23 +01:00
Xavier Dupré 753b141469
Handle empty delimiter in StringSplit, remove unexpected empty strings (#25)
* add operator StringSplit

* handle empty delimiter, remove empty string

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2020-11-16 14:53:58 -05:00
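The two changes in #25 can be illustrated as follows. An empty delimiter splits a string into single characters, and empty tokens produced by consecutive delimiters are dropped. The sketch below assumes those rules and treats the whole delimiter string as one separator. TensorFlow's v1 `string_split` instead treats each character of the delimiter as a separate separator, so check the operator documentation. The `skip_empty` name and the ragged list output are assumptions for illustration.

```python
def string_split(texts, delimiter, skip_empty=True):
    """Split each input string; return one list of tokens per input (ragged)."""
    result = []
    for text in texts:
        if delimiter == "":
            tokens = list(text)  # empty delimiter: one token per character
        else:
            tokens = text.split(delimiter)  # whole delimiter string is the separator
        if skip_empty:
            tokens = [t for t in tokens if t]  # drop "" from repeated delimiters
        result.append(tokens)
    return result


print(string_split(["a,,b", "cd"], ","))              # [['a', 'b'], ['cd']]
print(string_split(["a,,b"], ",", skip_empty=False))  # [['a', '', 'b']]
print(string_split(["ab"], ""))                       # [['a', 'b']]
```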
Xavier Dupré db43f413b8
Add operator StringSplit (#24)
2020-11-16 19:05:12 +01:00
Wenbing Li fadcf2ab89
refine the code a little bit (#23)
* refine the codebase

* remove the dup def file

* surface the build error in the build script.

* fix the build break for gcc

* run the test after build in Linux
2020-11-13 15:06:46 -08:00
Xavier Dupré f713deba41
Add operator StringHashToBucket (#16)
* Add operator StringHashToBucket
* Fix string_to_hash_bucket_fast
2020-11-13 01:46:49 +01:00
Wenbing Li 5a3c6295d8
Enable build on Linux (#19)
* enable build on Linux

* move pssetup files into the subfolder

* clean up

* correct the package path

* enable non-python build

* add Linux CI

* Update mshost.yaml for Azure Pipelines

* change to std::complex
2020-11-12 14:59:00 -08:00
Xavier Dupré 07c970b85e
Add operator StringEqual + broadcast (#20)
* Add python version for StringEqual

* Add operator StringEqual

* add missing files

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2020-11-12 16:17:50 -05:00
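"StringEqual + broadcast" presumably means elementwise string comparison under NumPy-style broadcasting. Shapes are aligned from the right, and a dimension of size 1, or a missing leading dimension, is stretched to match the other operand. The pure-Python sketch below applies those rules to nested lists. It assumes rectangular inputs where a bare string is a scalar, and the helper names are hypothetical.

```python
def shape(x):
    """Shape of a rectangular nested list; a bare string is a scalar."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


def broadcast_shape(a, b):
    """NumPy-style broadcast of two shapes, aligned from the right."""
    a = (1,) * (len(b) - len(a)) + a
    b = (1,) * (len(a) - len(b)) + b
    out = []
    for m, n in zip(a, b):
        if m != n and 1 not in (m, n):
            raise ValueError(f"shapes {a} and {b} cannot be broadcast")
        out.append(n if m == 1 else m)
    return tuple(out)


def pick(x, shp, idx):
    """Element of x at output index idx, reusing index 0 along size-1 dims."""
    offset = len(idx) - len(shp)  # skip leading dims that x does not have
    for d, size in enumerate(shp):
        x = x[0 if size == 1 else idx[offset + d]]
    return x


def string_equal(a, b):
    """Elementwise a == b with broadcasting; returns nested lists of bools."""
    sa, sb = shape(a), shape(b)
    out_shape = broadcast_shape(sa, sb)

    def build(prefix):
        if len(prefix) == len(out_shape):
            return pick(a, sa, prefix) == pick(b, sb, prefix)
        return [build(prefix + (i,)) for i in range(out_shape[len(prefix)])]

    return build(())


print(string_equal([["a", "b"], ["c", "a"]], ["a", "a"]))  # [[True, False], [False, True]]
print(string_equal(["x", "y"], "x"))                       # [True, False]
```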
TomWildenhain-Microsoft 1d7bd2b3d2
Tom/casting from py (#17)
* Added getting started instructions for Windows

Signed-off-by: Tom Wildenhain <tomwi@microsoft.com>

* Added casting from python for additional types

* Added error for float16

Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2020-11-11 13:14:27 -05:00
Xavier Dupré 2ef88b0bda
Extend StringJoin to support any dimension (#18)
2020-11-10 18:53:24 +01:00
Xavier Dupré 6927998670
Add operator C++ StringRegexReplace (#14)
* Add StringRegexReplace

* fix the target_link issue.

* undo customops dll test.

* simplify the link name

* remove useless test_main files.

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
2020-11-06 15:17:45 -08:00
Wenbing Li 66ddd4a6c9
add re2 library to the source code. (#12)
2020-11-06 13:48:11 +01:00
Xavier Dupré 3ef90b4628
Refactoring, split C++ custom operators into multiple files, add C++ unit test (#8)
* refactoring
* remove useless include
* remove pragma once from cc files
* add custom_op_test.onnx
* remove unnecessary imports, add header in project file, run C++ unit tests
2020-11-03 16:54:06 +01:00
Xavier Dupré e36205ee83
Handles dummy python operators for double and strings (#7)
* refactor tests
* Update mshost.yaml
* Implements dummy operators with double and strings
* update CI
* Implements StringUpper C++ version
* Fix runtime issue preventing multiple Python ops from being registered
* add c++ operator StringJoin
* Support multi output for python and C++ operators
* remove torch in requirements-dev.txt
2020-10-30 11:20:18 +01:00
Wenbing Li 3d13eb867c
add a pytorch custom op example. (#5)
2020-10-23 01:11:11 -07:00