Commit Graph

380 Commits

Author SHA1 Message Date
Changming Sun c30df08aee
Add OneBranch.Official.yml (#242) 2022-06-06 10:25:44 -07:00
Changming Sun 70bfe4345c
Create onebranch pipeline for win32-wheel (#240)
For compliance work.
2022-06-03 18:26:44 -07:00
Changming Sun 36e5c474fd
Update requirements.txt to restrict protobuf version (#241) 2022-06-03 16:09:25 -07:00
Wenbing Li da4784a2cc
update the bert end to end example with hftok (#236) 2022-06-01 10:41:42 -07:00
shaahji 49548f843d
Issue #230: Add HuggingFace vocab format to Bert tokenizer
HuggingFace vocab format is newline separated (unlike GPT which is
json). Newline separated is likely to be faster and doesn't require
an external library to parse it. Instead of introducing a json based
format, added support for native HuggingFace newline separated token
format.
2022-05-26 14:17:20 -07:00
Lucas Gomez Jimenez 21dc637a6d
Ensure custom op domain is released (#234) 2022-05-25 16:51:49 -07:00
Wenbing Li 471a858b56
add a patch opencv compiling (#233) 2022-05-19 10:08:47 -07:00
Nat Kershaw (MSFT) 1678c0be9a
Update README.md (#229) 2022-05-13 15:34:20 -07:00
Wenbing Li 80c26772af
Update wheels_win32.yml for Azure Pipelines (#228) 2022-05-12 10:26:07 -07:00
Wenbing Li 909acb7ce4
build and packaging script improvement for release (#218)
* integrate opencv

* small fixing

* Add the opencv includes and libs

* refine a little bit

* standardize the output folder.

* fix ctest on Linux

* fix setup.py on output folder change.

* more fixings for CI pipeline

* more fixing 1

* more fixing 2

* more fixing 3

* ci pipeline fixing 1

* ci pipeline fixing 2

* a silly typo...

* ci pipeline fixing 3

* fixing the file copy issue.

* last fixing.

* re-test the fullpath in build_ext.

* One more try

* extent timeout

* mshost.yml indent

* Update mshost.yaml for Azure Pipelines

* cibuild build python versions

* Update wheels.yml

* only build python 3.8/3.9

* Update wheels.yml for Azure Pipelines

* seperate the ci pipeline
2022-05-11 16:51:59 -07:00
Wenbing Li 972552126f
Update README.md 2022-05-04 15:33:09 -07:00
Wenbing Li bfbfa5a304
An end-to-end BERT model with pre-/post- processing. (#224)
* bert demo

* add some comments

* support multiple outputs in ONNX model

* code polishing

* encoding issue on Windows platform.
2022-04-20 16:14:46 -07:00
Wenbing Li fb378b72a0
Update ONNX version information in extensions (#222)
* upgrade the onnx versions

* onnx model op version
2022-04-13 10:02:25 -07:00
Wenbing Li bcb41fcf0e
Update README.md 2022-03-29 15:49:02 -07:00
Wenbing Li 63076479a0
gpt2 end-end pre-processing script fixing with pytorch 1.11. (#215)
* gpt2 end-end pre-processing script fixing with pytorch 1.11.

* the fixings for CI failures.

* new dependency for gpt2bs.py

* remove the gpt2bs ci since it started failing in ORT 1.11
2022-03-29 14:37:50 -07:00
Wenbing Li 98fb96d4e8
updat the tutorial (#214) 2022-03-23 14:28:18 -07:00
Qing 59d39123c7
fix typo (#212) 2022-03-22 14:31:37 -07:00
Wenbing Li 6eb41afcb2
fixing the security warning on blingfire. (#209) 2022-03-21 09:44:30 -07:00
TruscaPetre 6b3a97202b
Update README.md (#210)
Correcting an English syntactic mistake.
2022-03-20 17:09:48 -07:00
Wenbing Li d09f988acd
fixing test on PyTorch 1.11 release (#208)
* fixing test on Pytorch 1.11 release

* fix the type error in the PyTorch 1.11

* set a fixed int64 data type for the test
2022-03-15 19:27:33 -07:00
Wenbing Li febe63a5f0
Update README.md 2022-03-09 17:35:46 -08:00
Wenbing Li 9d5ce81ab9
update the docs for new API (#207) 2022-03-08 19:14:13 -08:00
Wenbing Li 4bb3a22c45
The new pre/post processing API, replacing ONNXCompose (#205)
* traced processing module

* before debugging.

* updates

* temporary

* the trace mode pass

* code adjusting for ci pipeline.

* only torch 1.11 support prim:pythonop

* extending sequence processing module.
2022-03-08 16:32:59 -08:00
Wenbing Li 2e2ee11772
still keeps visual studio version sticking to 2019 (#206) 2022-03-07 16:36:52 -08:00
Wenbing Li b99e0c4138
Test onnx 1.11 package on MacOS (#202) 2022-02-23 10:13:00 -08:00
shaahji d14abe1461
Identified bug in SentencePieceTokenizer where encoding was (#200)
not restricted to specific token. Sentencepiece.Encode itself doesn't
clear the input vector before populating the result for the input
token.

Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2022-02-22 10:52:28 -08:00
Wenbing Li bd70ea153d
Temporarily disable the onnx 1.11
onnx 1.11 is failing on MacOS CI pipeline.
2022-02-17 14:29:47 -08:00
Wenbing Li 5bf46560a6
Update gpt2bs.py 2022-02-10 11:07:18 -08:00
Wenbing Li 8eb89cdd8b
clean up the unused files (#199)
* clean up the unused files

* one more files

* update the doc as well
2022-02-09 15:18:13 -08:00
Wenbing Li fd044872b2
cp ci pipeline 2022-02-09 10:28:45 -08:00
Wenbing Li d0ff193eec
support the sequence tensor in pnp.ProcessingModule. (#197)
* support the sequence tensor in ProcessingModule.

* version check

* version check 2
2022-02-04 14:48:47 -08:00
Wenbing Li 459c4f7d61
Update pytorch_custom_ops_tutorial.ipynb (#196)
* Update pytorch_custom_ops_tutorial.ipynb

* Update mshost.yaml

* tidy up
2022-02-02 10:34:20 -08:00
Wenbing Li f418557d85
Update mshost.yaml (#195)
* Update mshost.yaml

* fix the build error for Android.

* add the spm tokenizer in the selected ops list
2022-01-24 10:57:18 -08:00
Wenbing Li a074038642
support custom function in the script mode. (#194) 2022-01-20 10:24:01 -08:00
Wenbing Li 0374ab0f1b
support the model joining with extra inputs. (#192) 2022-01-14 10:43:03 -08:00
Wenbing Li cc858e831b
Update setup.cfg 2022-01-07 16:41:37 -08:00
Wenbing Li 03abb5130a
Refine the ONNXCompose implementation (#191)
* a fixing for the sequence tensors

* Refine the ONNXCompose implementation.
2022-01-05 16:50:45 -08:00
joburkho a9737505ca
Joburkho/change to const char star (#187)
* Correct memory reservation.

* Change output_sentences to vector of const char*.
2021-12-01 00:01:52 -08:00
Wenbing Li d079ba4bed
Update mshost.yaml for Azure Pipelines (#184)
* Update mshost.yaml for Azure Pipelines

* Update mshost.yaml for Azure Pipelines

* hotfix for ort beam search step
2021-11-29 13:22:23 -08:00
Wenbing Li 0b2cfe7dd7
update the docs for imagenet pnp (#186) 2021-11-29 10:53:15 -08:00
Wenbing Li 9bd453e0f1
initial checkins for onnxcompose (#185)
* initial checkins for onnxcompose

* update ci pipeline for the test.

* add the missing quotes

* Switch to looseVersion for torch version.

* testif

* padding_length

* skip the gpt2

* add onnxruntime 1.9 test package.

* fix a memory bug on pyop.
2021-11-22 21:02:39 -08:00
Mojimi dddd85397d
Add android pipeline (#183)
* build locally success

* update

* fix pipeline

* fix pipeline

* fix pipeline

* fix tool chain file

* validate quickly

* fix tool chain

* update

* finished

* fix bug

* fix bugs

* remove tree

* update android ndk

* fix bugs

* remove java install

* bring back build

* fix model

* resolve conflict

* remove uncessary file

* remove tensorflow_text version 2.6.0

Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-11-09 10:25:03 -08:00
Mojimi abdd5b1bd8
Add MaskedFill (#182)
* add StringRemove

* update

Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-11-04 09:29:17 +08:00
Mojimi b5a8a1abd9
add test (#180)
Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-11-01 10:08:09 +08:00
Mojimi 537c492219
Add tests for mapping-related operators (#179)
* init

* finish vector_to_string

* add more test

Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-10-29 09:00:06 +08:00
Mojimi c1e9fdcb08
Add test for segment extraction (#177)
* add test for ECMARegex

* add empty input test case

* fix bug

* update

Co-authored-by: Ze Tao <zetao@microsoft.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-10-28 09:14:47 +08:00
Zuwei Zhao 05f7ded825
Add check for empty input in StringJoin operator and fix empty string input error in BlingFire sentence breaker. (#175)
* Add test cases and fix empty string error in BlingFire sentence breaker.

* Throw error if input text to join is empty array.

* Fix scalar support and access violation.

* Resolve comments.

* Resolve comments.

Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-10-27 20:21:16 +08:00
Mojimi 64c972fb02
Add test for StringECMARegexReplace (#176)
* add test for ECMARegex

* add empty input test case

Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-10-26 10:28:06 +08:00
Mojimi 46d096f1af
Fix ::tolower error when locale is not 'C' (#174)
* add test and implement tolower

* fix locale

* fix locale

Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-10-20 20:59:29 -07:00
Mojimi 448518534c
Add native test for bert tokenizer (#173)
* add native test for bert tokenizer

* add python test

* fix unicode category

Co-authored-by: Ze Tao <zetao@microsoft.com>
2021-10-19 11:09:38 -07:00