onnxruntime-extensions/operators/tokenizer
Wenbing Li d1148aea4e
Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591)
* Fix CodeGenTokenizer issues and the related code refactoring.
* Refactor the trie-tree
* Temp check-ins
* Code complete
* Correctness fixing
* Update _hf_cvt.py
* More test cases fixing
* More refinement
* Linux crash fixing
* Update test_autotokenizer.py
2023-11-04 22:56:26 -07:00
basic_tokenizer.cc Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00
basic_tokenizer.hpp Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00
bert_tokenizer.cc Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00
bert_tokenizer.hpp Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00
bert_tokenizer_decoder.cc Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00
bert_tokenizer_decoder.hpp Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00
blingfire_sentencebreaker.cc Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00
blingfire_sentencebreaker.hpp Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00
bpe_decoder.hpp Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591) 2023-11-04 22:56:26 -07:00
bpe_kernels.cc Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591) 2023-11-04 22:56:26 -07:00
bpe_kernels.h Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591) 2023-11-04 22:56:26 -07:00
bpe_tokenizer.hpp Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591) 2023-11-04 22:56:26 -07:00
bpe_utils.hpp Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591) 2023-11-04 22:56:26 -07:00
sentencepiece_decoder.hpp Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00
sentencepiece_tokenizer.cc Refactor String and Audio operators with status-return prototype. (#576) 2023-10-19 10:40:58 -07:00
sentencepiece_tokenizer.hpp Add token indices output to sentencepiece (#566) 2023-10-03 09:56:28 -07:00
tokenizers.cc Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591) 2023-11-04 22:56:26 -07:00
trie_tokenizer.hpp Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591) 2023-11-04 22:56:26 -07:00
trietree.hpp Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591) 2023-11-04 22:56:26 -07:00
unescape.h Remove the deprecating std::codecvt_utf8 from code base. (#541) 2023-08-24 10:26:08 -07:00
unicode.cc Re-organize the source code folder structure (#88) 2021-05-04 17:12:28 -07:00
unicode.h Re-organize the source code folder structure (#88) 2021-05-04 17:12:28 -07:00
wordpiece_tokenizer.cc Remove the deprecating std::codecvt_utf8 from code base. (#541) 2023-08-24 10:26:08 -07:00
wordpiece_tokenizer.hpp Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00