onnxruntime-extensions/base
Wenbing Li d1148aea4e
Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591)
* Fix CodeGenTokenizer issues and the related code refactoring.

* refactor the trie-tree

* temp check-ins

* code complete

* correctness fixing

* Update _hf_cvt.py

* more test cases fixing

* more refinement

* linux crash fixing

* Update test_autotokenizer.py
2023-11-04 22:56:26 -07:00
base64.cc Fix the build breaks the release pipeline and some C++ warnings (#372) 2023-02-28 15:45:32 -08:00
base64.h Fix the build breaks the release pipeline and some C++ warnings (#372) 2023-02-28 15:45:32 -08:00
narrow.h Fix the build breaks the release pipeline and some C++ warnings (#372) 2023-02-28 15:45:32 -08:00
noexcep_operators_placeholder.cc Ensure noexcep_operators and ocos_operators get built always (#570) 2023-10-09 18:29:02 -07:00
ocos.cc Add a bbpe tokenizer decoder for Whisper model (#376) 2023-03-08 15:00:01 -08:00
string_tensor.cc Make kernel Compute method implementations const (#500) 2023-07-28 09:25:36 +10:00
string_tensor.h Remove the deprecating std::codecvt_utf8 from code base. (#541) 2023-08-24 10:26:08 -07:00
string_utils.cc Fix the build breaks the release pipeline and some C++ warnings (#372) 2023-02-28 15:45:32 -08:00
string_utils.h Fix the build breaks the release pipeline and some C++ warnings (#372) 2023-02-28 15:45:32 -08:00
ustring.h Support 'added_token' attribute for BPE tokenizer and some code refactoring. (#591) 2023-11-04 22:56:26 -07:00