d0aa2c2461
* Fix cache when calling EncodeToIds
* Make EnglishRoberta _mergeRanks thread safe
* Delete Trainer
* Remove the setters on the Bpe properties
* Remove Roberta and Tiktoken special casing in the Tokenizer and support the cases in the Model abstraction
* Support text-embedding-3-small/large embedding
* Remove redundant TokenToId abstraction and keep the one with the extra parameters
* Enable creating Tiktoken asynchronously or directly using the tokenizer data
* Add cancellationToken support in CreateAsync APIs
* Rename sequence to text and Tokenize to Encode
* Rename skipSpecialTokens to considerSpecialTokens
* Rename TokenizerResult to EncodingResult
* Make Token publicly immutable
* Change offset tuples from (Index, End) to (Index, Length)
* Rename NormalizedString method's parameters
* Rename Model's methods to start with verb
* Convert Model.GetVocab() method to a Vocab property
* Some method's parameters and variable renaming
* Remove Vocab and VocabSize from the abstraction
* Cleanup normalization support
* Minor Bpe cleanup
* Resolve rebase change
* Address the feedback

Files:

* lib.rs.txt
* tokens.json
* tokens_gpt2.json
* tokens_p50k_base.json
* tokens_p50k_edit.json
* tokens_r50k_base.json