* Add needed Tokenizer APIs
* Address the feedback
* Small update to the newly exposed APIs
* Fix comments
* Update the API signatures
* Address more feedback
* Fix the comments
* Packaging cleanup
Originally I was just trying to remove mentions of snupkg, but then
things got a bit carried away. :)
This tries to remove as much packaging-related duplication and dead code
as I can.
* Apply code review feedback
* Suppress copying indirect references
* Remove unwanted bundled files from AutoML
* Remove leading slash
* Refactor model download
* Correct the packaging path of native symbols
* Rename NoTargets projects from csproj to proj
* Fix build issues around model download and respond to feedback
* Remove NoTargets file extension enforcement
* Rename proj to csproj, include in SLN
I'd like to ensure all our projects are included in the SLN and don't
rely on separate build steps.
VS prefers *.csproj in the sln so I renamed things back to csproj.
* Respond to PR feedback
* Fix cache when calling EncodeToIds
* Make EnglishRoberta _mergeRanks thread safe
* Delete Trainer
* Remove the setters on the Bpe properties
* Remove Roberta and Tiktoken special casing in the Tokenizer and support the cases in the Model abstraction
* Support text-embedding-3-small/large embedding models
* Remove redundant TokenToId abstraction and keep the one with the extra parameters
* Enable creating Tiktoken asynchronously or directly using the tokenizer data
* Add cancellationToken support in CreateAsync APIs
* Rename sequence to text and Tokenize to Encode
* Rename skipSpecialTokens to considerSpecialTokens
* Rename TokenizerResult to EncodingResult
* Make Token publicly immutable
* Change offset tuples from (Index, End) to (Index, Length)
* Rename NormalizedString method parameters
* Rename Model's methods to start with a verb
* Convert Model.GetVocab() method to a Vocab property
* Rename some method parameters and variables
* Remove Vocab and VocabSize from the abstraction
* Cleanup normalization support
* Minor Bpe cleanup
* Resolve rebase change
* Address the feedback