Add note to resize token embeddings matrix when adding new tokens to voc (#10331)

Lysandre Debut 2021-02-22 15:48:20 +01:00 committed by GitHub
Parent 19e737b93e
Commit e73a3e1891
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
1 changed file with 6 additions and 0 deletions
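The note this commit adds says to resize the model's embedding matrix after adding tokens. Below is a minimal sketch of that workflow. It assumes `transformers` and PyTorch are installed. To avoid any downloads, it builds a hypothetical seven-word vocabulary and a tiny, randomly initialized `BertModel`. The token strings and model sizes are illustrative only.

The sketch adds two tokens and shows that the un-resized model cannot look up the new ids. It then calls `resize_token_embeddings(len(tokenizer))`.

```python
import os
import tempfile

import torch
from transformers import BertConfig, BertModel, BertTokenizer

# Hypothetical toy vocabulary, written to a throwaway file so nothing is downloaded.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
    tokenizer = BertTokenizer(vocab_path)  # vocab is read into memory here

# Tiny randomly initialized model (arbitrary sizes) whose embedding matrix matches the tokenizer.
config = BertConfig(
    vocab_size=len(tokenizer),
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)

# New tokens receive ids starting at the current vocabulary length (7 here).
num_added = tokenizer.add_tokens(["new_tok1", "new_tok2"])
enc = tokenizer("hello new_tok2", return_tensors="pt")

# Before resizing, the id of "new_tok2" has no row in the embedding matrix.
try:
    with torch.no_grad():
        model(**enc)
    failed_before_resize = False
except (IndexError, RuntimeError):
    failed_before_resize = True

# Grow the embedding matrix so every id the tokenizer can produce has a row.
model.resize_token_embeddings(len(tokenizer))
with torch.no_grad():
    out = model(**enc)

print(num_added, len(tokenizer), model.get_input_embeddings().weight.shape)
```

`resize_token_embeddings` takes the new total vocabulary size, which is why the sketch passes `len(tokenizer)` rather than the number of tokens just added.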

@@ -971,6 +971,12 @@ class SpecialTokensMixin:
         Add a list of new tokens to the tokenizer class. If the new tokens are not in the vocabulary, they are added to
         it with indices starting from length of the current vocabulary.

+        .. Note::
+            When adding new tokens to the vocabulary, you should make sure to also resize the token embedding matrix of
+            the model so that its embedding matrix matches the tokenizer.
+
+            In order to do that, please use the :meth:`~transformers.PreTrainedModel.resize_token_embeddings` method.
+
         Args:
             new_tokens (:obj:`str`, :obj:`tokenizers.AddedToken` or a list of `str` or :obj:`tokenizers.AddedToken`):
                 Tokens are only added if they are not already in the vocabulary. :obj:`tokenizers.AddedToken` wraps a