
<p align="center">
    <br>
    <img src="https://raw.githubusercontent.com/huggingface/transformers/master/docs/source/imgs/transformers_logo_name.png" width="400"/>
    <br>
</p>
<p align="center">
    <a href="https://circleci.com/gh/huggingface/transformers">
        <img alt="Build" src="https://img.shields.io/circleci/build/github/huggingface/transformers/master">
    </a>
    <a href="https://github.com/huggingface/transformers/blob/master/LICENSE">
        <img alt="GitHub" src="https://img.shields.io/github/license/huggingface/transformers.svg?color=blue">
    </a>
    <a href="https://huggingface.co/transformers/index.html">
        <img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/transformers/index.html.svg?down_color=red&down_message=offline&up_message=online">
    </a>
    <a href="https://github.com/huggingface/transformers/releases">
        <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg">
    </a>
</p>

<h3 align="center">
<p>State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0</p>
</h3>

🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, T5, CTRL...) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with thousands of pretrained models in 100+ languages and deep interoperability between PyTorch & TensorFlow 2.0.

### Recent contributors
[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/0)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/0)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/1)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/1)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/2)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/2)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/3)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/3)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/4)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/4)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/5)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/5)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/6)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/6)[![](https://sourcerer.io/fame/clmnt/huggingface/transformers/images/7)](https://sourcerer.io/fame/clmnt/huggingface/transformers/links/7)

### Features
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone
- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

Lower compute costs, smaller carbon footprint
- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- Dozens of architectures with over 1,000 pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime
- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation, production


| Section | Description |
|-|-|
| [Installation](#installation) | How to install the package |
| [Model architectures](#model-architectures) | Architectures (with pretrained weights) |
| [Online demo](#online-demo) | Experimenting with this repo’s text generation capabilities |
| [Quick tour: Usage](#quick-tour) | Tokenizers & models usage: Bert and GPT-2 |
| [Quick tour: TF 2.0 and PyTorch ](#Quick-tour-TF-20-training-and-PyTorch-interoperability) | Train a TF 2.0 model in 10 lines of code, load it in PyTorch |
| [Quick tour: pipelines](#quick-tour-of-pipelines) | Using Pipelines: Wrapper around tokenizer and models to use finetuned models |
| [Quick tour: Fine-tuning/usage scripts](#quick-tour-of-the-fine-tuningusage-scripts) | Using provided scripts: GLUE, SQuAD and Text generation |
| [Quick tour: Share your models ](#Quick-tour-of-model-sharing) | Upload and share your fine-tuned models with the community |
| [Migrating from pytorch-transformers to transformers](#Migrating-from-pytorch-transformers-to-transformers) | Migrating your code from pytorch-transformers to transformers |
| [Migrating from pytorch-pretrained-bert to transformers](#Migrating-from-pytorch-pretrained-bert-to-transformers) | Migrating your code from pytorch-pretrained-bert to transformers |
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and more |

## Installation

This repo is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for examples) and TensorFlow 2.0.

You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).

Create a virtual environment with the version of Python you're going to use and activate it.

Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you must install it from source.

### With pip

First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform.

When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:

```bash
pip install transformers
```

### From source

Here also, you first need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform.

When TensorFlow 2.0 and/or PyTorch has been installed, you can install from source by cloning the repository and running:

```bash
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
```

When you update the repository, you should upgrade the transformers installation and its dependencies as follows:

```bash
git pull
pip install --upgrade .
```

### Run the examples

Examples are included in the repository but are not shipped with the library.

Therefore, in order to run the latest versions of the examples, you need to install from source, as described above.

Look at the [README](https://github.com/huggingface/transformers/blob/master/examples/README.md) for how to run examples.

### Tests

A series of tests is included for the library and for some example scripts. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and example tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).

Depending on which framework is installed (TensorFlow 2.0 and/or PyTorch), the irrelevant tests will be skipped. Ensure that both frameworks are installed if you want to execute all tests.
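To see in advance which tests are likely to be skipped, you can check which frameworks are importable in your environment. The snippet below is a minimal sketch using only the standard library; the helper name `installed_frameworks` is made up for illustration and is not part of 🤗 Transformers.

```python
import importlib.util


def installed_frameworks():
    """Return which of the two backends can be imported in this environment."""
    candidates = {"PyTorch": "torch", "TensorFlow": "tensorflow"}
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in candidates.items()
    }


for name, available in installed_frameworks().items():
    status = "installed" if available else "missing (its tests will be skipped)"
    print(f"{name}: {status}")
```

If either line reports a missing framework, install it before running the full test suite.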

Here's the easiest way to run tests for the library:

```bash
pip install -e ".[testing]"
make test
```

and for the examples:

```bash
pip install -e ".[testing]"
pip install -r examples/requirements.txt
make test-examples
```

For details, refer to the [contributing guide](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#tests).

### Do you want to run a Transformer model on a mobile device?

You should check out our [`swift-coreml-transformers`](https://github.com/huggingface/swift-coreml-transformers) repo.

It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`, `DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.

At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models to productizing them in CoreML, or prototype a model or an app in CoreML then research its hyperparameters or architecture from TensorFlow 2.0 and/or PyTorch. Super exciting!

## Model architectures

🤗 Transformers currently provides the following NLU/NLG architectures:

1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
2. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
3. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
4. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
5. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
6. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
7. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
8. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
9. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
10. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
11. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
12. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
13. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
14. **[MMBT](https://github.com/facebookresearch/mmbt/)** (from Facebook), released together with the paper [Supervised Multimodal Bitransformers for Classifying Images and Text](https://arxiv.org/pdf/1909.02950.pdf) by Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Davide Testuggine.
15. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
16. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
17. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
18. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
19. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
20. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
21. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
22. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
23. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).

## Online demo

You can test our inference API on most model pages from the model hub: https://huggingface.co/models

For example: 
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [NER with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [NLI with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)


**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team at transformer.huggingface.co, is the official demo of this repo’s text generation capabilities.

## Quick tour

Let's do a very quick overview of the model architectures in 🤗 Transformers. Detailed examples for each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [full documentation](https://huggingface.co/transformers/).

```python
import torch
from transformers import *

# Transformers has a unified API
# for 10 transformer architectures and 30 pretrained weights.
#          Model          | Tokenizer          | Pretrained weights shortcut
MODELS = [(BertModel,       BertTokenizer,       'bert-base-uncased'),
          (OpenAIGPTModel,  OpenAIGPTTokenizer,  'openai-gpt'),
          (GPT2Model,       GPT2Tokenizer,       'gpt2'),
          (CTRLModel,       CTRLTokenizer,       'ctrl'),
          (TransfoXLModel,  TransfoXLTokenizer,  'transfo-xl-wt103'),
          (XLNetModel,      XLNetTokenizer,      'xlnet-base-cased'),
          (XLMModel,        XLMTokenizer,        'xlm-mlm-enfr-1024'),
          (DistilBertModel, DistilBertTokenizer, 'distilbert-base-cased'),
          (RobertaModel,    RobertaTokenizer,    'roberta-base'),
          (XLMRobertaModel, XLMRobertaTokenizer, 'xlm-roberta-base'),
         ]

# To use TensorFlow 2.0 versions of the models, simply prefix the class names with 'TF', e.g. `TFRobertaModel` is the TF 2.0 counterpart of the PyTorch model `RobertaModel`

# Let's encode some text in a sequence of hidden-states using each model:
for model_class, tokenizer_class, pretrained_weights in MODELS:
    # Load pretrained model/tokenizer
    tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
    model = model_class.from_pretrained(pretrained_weights)

    # Encode text
    input_ids = torch.tensor([tokenizer.encode("Here is some text to encode", add_special_tokens=True)])  # Add special tokens takes care of adding [CLS], [SEP], <s>... tokens in the right way for each model.
    with torch.no_grad():
        last_hidden_states = model(input_ids)[0]  # Models outputs are now tuples

# Each architecture is provided with several classes for fine-tuning on downstream tasks, e.g.
BERT_MODEL_CLASSES = [BertModel, BertForPreTraining, BertForMaskedLM, BertForNextSentencePrediction,
                      BertForSequenceClassification, BertForTokenClassification, BertForQuestionAnswering]

# All the classes for an architecture can be instantiated from pretrained weights for this architecture
# Note that the additional weights added for fine-tuning are only randomly initialized
# and need to be trained on the downstream task
pretrained_weights = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(pretrained_weights)
for model_class in BERT_MODEL_CLASSES:
    # Load pretrained model/tokenizer
    model = model_class.from_pretrained(pretrained_weights)

    # Models can return full list of hidden-states & attentions weights at each layer
    model = model_class.from_pretrained(pretrained_weights,
                                        output_hidden_states=True,
                                        output_attentions=True)
    input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
    all_hidden_states, all_attentions = model(input_ids)[-2:]

    # Models are compatible with Torchscript
    model = model_class.from_pretrained(pretrained_weights, torchscript=True)
    traced_model = torch.jit.trace(model, (input_ids,))

    # Simple serialization for models and tokenizers
    model.save_pretrained('./directory/to/save/')  # save
    model = model_class.from_pretrained('./directory/to/save/')  # re-load
    tokenizer.save_pretrained('./directory/to/save/')  # save
    tokenizer = BertTokenizer.from_pretrained('./directory/to/save/')  # re-load

    # SOTA examples for GLUE, SQUAD, text generation...
```

## Quick tour TF 2.0 training and PyTorch interoperability

Let's do a quick example of how a TensorFlow 2.0 model can be trained in 12 lines of code with 🤗 Transformers and then loaded in PyTorch for fast inspection/tests.

```python
import tensorflow as tf
import tensorflow_datasets
from transformers import *

# Load dataset, tokenizer, model from pretrained model/vocabulary
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
data = tensorflow_datasets.load('glue/mrpc')

# Prepare dataset for GLUE as a tf.data.Dataset instance
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length=128, task='mrpc')
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, max_length=128, task='mrpc')
train_dataset = train_dataset.shuffle(100).batch(32).repeat(2)
valid_dataset = valid_dataset.batch(64)

# Prepare training: Compile tf.keras model with optimizer, loss and learning rate schedule
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

# Train and evaluate using tf.keras.Model.fit()
history = model.fit(train_dataset, epochs=2, steps_per_epoch=115,
                    validation_data=valid_dataset, validation_steps=7)

# Load the TensorFlow model in PyTorch for inspection
model.save_pretrained('./save/')
pytorch_model = BertForSequenceClassification.from_pretrained('./save/', from_tf=True)

# Quickly test a few predictions - MRPC is a paraphrasing task, let's see if our model learned the task
sentence_0 = "This research was consistent with his findings."
sentence_1 = "His findings were compatible with this research."
sentence_2 = "His findings were not compatible with this research."
inputs_1 = tokenizer(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')
inputs_2 = tokenizer(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')

pred_1 = pytorch_model(inputs_1['input_ids'], token_type_ids=inputs_1['token_type_ids'])[0].argmax().item()
pred_2 = pytorch_model(inputs_2['input_ids'], token_type_ids=inputs_2['token_type_ids'])[0].argmax().item()

print("sentence_1 is", "a paraphrase" if pred_1 else "not a paraphrase", "of sentence_0")
print("sentence_2 is", "a paraphrase" if pred_2 else "not a paraphrase", "of sentence_0")
```
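The `steps_per_epoch=115` and `validation_steps=7` values above follow from the dataset size and the batch sizes. The sketch below reproduces that arithmetic; the MRPC split sizes (3,668 training and 408 validation pairs) are an assumption not stated in this README, and `steps_per_pass` is a hypothetical helper.

```python
import math

# Assumed MRPC split sizes (not stated in this README).
TRAIN_EXAMPLES = 3668
VALID_EXAMPLES = 408


def steps_per_pass(num_examples, batch_size):
    """Number of batches needed to see every example once (last batch may be partial)."""
    return math.ceil(num_examples / batch_size)


print(steps_per_pass(TRAIN_EXAMPLES, 32))  # 115, matches steps_per_epoch
print(steps_per_pass(VALID_EXAMPLES, 64))  # 7, matches validation_steps
```

If you change the batch sizes or switch to another GLUE task, these two arguments need to be recomputed accordingly.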

## Quick tour of the fine-tuning/usage scripts

**Important**
Before running the fine-tuning scripts, please read the
[instructions](#run-the-examples) on how to
set up your environment to run the examples.

The library comprises several example scripts with SOTA performances for NLU and NLG tasks:

- `run_glue.py`: an example fine-tuning sequence classification models on nine different GLUE tasks (*sequence-level classification*)
- `run_squad.py`: an example fine-tuning question answering models on the question answering dataset SQuAD 2.0 (*token-level classification*)
- `run_ner.py`: an example fine-tuning token classification models on named entity recognition (*token-level classification*)
- `run_generation.py`: an example using GPT, GPT-2, CTRL, Transformer-XL and XLNet for conditional language generation
- other model-specific examples (see the documentation).

Here are three quick usage examples for these scripts:

### `run_glue.py`: Fine-tuning on GLUE tasks for sequence classification

The [General Language Understanding Evaluation (GLUE) benchmark](https://gluebenchmark.com/) is a collection of nine sentence- or sentence-pair language understanding tasks for evaluating and analyzing natural language understanding systems.

Before running any of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.

You should also install the additional packages required by the examples:

```shell
pip install -r ./examples/requirements.txt
```

```shell
export GLUE_DIR=/path/to/glue
export TASK_NAME=MRPC

python ./examples/text-classification/run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name $TASK_NAME \
    --do_train \
    --do_eval \
    --data_dir $GLUE_DIR/$TASK_NAME \
    --max_seq_length 128 \
    --per_device_eval_batch_size=8   \
    --per_device_train_batch_size=8   \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --output_dir /tmp/$TASK_NAME/
```

where task name can be one of CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI.

The dev set results will be written to the text file 'eval_results.txt' in the specified output_dir. In the case of MNLI, since there are two separate dev sets, matched and mismatched, there will be a separate output folder called '/tmp/MNLI-MM/' in addition to '/tmp/MNLI/'.

#### Fine-tuning XLNet model on the STS-B regression task

This example code fine-tunes XLNet on the STS-B corpus using parallel training on a server with 4 V100 GPUs.
Parallel training is a simple way to use several GPUs (but is slower and less flexible than distributed training, see below).

```shell
export GLUE_DIR=/path/to/glue

python ./examples/text-classification/run_glue.py \
    --model_name_or_path xlnet-large-cased \
    --do_train  \
    --do_eval   \
    --task_name=sts-b     \
    --data_dir=${GLUE_DIR}/STS-B  \
    --output_dir=./proc_data/sts-b-110   \
    --max_seq_length=128   \
    --per_device_eval_batch_size=8   \
    --per_device_train_batch_size=8   \
    --gradient_accumulation_steps=1 \
    --max_steps=1200  \
    --model_name=xlnet-large-cased   \
    --overwrite_output_dir   \
    --overwrite_cache \
    --warmup_steps=120
```

On this machine we thus have a batch size of 32 (4 GPUs with a per-device batch size of 8). If you have a smaller machine, increase `gradient_accumulation_steps` to reach the same batch size. These hyper-parameters should result in a Pearson correlation coefficient of `+0.917` on the development set.
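As a rough guide for picking `gradient_accumulation_steps` on a smaller machine, the effective batch size is the product of the number of GPUs, the per-device batch size and the accumulation steps. The helpers below are an illustrative sketch with hypothetical names, not functions from the library.

```python
def effective_batch_size(num_gpus, per_device_batch_size, gradient_accumulation_steps=1):
    """Examples contributing to each optimizer update."""
    return num_gpus * per_device_batch_size * gradient_accumulation_steps


def accumulation_steps_for(target_batch_size, num_gpus, per_device_batch_size):
    """Accumulation steps needed to reach target_batch_size exactly."""
    per_step = num_gpus * per_device_batch_size
    if target_batch_size % per_step != 0:
        raise ValueError("target batch size is not a multiple of num_gpus * per_device_batch_size")
    return target_batch_size // per_step


print(effective_batch_size(4, 8))                                       # 32, the setup above
print(accumulation_steps_for(32, num_gpus=1, per_device_batch_size=8))  # 4 on a single GPU
```

On a single GPU with the same per-device batch size, you would therefore pass `--gradient_accumulation_steps=4`.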

#### Fine-tuning Bert model on the MRPC classification task

This example code fine-tunes the Bert Whole Word Masking model on the Microsoft Research Paraphrase Corpus (MRPC) using distributed training on 8 V100 GPUs to reach an F1 > 92.

```bash
python -m torch.distributed.launch --nproc_per_node 8 ./examples/text-classification/run_glue.py   \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --task_name MRPC \
    --do_train   \
    --do_eval   \
    --data_dir $GLUE_DIR/MRPC/   \
    --max_seq_length 128   \
    --per_device_eval_batch_size=8   \
    --per_device_train_batch_size=8   \
    --learning_rate 2e-5   \
    --num_train_epochs 3.0  \
    --output_dir /tmp/mrpc_output/ \
    --overwrite_output_dir   \
    --overwrite_cache
```

Training with these hyper-parameters gave us the following results:

```bash
  acc = 0.8823529411764706
  acc_and_f1 = 0.901702786377709
  eval_loss = 0.3418912578906332
  f1 = 0.9210526315789473
  global_step = 174
  loss = 0.07231863956341798
```
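The reported numbers can be sanity-checked with a little arithmetic. The sketch below assumes that `acc_and_f1` is the plain average of accuracy and F1, and that the MRPC training split has 3,668 pairs; neither is stated in this README.

```python
import math

# Values copied from the evaluation output above.
acc = 0.8823529411764706
f1 = 0.9210526315789473
reported_acc_and_f1 = 0.901702786377709
reported_global_step = 174

# Assumption: acc_and_f1 is the plain average of the two metrics.
acc_and_f1 = (acc + f1) / 2
print(math.isclose(acc_and_f1, reported_acc_and_f1, rel_tol=1e-12))  # True

# Assumption: 3,668 MRPC training pairs.
train_examples = 3668
batch_per_step = 8 * 8  # 8 GPUs x per-device batch size of 8
epochs = 3
total_steps = epochs * math.ceil(train_examples / batch_per_step)
print(total_steps == reported_global_step)  # True
```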

### `run_squad.py`: Fine-tuning on SQuAD for question-answering

This example code fine-tunes the Bert Whole Word Masking uncased model on the SQuAD dataset using distributed training on 8 V100 GPUs to reach an F1 > 93:

```bash
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_squad.py \
    --model_type bert \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --do_train \
    --do_eval \
    --train_file $SQUAD_DIR/train-v1.1.json \
    --predict_file $SQUAD_DIR/dev-v1.1.json \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ../models/wwm_uncased_finetuned_squad/ \
    --per_device_eval_batch_size=3   \
    --per_device_train_batch_size=3
```

Training with these hyper-parameters gave us the following results:

```bash
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../models/wwm_uncased_finetuned_squad/predictions.json
{"exact_match": 86.91579943235573, "f1": 93.1532499015869}
```

This is the model provided as `bert-large-uncased-whole-word-masking-finetuned-squad`.
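For readers unfamiliar with the two metrics reported above, here is a simplified sketch of how SQuAD-style exact match and token-level F1 can be computed for a single prediction. It follows the general approach of the SQuAD v1.1 evaluation (normalize the text, then compare tokens) but is not the official `evaluate-v1.1.py` script; the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction, ground_truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Eiffel Tower.", "eiffel tower")
f1 = f1_score("the cat sat", "cat sat down")
```

A dataset-level score would take, for each question, the best value over all reference answers, then average over questions and scale to a percentage.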

### `run_generation.py`: Text generation with GPT, GPT-2, CTRL, Transformer-XL and XLNet

A conditional generation script is also included to generate text from a prompt.
The generation script includes the [tricks](https://github.com/rusiaaman/XLNet-gen#methodology) proposed by Aman Rusia to get high-quality generation with memory models like Transformer-XL and XLNet (include a predefined text to make short inputs longer).

Here is how to run the script with the small version of the OpenAI GPT-2 model:

```shell
python ./examples/text-generation/run_generation.py \
    --model_type=gpt2 \
    --length=20 \
    --model_name_or_path=gpt2
```

and from the Salesforce CTRL model:
```shell
python ./examples/text-generation/run_generation.py \
    --model_type=ctrl \
    --length=20 \
    --model_name_or_path=ctrl \
    --temperature=0 \
    --repetition_penalty=1.2
```
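To give an intuition for the `--temperature` and `--repetition_penalty` flags, here is a minimal pure-Python sketch of how such settings are commonly applied to next-token scores. It is not the code used by `run_generation.py`: the function names are hypothetical, and treating a temperature of 0 as greedy (argmax) decoding is an assumption made for this illustration.

```python
import math


def apply_repetition_penalty(logits, previous_ids, penalty):
    """Make already-generated tokens less likely (CTRL-style penalty)."""
    out = list(logits)
    for i in set(previous_ids):
        # Shrink positive scores and push negative scores further down.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def next_token_distribution(logits, temperature):
    """Softmax over temperature-scaled logits; temperature 0 means argmax here."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.9, -1.0]  # toy scores for a 3-token vocabulary
penalized = apply_repetition_penalty(logits, previous_ids=[0], penalty=1.2)
probs = next_token_distribution(penalized, temperature=0)
```

In this toy case token 0 has the highest raw score, but because it was already generated the penalty lowers it below token 1, which is then selected.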

## Quick tour of model sharing

Starting with `v2.2.2`, you can upload and share your fine-tuned models with the community, using the <abbr title="Command-line interface">CLI</abbr> that's built into the library.

**First, create an account on [https://huggingface.co/join](https://huggingface.co/join)**. Optionally, join an existing organization or create a new one. Then:

```shell
transformers-cli login
# log in using the same credentials as on huggingface.co
```
Upload your model:
```shell
transformers-cli upload ./path/to/pretrained_model/

# ^^ Upload folder containing weights/tokenizer/config
# saved via `.save_pretrained()`

transformers-cli upload ./config.json [--filename folder/foobar.json]

# ^^ Upload a single file
# (you can optionally override its filename, which can be nested inside a folder)
```

If you want your model to be namespaced by your organization name rather than your username, add the following flag to any command:
```shell
--organization organization_name
```

Your model will then be accessible through its identifier, a concatenation of your username (or organization name) and the folder name above:
```python
"username/pretrained_model"
# or if an org:
"organization_name/pretrained_model"
```
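The naming rule above can be sketched as a small helper. `model_identifier` is a hypothetical function written for this illustration, not part of the library or the CLI.

```python
import os


def model_identifier(namespace, folder):
    """Join a username or organization name with the uploaded folder's name."""
    # normpath drops the trailing slash so basename returns the folder name.
    folder_name = os.path.basename(os.path.normpath(folder))
    return f"{namespace}/{folder_name}"


identifier = model_identifier("username", "./path/to/pretrained_model/")
org_identifier = model_identifier("organization_name", "./path/to/pretrained_model/")
```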

**Please add a README.md model card** to the repo under `model_cards/` with: model description, training params (dataset, preprocessing, hardware used, hyperparameters), evaluation results, intended uses & limitations, etc.

Your model now has a page on huggingface.co/models 🔥

Anyone can load it from code:
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("namespace/pretrained_model")
model = AutoModel.from_pretrained("namespace/pretrained_model")
```

List all your files on S3:
```shell
transformers-cli s3 ls
```

You can also delete unneeded files:

```shell
transformers-cli s3 rm …
```

## Quick tour of pipelines

New in version `v2.3`: `Pipeline` objects are high-level objects that automatically handle tokenization, run your data through a Transformers model,
and output the result in a structured object.

You can create `Pipeline` objects for the following down-stream tasks:

 - `feature-extraction`: Generates a tensor representation for the input sequence
 - `ner`: Generates named entity mapping for each word in the input sequence.
 - `sentiment-analysis`: Gives the polarity (positive / negative) of the whole input sequence.
 - `text-classification`: Initialize a `TextClassificationPipeline` directly, or see `sentiment-analysis` for an example.
 - `question-answering`: Given some context and a question referring to that context, extracts the answer to the question from the context.
 - `fill-mask`: Takes an input sequence containing a masked token (e.g. `<mask>`) and returns a list of the most probable filled sequences, with their probabilities.
 - `summarization`
 - `translation_xx_to_yy`

```python
>>> from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
>>> nlp = pipeline('sentiment-analysis')
>>> nlp('We are very happy to include pipeline into the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9978193640708923}]

# Allocate a pipeline for question-answering
>>> nlp = pipeline('question-answering')
>>> nlp({
...     'question': 'What is the name of the repository ?',
...     'context': 'Pipeline have been included in the huggingface/transformers repository'
... })
{'score': 0.5135612454720828, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}

```
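Judging from the question-answering output shown above, the `start` and `end` fields behave like character offsets into the `context` string. The snippet below checks this with plain Python slicing on the values copied from that output; it does not call the library.

```python
# Context and result copied from the question-answering example above.
context = 'Pipeline have been included in the huggingface/transformers repository'
result = {'score': 0.5135612454720828, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}

# Slice the context with the reported offsets (end is exclusive).
span = context[result['start']:result['end']]
print(span == result['answer'])
```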

## Migrating from pytorch-transformers to transformers

Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to `transformers`.

### Positional order of some models' keywords inputs (`attention_mask`, `token_type_ids`...) changed

To be able to use Torchscript (see #1010, #1204 and #1195) the specific order of some models' **keyword inputs** (`attention_mask`, `token_type_ids`...) has been changed.

If you used to call the models with keyword names for keyword arguments, e.g. `model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change.

If you used to call the models with positional inputs for keyword arguments, e.g. `model(input_ids, attention_mask, token_type_ids)`, you may have to double check the exact order of input arguments.
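The following sketch shows why positional calls are fragile when argument order changes. `old_forward` and `new_forward` are hypothetical stand-ins for a model's forward method before and after a reordering; they are not the library's actual signatures.

```python
def old_forward(input_ids, token_type_ids=None, attention_mask=None):
    """Hypothetical signature before the reordering."""
    return {"attention_mask": attention_mask, "token_type_ids": token_type_ids}


def new_forward(input_ids, attention_mask=None, token_type_ids=None):
    """Hypothetical signature after the reordering."""
    return {"attention_mask": attention_mask, "token_type_ids": token_type_ids}


input_ids = [101, 2023, 102]
mask = [1, 1, 1]
segments = [0, 0, 0]

# Keyword calls are unaffected by the new order.
by_keyword = new_forward(input_ids, attention_mask=mask, token_type_ids=segments)

# A positional call written for the old order silently swaps the two inputs.
by_position = new_forward(input_ids, segments, mask)
```

No error is raised in the positional case, which is why the order is worth double-checking.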


## Migrating from pytorch-pretrained-bert to transformers

Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `transformers`.

### Models always output `tuples`

The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that every model's forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.

The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).

In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.

Here is a `pytorch-pretrained-bert` to `transformers` conversion example for a `BertForSequenceClassification` classification model:

```python
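from transformers import BertForSequenceClassification

# `input_ids` and `labels` below are assumed to be tensors you have already prepared
# Let's load our model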
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# If you used to have this line in pytorch-pretrained-bert:
loss = model(input_ids, labels=labels)

# Now just use this line in transformers to extract the loss from the output tuple:
outputs = model(input_ids, labels=labels)
loss = outputs[0]

# In transformers you can also have access to the logits:
loss, logits = outputs[:2]

# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', output_attentions=True)
outputs = model(input_ids, labels=labels)
loss, logits, attentions = outputs
```

### Using hidden states

By enabling the configuration option `output_hidden_states`, it was possible to retrieve the last hidden states of the encoder. In `pytorch-transformers` as well as `transformers`, the return value has changed slightly: `all_hidden_states` now also includes the hidden state of the embeddings in addition to those of the encoding layers. This gives users easy access to the embeddings' final state.
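As a toy illustration of that layout, the sketch below collects the embedding output followed by each layer's output, so an encoder with N layers yields N + 1 entries. `toy_encoder` and the lambda "layers" are hypothetical and only mimic the bookkeeping, not a real model.

```python
def toy_encoder(embedding_output, layers):
    """Run the layers in order and record every intermediate state."""
    all_hidden_states = (embedding_output,)  # index 0: embeddings output
    hidden = embedding_output
    for layer in layers:
        hidden = layer(hidden)
        all_hidden_states += (hidden,)       # index i: output of layer i
    return hidden, all_hidden_states


# Three identical toy "layers" that add 1 to every value.
layers = [lambda h: [x + 1 for x in h]] * 3
last_hidden, all_states = toy_encoder([0.0, 0.0], layers)
```

Here `all_states[0]` is the embeddings output and `all_states[-1]` equals the last hidden state.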

### Serialization

Breaking changes in the `from_pretrained()` method:

1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them, don't forget to set them back in training mode (`model.train()`) to activate the dropout modules.

2. The additional `*input` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be directly passed to the underlying model's class `__init__()` method. They are now used to update the model configuration attribute instead, which can break derived model classes built based on the previous `BertForSequenceClassification` examples. We are working on a way to mitigate this breaking change in [#866](https://github.com/huggingface/transformers/pull/866) by forwarding to the model's `__init__()` method (i) the provided positional arguments and (ii) the keyword arguments which do not match any configuration class attributes.
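The keyword-argument handling described in the second point can be sketched as follows. `ToyConfig` and `split_kwargs` are hypothetical names used only for this illustration; the library's actual implementation may differ.

```python
class ToyConfig:
    """Hypothetical configuration object with two known attributes."""

    def __init__(self):
        self.output_attentions = False
        self.num_labels = 2


def split_kwargs(config, **kwargs):
    """Update matching config attributes; return the rest for the model's __init__."""
    unused = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)  # matches a config attribute
        else:
            unused[key] = value          # would be forwarded to __init__()
    return config, unused


config, model_kwargs = split_kwargs(ToyConfig(), output_attentions=True, my_custom_arg=3)
```

In this sketch `output_attentions` updates the configuration, while the unknown `my_custom_arg` is kept aside for the model constructor.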

Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.

Here is an example:

```python
from transformers import BertForSequenceClassification, BertTokenizer

### Let's load a model and tokenizer
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

### Do some stuff to our model and tokenizer
# Ex: add new tokens to the vocabulary and embeddings of our model
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
model.resize_token_embeddings(len(tokenizer))
# Train our model (`train` stands in for your own training loop)
train(model)

### Now let's save our model and tokenizer to a directory
model.save_pretrained('./my_saved_model_directory/')
tokenizer.save_pretrained('./my_saved_model_directory/')

### Reload the model and the tokenizer
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
```

### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules

The two optimizers previously included, `BertAdam` and `OpenAIAdam`, have been replaced by a single `AdamW` optimizer which has a few differences:

- it only implements weight decay correction,
- schedules are now external (see below),
- gradient clipping is now also external (see below).

The new `AdamW` optimizer matches the PyTorch `Adam` optimizer API and lets you use standard PyTorch or apex methods for the schedule and clipping.

The schedules are now standard [PyTorch learning rate schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) and not part of the optimizer anymore.
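To make the shape of a linear warmup and decay schedule concrete, here is a minimal pure-Python sketch of the multiplier such a schedule applies to the base learning rate. `linear_warmup_decay_factor` is a hypothetical helper, and the step counts are illustrative values; in practice a PyTorch scheduler computes this for you.

```python
def linear_warmup_decay_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier in [0, 1]: ramps up linearly, then decays linearly to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 1e-3
num_warmup_steps = 100
num_training_steps = 1000

# Learning rate at a few selected steps.
lrs = {
    step: base_lr * linear_warmup_decay_factor(step, num_warmup_steps, num_training_steps)
    for step in (0, 50, 100, 550, 1000)
}
```

The learning rate starts at 0, peaks at `base_lr` when warmup ends, and returns to 0 at the final step.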

Here is a conversion example from `BertAdam` with a linear warmup and decay schedule to `AdamW` and the same schedule:

```python
# Parameters:
lr = 1e-3
max_grad_norm = 1.0
num_training_steps = 1000
num_warmup_steps = 100
warmup_proportion = float(num_warmup_steps) / float(num_training_steps)  # 0.1

### Previously BertAdam optimizer was instantiated like this:
optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, t_total=num_training_steps)
### and used like this:
for batch in train_data:
    loss = model(batch)
    loss.backward()
    optimizer.step()

### In Transformers, optimizer and schedules are split and instantiated like this:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False)  # To reproduce BertAdam specific behavior set correct_bias=False
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps)  # PyTorch scheduler
### and used like this:
for batch in train_data:
    model.train()
    loss = model(batch)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # Gradient clipping is not in AdamW anymore (so you can use amp without issue)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```
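The gradient clipping step in the loop above rescales all gradients when their combined norm exceeds `max_grad_norm`. The sketch below shows the idea on a flat list of numbers; `clip_by_global_norm` is a hypothetical helper written for illustration, not a PyTorch function, and the small `eps` term is an assumption added to avoid division by zero.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale gradients down so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:
        grads = [g * scale for g in grads]  # only shrink, never enlarge
    return grads, total_norm


clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))

# Gradients already within the limit are left untouched.
small, _ = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
```

The direction of the gradient vector is preserved; only its length changes.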

## Citation

We now have a paper you can cite for the 🤗 Transformers library:
```bibtex
@article{Wolf2019HuggingFacesTS,
  title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
  author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R{\'e}mi Louf and Morgan Funtowicz and Jamie Brew},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.03771}
}
```
-												updating readme and notebooks

											
										
										
											2018-11-16 16:31:15 +03:00
-												Update contribution instructions.

Also provide shortcuts in a Makefile.

											
										
										
											2019-12-22 23:31:12 +03:00
+								For details, refer to the [contributing guide](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#tests).
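
The framework-dependent skipping described above can be pictured with plain `unittest`. This is a minimal sketch, not the library's actual test utilities: the helper names `is_available` and `require_framework` are hypothetical, and the check only asks whether the package can be found in the current environment.

```python
import importlib.util
import unittest


def is_available(package_name):
    """Return True if the top-level package can be found in this environment."""
    return importlib.util.find_spec(package_name) is not None


def require_framework(package_name):
    """Hypothetical decorator: skip the test when `package_name` is not installed."""
    return unittest.skipUnless(is_available(package_name),
                               f"test requires {package_name}")


class ExampleTest(unittest.TestCase):
    @require_framework("torch")
    def test_pytorch_only(self):
        import torch  # imported lazily so the module loads without PyTorch
        self.assertTrue(hasattr(torch, "tensor"))

    @require_framework("tensorflow")
    def test_tensorflow_only(self):
        import tensorflow as tf  # imported lazily so the module loads without TensorFlow
        self.assertTrue(hasattr(tf, "constant"))
```

Saved as a hypothetical `test_example.py`, the file can be run with `python -m unittest test_example`; with only one framework installed, the other test is reported as skipped rather than failed.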

### Do you want to run a Transformer model on a mobile device?

You should check out our [`swift-coreml-transformers`](https://github.com/huggingface/swift-coreml-transformers) repo.

It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently `GPT-2`, `DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.

At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models to productizing them in CoreML, or prototype a model or an app in CoreML and then research its hyperparameters or architecture from TensorFlow 2.0 and/or PyTorch. Super exciting!

## Model architectures

🤗 Transformers currently provides the following NLU/NLG architectures:

1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942) by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.
1. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[MMBT](https://github.com/facebookresearch/mmbt/)** (from Facebook), released together with the paper [Supervised Multimodal Bitransformers for Classifying Images and Text](https://arxiv.org/pdf/1909.02950.pdf) by Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Davide Testuggine.
1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
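
For readers unfamiliar with the SQuAD figure quoted above, F1 there is a token-overlap score between a predicted answer and a reference answer. The sketch below is a simplified illustration only: it assumes lowercased whitespace tokenization, whereas the official SQuAD evaluation script also strips punctuation and articles, and `token_f1` is a hypothetical helper that is not part of 🤗 Transformers.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between two strings (simplified: whitespace tokens only)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection: a shared token counts at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Two of three predicted tokens match, and both reference tokens are covered.
print(token_f1("the Amazon rainforest", "Amazon rainforest"))
```

A dataset-level score such as the ~93 above is then an average of per-question scores, usually taking the best match when several reference answers exist.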

## Online demo

You can test our inference API on most model pages from the model hub: https://huggingface.co/models

For example:
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [NER with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [NLI with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
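
The example links above pass the widget input through the URL query string (`text`, plus `context` for question answering). Below is a minimal sketch of building such a link with the standard library, assuming the model pages keep accepting these parameters; `demo_url` and `MODEL_HUB` are hypothetical names used only for illustration.

```python
from urllib.parse import urlencode

MODEL_HUB = "https://huggingface.co"


def demo_url(model_id, **inputs):
    """Build a model-page URL whose query string pre-fills the inference widget."""
    # urlencode percent-escapes reserved characters and turns spaces into '+'.
    return f"{MODEL_HUB}/{model_id}?{urlencode(inputs)}"


print(demo_url("bert-base-uncased", text="Paris is the [MASK] of France"))
```

The printed URL has the same form as the masked word completion link in the list.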

**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team at transformer.huggingface.co, is the official demo of this repo’s text generation capabilities.

## Quick tour

Let's do a very quick overview of the model architectures in 🤗 Transformers. Detailed examples for each model architecture (BERT, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [full documentation](https://huggingface.co/transformers/).

```python
import torch
from transformers import *

# Transformers has a unified API
# for 10 transformer architectures and 30 pretrained weights.
#          Model          | Tokenizer          | Pretrained weights shortcut
MODELS = [(BertModel,       BertTokenizer,       'bert-base-uncased'),
          (OpenAIGPTModel,  OpenAIGPTTokenizer,  'openai-gpt'),
          (GPT2Model,       GPT2Tokenizer,       'gpt2'),
          (CTRLModel,       CTRLTokenizer,       'ctrl'),
          (TransfoXLModel,  TransfoXLTokenizer,  'transfo-xl-wt103'),
          (XLNetModel,      XLNetTokenizer,      'xlnet-base-cased'),
          (XLMModel,        XLMTokenizer,        'xlm-mlm-enfr-1024'),
          (DistilBertModel, DistilBertTokenizer, 'distilbert-base-cased'),
          (RobertaModel,    RobertaTokenizer,    'roberta-base'),
          (XLMRobertaModel, XLMRobertaTokenizer, 'xlm-roberta-base'),
         ]

# To use TensorFlow 2.0 versions of the models, simply prefix the class names with 'TF', e.g. `TFRobertaModel` is the TF 2.0 counterpart of the PyTorch model `RobertaModel`

# Let's encode some text in a sequence of hidden-states using each model:
for model_class, tokenizer_class, pretrained_weights in MODELS:
    # Load pretrained model/tokenizer
    tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
    model = model_class.from_pretrained(pretrained_weights)

    # Encode text
    input_ids = torch.tensor([tokenizer.encode("Here is some text to encode", add_special_tokens=True)])  # add_special_tokens=True adds [CLS], [SEP], <s>... tokens in the right way for each model.
    with torch.no_grad():
        last_hidden_states = model(input_ids)[0]  # Model outputs are tuples

# Each architecture is provided with several classes for fine-tuning on downstream tasks, e.g.
BERT_MODEL_CLASSES = [BertModel, BertForPreTraining, BertForMaskedLM, BertForNextSentencePrediction,
                      BertForSequenceClassification, BertForTokenClassification, BertForQuestionAnswering]

# All the classes for an architecture can be instantiated from pretrained weights for this architecture
# Note that additional weights added for fine-tuning are only initialized
# and need to be trained on the downstream task
pretrained_weights = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(pretrained_weights)
for model_class in BERT_MODEL_CLASSES:
    # Load pretrained model
    model = model_class.from_pretrained(pretrained_weights)

    # Models can return the full list of hidden-states & attention weights at each layer
    model = model_class.from_pretrained(pretrained_weights,
                                        output_hidden_states=True,
                                        output_attentions=True)
    input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
    all_hidden_states, all_attentions = model(input_ids)[-2:]

    # Models are compatible with Torchscript
    model = model_class.from_pretrained(pretrained_weights, torchscript=True)
    traced_model = torch.jit.trace(model, (input_ids,))

    # Simple serialization for models and tokenizers
    model.save_pretrained('./directory/to/save/')  # save
    model = model_class.from_pretrained('./directory/to/save/')  # re-load
    tokenizer.save_pretrained('./directory/to/save/')  # save
    tokenizer = BertTokenizer.from_pretrained('./directory/to/save/')  # re-load

    # SOTA examples for GLUE, SQuAD, text generation...
```

## Quick tour TF 2.0 training and PyTorch interoperability

Let's do a quick example of how a TensorFlow 2.0 model can be trained in 12 lines of code with 🤗 Transformers and then loaded in PyTorch for fast inspection/tests.

```python
import tensorflow as tf
import tensorflow_datasets
from transformers import *

# Load dataset, tokenizer, model from pretrained model/vocabulary
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
data = tensorflow_datasets.load('glue/mrpc')

# Prepare dataset for GLUE as a tf.data.Dataset instance
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length=128, task='mrpc')
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, max_length=128, task='mrpc')
train_dataset = train_dataset.shuffle(100).batch(32).repeat(2)
valid_dataset = valid_dataset.batch(64)

# Prepare training: Compile tf.keras model with optimizer, loss and learning rate schedule
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

# Train and evaluate using tf.keras.Model.fit()
history = model.fit(train_dataset, epochs=2, steps_per_epoch=115,
                    validation_data=valid_dataset, validation_steps=7)

# Load the TensorFlow model in PyTorch for inspection
model.save_pretrained('./save/')
pytorch_model = BertForSequenceClassification.from_pretrained('./save/', from_tf=True)

# Quickly test a few predictions - MRPC is a paraphrasing task, let's see if our model learned the task
sentence_0 = "This research was consistent with his findings."
sentence_1 = "His findings were compatible with this research."
sentence_2 = "His findings were not compatible with this research."
inputs_1 = tokenizer(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')
inputs_2 = tokenizer(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')

pred_1 = pytorch_model(inputs_1['input_ids'], token_type_ids=inputs_1['token_type_ids'])[0].argmax().item()
pred_2 = pytorch_model(inputs_2['input_ids'], token_type_ids=inputs_2['token_type_ids'])[0].argmax().item()

print("sentence_1 is", "a paraphrase" if pred_1 else "not a paraphrase", "of sentence_0")
print("sentence_2 is", "a paraphrase" if pred_2 else "not a paraphrase", "of sentence_0")
```
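
The `steps_per_epoch=115` and `validation_steps=7` values used in `model.fit()` follow from the dataset and batch sizes. Here is a minimal sketch of that arithmetic, assuming the GLUE MRPC splits contain 3,668 training and 408 validation examples (these counts are not stated in this README, so check them against your copy of the dataset):

```python
import math

# Assumed GLUE MRPC split sizes; not stated in this README.
MRPC_TRAIN_EXAMPLES = 3668
MRPC_VALIDATION_EXAMPLES = 408


def steps_per_pass(num_examples: int, batch_size: int) -> int:
    """Number of batches needed to see every example once (the last batch may be partial)."""
    return math.ceil(num_examples / batch_size)


train_steps = steps_per_pass(MRPC_TRAIN_EXAMPLES, batch_size=32)
valid_steps = steps_per_pass(MRPC_VALIDATION_EXAMPLES, batch_size=64)
print(train_steps, valid_steps)
```

If you change the batch sizes or switch to another GLUE task, recompute both values the same way.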

## Quick tour of the fine-tuning/usage scripts

**Important**
Before running the fine-tuning scripts, please read the
[instructions](#run-the-examples) on how to
set up your environment to run the examples.

The library comprises several example scripts with state-of-the-art performance on NLU and NLG tasks:

- `run_glue.py`: an example fine-tuning sequence classification models on nine different GLUE tasks (*sequence-level classification*)
- `run_squad.py`: an example fine-tuning question answering models on the question answering dataset SQuAD 2.0 (*token-level classification*)
- `run_ner.py`: an example fine-tuning token classification models on named entity recognition (*token-level classification*)
- `run_generation.py`: an example using GPT, GPT-2, CTRL, Transformer-XL and XLNet for conditional language generation
- other model-specific examples (see the documentation).

Here are three quick usage examples for these scripts:

### `run_glue.py`: Fine-tuning on GLUE tasks for sequence classification

The [General Language Understanding Evaluation (GLUE) benchmark](https://gluebenchmark.com/) is a collection of nine sentence- or sentence-pair language understanding tasks for evaluating and analyzing natural language understanding systems.

Before running any of these GLUE tasks, you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.

You should also install the additional packages required by the examples:

```shell
pip install -r ./examples/requirements.txt
```

```shell
export GLUE_DIR=/path/to/glue
export TASK_NAME=MRPC

python ./examples/text-classification/run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name $TASK_NAME \
    --do_train \
    --do_eval \
    --data_dir $GLUE_DIR/$TASK_NAME \
    --max_seq_length 128 \
    --per_device_eval_batch_size=8 \
    --per_device_train_batch_size=8 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --output_dir /tmp/$TASK_NAME/
```

where the task name can be one of CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI.
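
To run the same command for several tasks, the argument list can be assembled programmatically. The sketch below only builds and prints the command. `build_glue_command` is a hypothetical helper, not part of the library, and it reuses only the flags from the shell example above.

```python
import shlex

GLUE_TASKS = ["CoLA", "SST-2", "MRPC", "STS-B", "QQP", "MNLI", "QNLI", "RTE", "WNLI"]


def build_glue_command(task_name, glue_dir="/path/to/glue"):
    """Hypothetical helper: mirror the shell example above for one GLUE task."""
    return [
        "python", "./examples/text-classification/run_glue.py",
        "--model_name_or_path", "bert-base-uncased",
        "--task_name", task_name,
        "--do_train",
        "--do_eval",
        "--data_dir", f"{glue_dir}/{task_name}",
        "--max_seq_length", "128",
        "--per_device_eval_batch_size=8",
        "--per_device_train_batch_size=8",
        "--learning_rate", "2e-5",
        "--num_train_epochs", "3.0",
        "--output_dir", f"/tmp/{task_name}/",
    ]


# One argument list per task; print one of them as a shell-quoted string.
commands = {task: build_glue_command(task) for task in GLUE_TASKS}
print(" ".join(shlex.quote(arg) for arg in commands["MRPC"]))
```

Each list can be passed to `subprocess.run(command, check=True)` once the data for that task is in place.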

The dev set results are written to the text file `eval_results.txt` in the specified `output_dir`. For MNLI, which has two separate dev sets (matched and mismatched), there will be a separate output folder called `/tmp/MNLI-MM/` in addition to `/tmp/MNLI/`.
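
Once a run finishes, the metrics can be read back to compare tasks. This is a minimal sketch, assuming the results file holds one `key = value` pair per line. `parse_eval_results` is a hypothetical helper, and the sample text is illustrative rather than real output.

```python
def parse_eval_results(text):
    """Parse `key = value` lines into a dict of floats, skipping malformed lines."""
    results = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a `key = value` line
        try:
            results[key.strip()] = float(value)
        except ValueError:
            continue  # value is not numeric
    return results


sample = "acc = 0.5\nf1 = 0.75\n"  # illustrative values only
metrics = parse_eval_results(sample)
print(metrics)
```

For a real run, read the file from the output directory first, for example `parse_eval_results(open("/tmp/MRPC/eval_results.txt").read())`.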

#### Fine-tuning XLNet model on the STS-B regression task

This example code fine-tunes XLNet on the STS-B corpus using parallel training on a server with 4 V100 GPUs.
Parallel training is a simple way to use several GPUs (but is slower and less flexible than distributed training, see below).

```shell
export GLUE_DIR=/path/to/glue

python ./examples/text-classification/run_glue.py \
    --model_name_or_path xlnet-large-cased \
    --do_train \
    --do_eval \
    --task_name=sts-b \
    --data_dir=${GLUE_DIR}/STS-B \
    --output_dir=./proc_data/sts-b-110 \
    --max_seq_length=128 \
    --per_device_eval_batch_size=8 \
    --per_device_train_batch_size=8 \
    --gradient_accumulation_steps=1 \
    --max_steps=1200 \
    --model_name=xlnet-large-cased \
    --overwrite_output_dir \
    --overwrite_cache \
    --warmup_steps=120
```

On this machine we thus have a batch size of 32 (4 GPUs × 8 examples per device). If you have a smaller machine, please increase `gradient_accumulation_steps` to reach the same batch size. These hyper-parameters should result in a Pearson correlation coefficient of `+0.917` on the development set.
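
The effective batch size is the per-device batch size multiplied by the number of devices and by the number of gradient accumulation steps. Here is a small sketch of that arithmetic. Both functions are hypothetical helpers written for illustration, not part of the library.

```python
def effective_batch_size(per_device_batch_size, num_devices, gradient_accumulation_steps=1):
    """Number of examples contributing to each optimizer update."""
    return per_device_batch_size * num_devices * gradient_accumulation_steps


def accumulation_steps_for(target_batch_size, per_device_batch_size, num_devices):
    """Accumulation steps needed to reach the target batch size on a given machine."""
    per_step = per_device_batch_size * num_devices
    if target_batch_size % per_step != 0:
        raise ValueError("target batch size must be a multiple of per_device_batch_size * num_devices")
    return target_batch_size // per_step


# The 4-GPU setup described above, then the same batch size on a single GPU.
print(effective_batch_size(per_device_batch_size=8, num_devices=4))
print(accumulation_steps_for(target_batch_size=32, per_device_batch_size=8, num_devices=1))
```

On a single GPU with the same per-device batch size, this gives `--gradient_accumulation_steps=4`.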

#### Fine-tuning Bert model on the MRPC classification task

This example code fine-tunes the Bert Whole Word Masking model on the Microsoft Research Paraphrase Corpus (MRPC) using distributed training on 8 V100 GPUs to reach an F1 > 92.

```bash
python -m torch.distributed.launch --nproc_per_node 8 ./examples/text-classification/run_glue.py \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --task_name MRPC \
    --do_train \
    --do_eval \
    --data_dir $GLUE_DIR/MRPC/ \
    --max_seq_length 128 \
    --per_device_eval_batch_size=8 \
    --per_device_train_batch_size=8 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --output_dir /tmp/mrpc_output/ \
    --overwrite_output_dir \
    --overwrite_cache
```

Training with these hyper-parameters gave us the following results:

```bash
  acc = 0.8823529411764706
  acc_and_f1 = 0.901702786377709
  eval_loss = 0.3418912578906332
  f1 = 0.9210526315789473
  global_step = 174
  loss = 0.07231863956341798
```
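
In these results, `acc_and_f1` is the arithmetic mean of `acc` and `f1`. This can be checked directly from the reported numbers:

```python
# Values copied from the results reported above.
acc = 0.8823529411764706
f1 = 0.9210526315789473

# The combined metric is the plain average of the two.
acc_and_f1 = (acc + f1) / 2
print(acc_and_f1)
```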

### `run_squad.py`: Fine-tuning on SQuAD for question-answering

This example code fine-tunes the Bert Whole Word Masking uncased model on the SQuAD dataset using distributed training on 8 V100 GPUs to reach an F1 > 93 on SQuAD:

```bash
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_squad.py \
    --model_type bert \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --do_train \
    --do_eval \
    --train_file $SQUAD_DIR/train-v1.1.json \
    --predict_file $SQUAD_DIR/dev-v1.1.json \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ../models/wwm_uncased_finetuned_squad/ \
    --per_device_eval_batch_size=3 \
    --per_device_train_batch_size=3
```

Training with these hyper-parameters gave us the following results:

```bash
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../models/wwm_uncased_finetuned_squad/predictions.json
{"exact_match": 86.91579943235573, "f1": 93.1532499015869}
```
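
To make the two reported metrics concrete, here is a simplified sketch of how exact match and token-level F1 can be computed for a single prediction. It is modeled on the answer normalization used by the SQuAD v1.1 evaluation (lowercasing, dropping punctuation and English articles). It is not the evaluation script itself, and the example strings are made up.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction, ground_truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    common = collections.Counter(pred_tokens) & collections.Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower"))
print(f1_score("the cat sat", "cat sat down"))
```

A dataset-level score averages such per-question values, taking the best match when a question has several reference answers, and reports the result as a percentage.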

This is the model provided as `bert-large-uncased-whole-word-masking-finetuned-squad`.

### `run_generation.py`: Text generation with GPT, GPT-2, CTRL, Transformer-XL and XLNet

A conditional generation script is also included to generate text from a prompt.
The generation script includes the [tricks](https://github.com/rusiaaman/XLNet-gen#methodology) proposed by Aman Rusia to get high-quality generation with memory models like Transformer-XL and XLNet (prepending a predefined text to make short inputs longer).

Here is how to run the script with the small version of the OpenAI GPT-2 model:

```shell
python ./examples/text-generation/run_generation.py \
    --model_type=gpt2 \
    --length=20 \
    --model_name_or_path=gpt2
```

and with the Salesforce CTRL model:

```shell
python ./examples/text-generation/run_generation.py \
    --model_type=ctrl \
    --length=20 \
    --model_name_or_path=ctrl \
    --temperature=0 \
    --repetition_penalty=1.2
```
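
The `--repetition_penalty` flag discourages tokens that have already been generated. The sketch below illustrates the idea on plain Python lists, and assumes that a temperature of 0 means greedy decoding (always picking the highest-scoring token). It is not the script's implementation: the function names are hypothetical, and the sign-aware rule (divide positive scores, multiply negative ones) is one common variant chosen for illustration.

```python
def apply_repetition_penalty(scores, generated_ids, penalty=1.2):
    """Return a copy of `scores` with already-generated tokens made less likely."""
    penalized = list(scores)
    for token_id in set(generated_ids):
        if penalized[token_id] > 0:
            penalized[token_id] /= penalty  # shrink positive scores
        else:
            penalized[token_id] *= penalty  # push negative scores further down
    return penalized


def greedy_pick(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=lambda i: scores[i])


scores = [2.0, 1.9, -1.0]   # made-up next-token scores for a 3-token vocabulary
already_generated = [0, 2]  # token ids produced so far
penalized = apply_repetition_penalty(scores, already_generated)
print(greedy_pick(scores), greedy_pick(penalized))
```

Without the penalty, greedy decoding would repeat token 0. With it, token 1 becomes the top choice.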
-												[doc] Model upload and sharing

ping @lysandrejik @thomwolf

Is this clear enough? Anything we should add?

											
										
										
											2019-12-16 20:42:22 +03:00
## Quick tour of model sharing

Starting with `v2.2.2`, you can upload and share your fine-tuned models with the community using the <abbr title="Command-line interface">CLI</abbr> built into the library.

**First, create an account on [https://huggingface.co/join](https://huggingface.co/join)**. Optionally, join an existing organization or create a new one. Then:

```shell
transformers-cli login
# log in using the same credentials as on huggingface.co
```

Upload your model:

```shell
transformers-cli upload ./path/to/pretrained_model/

# ^^ Upload folder containing weights/tokenizer/config
# saved via `.save_pretrained()`

transformers-cli upload ./config.json [--filename folder/foobar.json]

# ^^ Upload a single file
# (you can optionally override its filename, which can be nested inside a folder)
```
If you want your model to be namespaced by your organization name rather than your username, add the following flag to any command:

```shell
--organization organization_name
```

Your model will then be accessible through its identifier, a concatenation of your username (or organization name) and the folder name above:

```python
"username/pretrained_model"
# or if an org:
"organization_name/pretrained_model"
```
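As a rough illustration of how such an identifier is put together, the sketch below joins a namespace with the name of the uploaded folder. The helper `model_identifier` is hypothetical (it is not part of the library) and only assumes the `namespace/folder_name` rule described above.

```python
from pathlib import PurePosixPath


def model_identifier(namespace: str, model_dir: str) -> str:
    """Hypothetical helper: join a username or organization name with the
    name of the uploaded folder, following the rule described above."""
    # PurePosixPath drops the trailing slash, so `.name` is the last folder.
    folder_name = PurePosixPath(model_dir).name
    return f"{namespace}/{folder_name}"


user_id = model_identifier("username", "./path/to/pretrained_model/")
org_id = model_identifier("organization_name", "./path/to/pretrained_model/")
print(user_id)
print(org_id)
```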
**Please add a README.md model card** to the repo under `model_cards/` with: model description, training params (dataset, preprocessing, hardware used, hyperparameters), evaluation results, intended uses & limitations, etc.

Your model now has a page on huggingface.co/models 🔥

Anyone can load it from code:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("namespace/pretrained_model")
model = AutoModel.from_pretrained("namespace/pretrained_model")
```
List all your files on S3:

```shell
transformers-cli s3 ls
```

You can also delete unneeded files:

```shell
transformers-cli s3 rm …
```
## Quick tour of pipelines

New in version `v2.3`: `Pipeline` objects are high-level objects that automatically handle tokenization, run your data through a transformers model, and output the result in a structured object.

You can create `Pipeline` objects for the following downstream tasks:

 - `feature-extraction`: Generates a tensor representation for the input sequence.
 - `ner`: Generates a named entity mapping for each word in the input sequence.
 - `sentiment-analysis`: Gives the polarity (positive / negative) of the whole input sequence.
 - `text-classification`: Initialize a `TextClassificationPipeline` directly, or see `sentiment-analysis` for an example.
 - `question-answering`: Given some context and a question referring to that context, extracts the answer to the question from the context.
 - `fill-mask`: Takes an input sequence containing a masked token (e.g. `<mask>`) and returns a list of the most probable filled sequences, with their probabilities.
 - `summarization`
 - `translation_xx_to_yy`
```python
>>> from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
>>> nlp = pipeline('sentiment-analysis')
>>> nlp('We are very happy to include pipeline into the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9978193640708923}]

# Allocate a pipeline for question-answering
>>> nlp = pipeline('question-answering')
>>> nlp({
...     'question': 'What is the name of the repository ?',
...     'context': 'Pipeline have been included in the huggingface/transformers repository'
... })
{'score': 0.5135612454720828, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}
```
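To show how these structured results might be consumed, here is a small sketch that works on the output values printed above, without loading any model. It assumes that `start` and `end` in the question-answering result are character offsets into the context string, which is consistent with the values shown.

```python
# Values copied from the question-answering example above.
context = 'Pipeline have been included in the huggingface/transformers repository'
qa_result = {'score': 0.5135612454720828, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}

# Assumption: 'start' and 'end' are character offsets into the context.
span = context[qa_result['start']:qa_result['end']]
print(span == qa_result['answer'])

# The sentiment-analysis result is a list with one dict per input sequence.
sentiment_result = [{'label': 'POSITIVE', 'score': 0.9978193640708923}]
labels = [prediction['label'] for prediction in sentiment_result]
print(labels)
```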
## Migrating from pytorch-transformers to transformers

Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to `transformers`.

### Positional order of some models' keyword inputs (`attention_mask`, `token_type_ids`...) changed

To be able to use TorchScript (see #1010, #1204 and #1195), the order of some models' **keyword inputs** (`attention_mask`, `token_type_ids`...) has been changed.

If you used to call the models with keyword names for keyword arguments, e.g. `model(inputs_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change.

If you used to call the models with positional inputs for keyword arguments, e.g. `model(inputs_ids, attention_mask, token_type_ids)`, you may have to double-check the exact order of input arguments.
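
The difference between the two calling styles can be sketched with plain Python functions. `old_forward` and `new_forward` below are toy stand-ins with hypothetical argument orders, not the real model signatures; check each model's docstring for the actual order.

```python
def old_forward(input_ids, token_type_ids=None, attention_mask=None):
    """Toy stand-in with a hypothetical 'old' keyword order."""
    return {"attention_mask": attention_mask, "token_type_ids": token_type_ids}


def new_forward(input_ids, attention_mask=None, token_type_ids=None):
    """Toy stand-in with a hypothetical 'new' keyword order."""
    return {"attention_mask": attention_mask, "token_type_ids": token_type_ids}


ids, mask, types = [101, 2023, 102], [1, 1, 1], [0, 0, 0]

# Keyword calls give the same result regardless of the declared order.
keyword_calls_match = (
    old_forward(ids, attention_mask=mask, token_type_ids=types)
    == new_forward(ids, attention_mask=mask, token_type_ids=types)
)

# A positional call written for the old order silently swaps the two inputs.
swapped = new_forward(ids, types, mask)
positional_call_swapped = swapped["attention_mask"] == types

print(keyword_calls_match, positional_call_swapped)
```

Passing these arguments by keyword avoids the problem entirely.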
## Migrating from pytorch-pretrained-bert to transformers

Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `transformers`.

### Models always output `tuples`

The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that every model's forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.

The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).

In almost every case, you can take the first element of the output tuple as the output you previously used in `pytorch-pretrained-bert`.

Here is a `pytorch-pretrained-bert` to `transformers` conversion example for a `BertForSequenceClassification` classification model:
```python
from transformers import BertForSequenceClassification

# `input_ids` and `labels` are assumed to be tensors prepared beforehand

# Let's load our model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# If you used to have this line in pytorch-pretrained-bert:
loss = model(input_ids, labels=labels)

# Now just use this line in transformers to extract the loss from the output tuple:
outputs = model(input_ids, labels=labels)
loss = outputs[0]

# In transformers you can also have access to the logits:
loss, logits = outputs[:2]

# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', output_attentions=True)
outputs = model(input_ids, labels=labels)
loss, logits, attentions = outputs
```
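If some of your code has to handle both output styles during the migration, a small shim along these lines may help. `first_output` is a hypothetical helper, not part of either library, and plain Python values stand in for tensors here.

```python
def first_output(outputs):
    """Hypothetical helper: return the first element of a tuple output,
    or the value itself when a single object was returned."""
    if isinstance(outputs, tuple):
        return outputs[0]
    return outputs


old_style = 0.25                # stand-in for a loss returned on its own
new_style = (0.25, [0.1, 0.9])  # stand-in for a (loss, logits) tuple

print(first_output(old_style) == first_output(new_style))
```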
### Using hidden states

By enabling the configuration option `output_hidden_states`, it was possible to retrieve the last hidden states of the encoder. In `pytorch-transformers` as well as `transformers` the return value has changed slightly: `all_hidden_states` now also includes the hidden state of the embeddings in addition to those of the encoding layers. This allows users to easily access the embeddings' final state.
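
The resulting layout can be pictured with a plain Python stand-in. This sketch assumes a 12-layer encoder and assumes that the embeddings entry comes first, followed by one entry per layer; strings are used in place of tensors.

```python
num_layers = 12  # assumption: a 12-layer encoder

# Stand-in for `all_hidden_states`: the embeddings output, then one entry per layer.
all_hidden_states = ["embeddings"] + [f"layer_{i}" for i in range(1, num_layers + 1)]

embedding_output = all_hidden_states[0]    # assumed position of the embeddings output
last_layer_output = all_hidden_states[-1]  # output of the final encoding layer

print(len(all_hidden_states), embedding_output, last_layer_output)
```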
### Serialization

Breaking changes in the `from_pretrained()` method:

1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them, don't forget to set them back in training mode (`model.train()`) to activate the dropout modules.

2. The additional `*input` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be directly passed to the underlying model's class `__init__()` method. They are now used to update the model configuration attribute instead, which can break derived model classes built based on the previous `BertForSequenceClassification` examples. We are working on a way to mitigate this breaking change in [#866](https://github.com/huggingface/transformers/pull/866) by forwarding to the model's `__init__()` method (i) the provided positional arguments and (ii) the keyword arguments which do not match any configuration class attributes.

Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.

Here is an example:
```python
from transformers import BertForSequenceClassification, BertTokenizer

### Let's load a model and tokenizer
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

### Do some stuff to our model and tokenizer
# Ex: add new tokens to the vocabulary and embeddings of our model
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
model.resize_token_embeddings(len(tokenizer))
# Train our model (`train` is a placeholder for your own training loop)
train(model)

### Now let's save our model and tokenizer to a directory
model.save_pretrained('./my_saved_model_directory/')
tokenizer.save_pretrained('./my_saved_model_directory/')

### Reload the model and the tokenizer
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
```
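The keyword-argument handling described in the second breaking change of this section can be pictured with a toy sketch. `ToyConfig` and `split_kwargs` are hypothetical names used only for illustration; this is not the library's implementation, just one way to read the rule that matching keyword arguments update the configuration while the rest are forwarded.

```python
class ToyConfig:
    """Hypothetical stand-in for a model configuration class."""

    def __init__(self):
        self.num_labels = 2
        self.output_attentions = False


def split_kwargs(config, **kwargs):
    """Update config attributes from matching kwargs and return the rest."""
    unused = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)  # matches a configuration attribute
        else:
            unused[key] = value          # would be forwarded to __init__()
    return unused


config = ToyConfig()
unused = split_kwargs(config, output_attentions=True, my_custom_arg=42)
print(config.output_attentions, unused)
```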
### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules

The two optimizers previously included, `BertAdam` and `OpenAIAdam`, have been replaced by a single `AdamW` optimizer which has a few differences:

- it only implements weight decay correction,
- schedules are now external (see below),
- gradient clipping is now also external (see below).

The new optimizer `AdamW` matches the PyTorch `Adam` optimizer API and lets you use standard PyTorch or apex methods for the schedule and clipping.

The schedules are now standard [PyTorch learning rate schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) and not part of the optimizer anymore.

Here is a conversion example from `BertAdam` with a linear warmup and decay schedule to `AdamW` and the same schedule:
```python
import torch
from transformers import AdamW, get_linear_schedule_with_warmup

# Parameters:
lr = 1e-3
max_grad_norm = 1.0
num_training_steps = 1000
num_warmup_steps = 100
warmup_proportion = float(num_warmup_steps) / float(num_training_steps)  # 0.1

### Previously BertAdam optimizer was instantiated like this:
optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, t_total=num_training_steps)
### and used like this:
for batch in train_data:
    loss = model(batch)
    loss.backward()
    optimizer.step()

### In Transformers, optimizer and schedules are split and instantiated like this:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False)  # To reproduce BertAdam specific behavior set correct_bias=False
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps)  # PyTorch scheduler
### and used like this:
for batch in train_data:
    model.train()
    loss = model(batch)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # Gradient clipping is not in AdamW anymore (so you can use amp without issue)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```
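To give a feel for what a linear warmup and decay schedule does to the learning rate, here is a dependency-free sketch of the multiplier applied to the base `lr`. The function `linear_warmup_decay` is a hypothetical name, and the formula assumes the factor rises linearly from 0 to 1 during warmup and then decays linearly to 0; it illustrates the shape of the schedule rather than reproducing the library's code.

```python
def linear_warmup_decay(step, num_warmup_steps, num_training_steps):
    """Multiplier for the base learning rate at a given step (sketch)."""
    if step < num_warmup_steps:
        # Warmup: rise linearly from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Decay: fall linearly from 1 to 0 over the remaining steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


lr = 1e-3
num_training_steps = 1000
num_warmup_steps = 100

for step in (0, 50, 100, 550, 1000):
    factor = linear_warmup_decay(step, num_warmup_steps, num_training_steps)
    print(step, lr * factor)
```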
## Citation

We now have a paper you can cite for the 🤗 Transformers library:

```bibtex
@article{Wolf2019HuggingFacesTS,
  title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
  author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R{\'e}mi Louf and Morgan Funtowicz and Jamie Brew},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.03771}
}
```