* Add reference to NLP dataset

* Update README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Manuel Romero 2020-06-16 10:19:09 +02:00 committed by GitHub
Parent 0946d1209d
Commit 0c55a384f8
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
1 changed file: 12 additions and 5 deletions


@@ -1,6 +1,7 @@
---
language: english
thumbnail:
datasets:
- squad_v2
---
# T5-base fine-tuned on SQuAD v2
@@ -16,13 +17,19 @@ Transfer learning, where a model is first pre-trained on a data-rich task before
## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
[SQuAD v2](https://rajpurkar.github.io/SQuAD-explorer/) combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
Dataset ID: ```squad_v2``` from [HuggingFace/NLP](https://github.com/huggingface/nlp)
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| SQuAD2.0 | train | 130k |
| SQuAD2.0 | eval | 12.3k |
| squad_v2 | train | 130319 |
| squad_v2 | valid | 11873 |
How to load it with [nlp](https://github.com/huggingface/nlp):
```python
import nlp  # Hugging Face nlp library

train_dataset = nlp.load_dataset('squad_v2', split=nlp.Split.TRAIN)
valid_dataset = nlp.load_dataset('squad_v2', split=nlp.Split.VALIDATION)
```
Explore this dataset and others in the [NLP Viewer](https://huggingface.co/nlp/viewer/)
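
Since SQuAD v2 mixes answerable and unanswerable questions, it can help to separate the two before fine-tuning or evaluation. The snippet below is a minimal sketch, not part of the original training code. It assumes each record follows the usual `squad_v2` layout, where `answers` is a dict holding a `text` list and an `answer_start` list, both empty for unanswerable questions. The helper names and the toy records are hypothetical and only illustrate the idea.

```python
from typing import Dict, List


def is_answerable(example: Dict) -> bool:
    """Return True if the record carries at least one gold answer.

    Assumption: unanswerable questions have an empty `answers["text"]` list.
    """
    return len(example["answers"]["text"]) > 0


def count_by_answerability(examples: List[Dict]) -> Dict[str, int]:
    """Count how many records are answerable vs. unanswerable."""
    counts = {"answerable": 0, "unanswerable": 0}
    for example in examples:
        key = "answerable" if is_answerable(example) else "unanswerable"
        counts[key] += 1
    return counts


# Hypothetical toy records that mimic the assumed squad_v2 record layout.
toy_examples = [
    {
        "context": "Paris is the capital of France.",
        "question": "What is the capital of France?",
        "answers": {"text": ["Paris"], "answer_start": [0]},
    },
    {
        "context": "Paris is the capital of France.",
        "question": "What is the capital of Spain?",
        "answers": {"text": [], "answer_start": []},
    },
]

print(count_by_answerability(toy_examples))
```

If the record layout matches this assumption, the same helpers should work when iterating over a loaded split such as `valid_dataset`.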
## Model fine-tuning 🏋️‍