## SQuAD

Based on the script [`run_squad.py`](https://github.com/huggingface/transformers/blob/master/examples/question-answering/run_squad.py).

#### Fine-tuning BERT on SQuAD1.1

This example fine-tunes BERT on the SQuAD1.1 dataset. It runs in 24 min (with BERT-base) or 68 min (with BERT-large) on a single Tesla V100 16GB. Download the SQuAD data from the following links and save the files in a `$SQUAD_DIR` directory.

* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)

And for SQuAD2.0, you need to download:

- [train-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json)
- [dev-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json)
- [evaluate-v2.0.py](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/)

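The dataset files listed above can be fetched in one pass. The following is a minimal sketch, assuming `curl` is installed; the helper names `squad_url` and `download_squad` are hypothetical and not part of `run_squad.py`. It covers only the four JSON files hosted on the SQuAD explorer site; the two evaluation scripts live on other hosts and are left to a manual download.

```bash
#!/usr/bin/env bash
# Sketch: fetch the SQuAD JSON files listed above into $SQUAD_DIR.
# squad_url and download_squad are illustrative helpers, not part of the repository.
SQUAD_DIR="${SQUAD_DIR:-/path/to/SQUAD}"
SQUAD_BASE_URL="https://rajpurkar.github.io/SQuAD-explorer/dataset"

# Print the download URL for one dataset file name.
squad_url() {
  printf '%s/%s\n' "$SQUAD_BASE_URL" "$1"
}

# Download the four dataset files, skipping any that already exist.
download_squad() {
  mkdir -p "$SQUAD_DIR" || return 1
  local name
  for name in train-v1.1.json dev-v1.1.json train-v2.0.json dev-v2.0.json; do
    [ -f "$SQUAD_DIR/$name" ] && continue
    curl -fL -o "$SQUAD_DIR/$name" "$(squad_url "$name")" || return 1
  done
}
```

After exporting `SQUAD_DIR`, calling `download_squad` should leave the files where the commands below expect them. Nothing is downloaded until the function is called.
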
```bash
export SQUAD_DIR=/path/to/SQUAD

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --do_eval \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/
```

Training with the previously defined hyper-parameters yields the following results:

```bash
f1 = 88.52
exact_match = 81.22
```

#### Distributed training

Here is an example using distributed training on 8 V100 GPUs and the BERT Whole Word Masking uncased model to reach an F1 > 93 on SQuAD1.1:

```bash
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_squad.py \
    --model_type bert \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --do_train \
    --do_eval \
    --train_file $SQUAD_DIR/train-v1.1.json \
    --predict_file $SQUAD_DIR/dev-v1.1.json \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./examples/models/wwm_uncased_finetuned_squad/ \
    --per_gpu_eval_batch_size=3 \
    --per_gpu_train_batch_size=3
```

Training with the previously defined hyper-parameters yields the following results:

```bash
f1 = 93.15
exact_match = 86.91
```

This fine-tuned model is available as a checkpoint under the reference `bert-large-uncased-whole-word-masking-finetuned-squad`.

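When comparing the distributed run with the single-GPU one, it can help to work out the effective (global) batch size. The sketch below assumes the global batch is the product of the number of processes, the per-GPU batch size, and the gradient accumulation steps (taken as 1 here, since the distributed command does not set it). The helper name `effective_batch_size` is hypothetical.

```bash
# Effective (global) batch size = processes x per-GPU batch x accumulation steps.
# effective_batch_size is an illustrative helper, not part of run_squad.py.
effective_batch_size() {
  local nproc="$1" per_gpu="$2" accum="${3:-1}"
  echo $(( nproc * per_gpu * accum ))
}

# Values from the distributed command: --nproc_per_node=8, --per_gpu_train_batch_size=3
effective_batch_size 8 3   # prints 24
```

Under these assumptions the 8-GPU run steps on 24 examples at a time, compared with 12 in the single-GPU BERT-base command.
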
#### Fine-tuning XLNet on SQuAD

This example fine-tunes XLNet on both the SQuAD1.1 and SQuAD2.0 datasets. See above to download the SQuAD data.

##### Command for SQuAD1.1:

```bash
export SQUAD_DIR=/path/to/SQUAD

python run_squad.py \
    --model_type xlnet \
    --model_name_or_path xlnet-large-cased \
    --do_train \
    --do_eval \
    --train_file $SQUAD_DIR/train-v1.1.json \
    --predict_file $SQUAD_DIR/dev-v1.1.json \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./wwm_cased_finetuned_squad/ \
    --per_gpu_eval_batch_size=4 \
    --per_gpu_train_batch_size=4 \
    --save_steps 5000
```

##### Command for SQuAD2.0:

```bash
export SQUAD_DIR=/path/to/SQUAD

python run_squad.py \
    --model_type xlnet \
    --model_name_or_path xlnet-large-cased \
    --do_train \
    --do_eval \
    --version_2_with_negative \
    --train_file $SQUAD_DIR/train-v2.0.json \
    --predict_file $SQUAD_DIR/dev-v2.0.json \
    --learning_rate 3e-5 \
    --num_train_epochs 4 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./wwm_cased_finetuned_squad/ \
    --per_gpu_eval_batch_size=2 \
    --per_gpu_train_batch_size=2 \
    --save_steps 5000
```

A larger batch size may improve performance at the cost of more GPU memory.

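If GPU memory rules out a larger per-GPU batch, gradient accumulation is one common way to raise the effective batch size; `run_squad.py` exposes it as `--gradient_accumulation_steps`. The sketch below computes how many accumulation steps reach a target batch size. The target of 32 and the helper name `accum_steps_for` are assumptions for illustration, and the source does not say whether accumulation matches the gains of a true larger batch.

```bash
# Smallest number of accumulation steps that reaches at least the target batch size.
# accum_steps_for is an illustrative helper, not part of run_squad.py.
accum_steps_for() {
  local target="$1" per_gpu="$2" n_gpu="${3:-1}"
  local per_step=$(( per_gpu * n_gpu ))
  echo $(( (target + per_step - 1) / per_step ))
}

# Hypothetical target of 32 with the SQuAD2.0 setting (--per_gpu_train_batch_size=2, one GPU).
steps=$(accum_steps_for 32 2 1)
echo "--per_gpu_train_batch_size=2 --gradient_accumulation_steps $steps"
```

The printed flags can be appended to the SQuAD2.0 command in place of its existing `--per_gpu_train_batch_size` setting. Memory use per step stays the same while each optimizer update takes proportionally longer.
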
##### Results for SQuAD1.1 with the previously defined hyper-parameters:

```python
{
  "exact": 85.45884578997162,
  "f1": 92.5974600601065,
  "total": 10570,
  "HasAns_exact": 85.45884578997162,
  "HasAns_f1": 92.59746006010651,
  "HasAns_total": 10570
}
```

##### Results for SQuAD2.0 with the previously defined hyper-parameters:

```python
{
  "exact": 80.4177545691906,
  "f1": 84.07154997729623,
  "total": 11873,
  "HasAns_exact": 76.73751686909581,
  "HasAns_f1": 84.05558584352873,
  "HasAns_total": 5928,
  "NoAns_exact": 84.0874684608915,
  "NoAns_f1": 84.0874684608915,
  "NoAns_total": 5945
}
```