Add example and documentation for SiFT

This commit is contained in:
Pengcheng He 2021-05-03 20:50:38 -04:00 committed by Pengcheng He
Parent 14bb78d123
Commit 793d31fc6f
2 changed files: 52 additions and 0 deletions

31
DeBERTa/sift/README.md Normal file
View file

@@ -0,0 +1,31 @@
# SiFT (Scale Invariant Fine-Tuning)
## Usage
For example, to try SiFT with DeBERTa, run `experiments/glue/mnli.sh base-sift` or `experiments/glue/mnli.sh xxlarge-v2-sift`.
Here is an example of using SiFT in your existing code:
```python
from DeBERTa.sift import hook_sift_layer, AdversarialLearner

# Create the DeBERTa model first, then hook the SiFT perturbation layer to it
adv_modules = hook_sift_layer(model, hidden_size=768)
adv = AdversarialLearner(model, adv_modules)

# logits_fn re-runs the model and returns the logits for the adversarial step
def logits_fn(model, *args, **kwargs):
    logits, _ = model(*args, **kwargs)
    return logits

logits, loss = model(**data)
loss = loss + adv.loss(logits, logits_fn, **data)
# The remaining steps are the same as in ordinary training.
```
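Under the hood, the adversarial term returned by `adv.loss` encourages the model's predictions on perturbed (normalized) embeddings to stay close to its predictions on the clean inputs, typically via a symmetric KL-divergence between the two sets of logits. A minimal pure-Python sketch of such a symmetric KL term (illustrative only; these helper names are our own, not part of the SiFT API):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    # KL(p || q) for two discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def symmetric_kl(logits_clean, logits_adv):
    # Consistency term between clean and adversarial predictions
    p, q = softmax(logits_clean), softmax(logits_adv)
    return kl(p, q) + kl(q, p)
```

The term is zero when clean and perturbed predictions agree and grows as they diverge, which is what drives the model toward scale-invariant, locally smooth predictions.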
## Ablation study results
| Model | MNLI-m/mm | SST-2 | QNLI | CoLA | RTE | MRPC | QQP |STS-B |
|---------------------------|-------------|-------|------|------|--------|-------|-------|------|
| | Acc | Acc | Acc | MCC | Acc |Acc/F1 |Acc/F1 |P/S |
|**[DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge)<sup>1,2</sup>**|91.7/91.9|97.2|96.0|72.0| 93.5| **93.1/94.9**|92.7/90.3 |93.2/93.1 |
|**[DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge) + SiFT<sup>1,2</sup>**|**92.0/92.1**|97.5|**96.5**|**73.5**| **96.5**| - |**93.0/90.7** | - |

experiments/glue/mnli.sh
View file

@@ -21,6 +21,15 @@ init=$1
tag=$init
case ${init,,} in
base)
parameters=" --num_train_epochs 3 \
--fp16 True \
--warmup 1000 \
--learning_rate 2e-5 \
--train_batch_size 64 \
--cls_drop_out 0.1 "
;;
base-sift)
init=base
parameters=" --num_train_epochs 6 \
--vat_lambda 5 \
--vat_learning_rate 1e-4 \
@@ -61,6 +70,18 @@ case ${init,,} in
--learning_rate 3e-6 \
--train_batch_size 64 \
--cls_drop_out 0.3 \
--fp16 True "
;;
xxlarge-v2-sift)
init=xxlarge-v2
parameters=" --num_train_epochs 6 \
--warmup 1000 \
--vat_lambda 5 \
--vat_learning_rate 1e-4 \
--vat_init_perturbation 1e-2 \
--learning_rate 3e-6 \
--train_batch_size 64 \
--cls_drop_out 0.3 \
--fp16 True "
;;
*)