From 793d31fc6f7626f54bbc776f42ca5d9b2b2edd41 Mon Sep 17 00:00:00 2001
From: Pengcheng He
Date: Mon, 3 May 2021 20:50:38 -0400
Subject: [PATCH] Add example and document for SiFT

---
 DeBERTa/sift/README.md   | 31 +++++++++++++++++++++++++++++++
 experiments/glue/mnli.sh | 21 +++++++++++++++++++++
 2 files changed, 52 insertions(+)
 create mode 100644 DeBERTa/sift/README.md

diff --git a/DeBERTa/sift/README.md b/DeBERTa/sift/README.md
new file mode 100644
index 0000000..adf46e4
--- /dev/null
+++ b/DeBERTa/sift/README.md
@@ -0,0 +1,31 @@
+# SiFT (Scale Invariant Fine-Tuning)
+
+## Usage
+
+To try SiFT with DeBERTa, run `experiments/glue/mnli.sh base-sift` or `experiments/glue/mnli.sh xxlarge-v2-sift`.
+
+
+Here is an example of using SiFT in your existing training code:
+
+```python
+# Create the DeBERTa model first, then hook the SiFT perturbation layers onto it
+adv_modules = hook_sift_layer(model, hidden_size=768)
+adv = AdversarialLearner(model, adv_modules)
+def logits_fn(model, *wargs, **kwargs):
+    logits, _ = model(*wargs, **kwargs)
+    return logits
+logits, loss = model(**data)
+
+# Add the SiFT adversarial loss to the task loss
+loss = loss + adv.loss(logits, logits_fn, **data)
+# The remaining steps are the same as in regular training.
+```
+
+## Ablation study results
+
+
+| Model                     | MNLI-m/mm   | SST-2 | QNLI | CoLA | RTE    | MRPC   | QQP    | STS-B |
+|---------------------------|-------------|-------|------|------|--------|--------|--------|-------|
+|                           | Acc         | Acc   | Acc  | MCC  | Acc    | Acc/F1 | Acc/F1 | P/S   |
+|**[DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge)**| 91.7/91.9 | 97.2 | 96.0 | 72.0 | 93.5 | **93.1/94.9** | 92.7/90.3 | 93.2/93.1 |
+|**[DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge) + SiFT**| **92.0/92.1** | 97.5 | **96.5** | **73.5** | **96.5** | - | **93.0/90.7** | - |
diff --git a/experiments/glue/mnli.sh b/experiments/glue/mnli.sh
index 74d8a1f..7af6b23 100755
--- a/experiments/glue/mnli.sh
+++ b/experiments/glue/mnli.sh
@@ -21,6 +21,15 @@ init=$1
 tag=$init
 case ${init,,} in
     base)
+    parameters=" --num_train_epochs 3 \
+    --fp16 True \
+    --warmup 1000 \
+    --learning_rate 2e-5 \
+    --train_batch_size 64 \
+    --cls_drop_out 0.1 "
+        ;;
+    base-sift)
+    init=base
     parameters=" --num_train_epochs 6 \
     --vat_lambda 5 \
     --vat_learning_rate 1e-4 \
     --vat_init_perturbation 1e-2 \
@@ -61,6 +70,18 @@ case ${init,,} in
     --learning_rate 3e-6 \
     --train_batch_size 64 \
     --cls_drop_out 0.3 \
+    --fp16 True "
+        ;;
+    xxlarge-v2-sift)
+    init=xxlarge-v2
+    parameters=" --num_train_epochs 6 \
+    --warmup 1000 \
+    --vat_lambda 5 \
+    --vat_learning_rate 1e-4 \
+    --vat_init_perturbation 1e-2 \
+    --learning_rate 3e-6 \
+    --train_batch_size 64 \
+    --cls_drop_out 0.3 \
     --fp16 True "
         ;;
     *)
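Below the patch, a minimal sketch of how the README snippet above fits into a complete training step. It is illustrative only and not part of this patch: the import path `DeBERTa.sift`, the model returning `(logits, loss)`, and the `make_sift_step`/`train_step` wrappers with their optimizer plumbing are assumptions; only the `hook_sift_layer`, `AdversarialLearner`, and `adv.loss(logits, logits_fn, **data)` calls come from the README example itself.

```python
# Minimal SiFT training-step sketch (illustrative; not part of this patch).
# Assumption: hook_sift_layer and AdversarialLearner are importable from DeBERTa.sift.
from DeBERTa.sift import hook_sift_layer, AdversarialLearner


def make_sift_step(model, optimizer, hidden_size=768):
    # Hook SiFT perturbation modules onto the model and build the adversarial
    # learner, mirroring the README snippet added by this patch.
    adv_modules = hook_sift_layer(model, hidden_size=hidden_size)
    adv = AdversarialLearner(model, adv_modules)

    def logits_fn(model, *wargs, **kwargs):
        logits, _ = model(*wargs, **kwargs)
        return logits

    def train_step(data):
        # Task loss on the clean inputs (model is assumed to return (logits, loss)).
        logits, loss = model(**data)
        # Add the SiFT adversarial regularization term.
        loss = loss + adv.loss(logits, logits_fn, **data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.detach()

    return train_step
```

A trainer would call `train_step(batch)` once per batch; data loading, the learning-rate schedule, and evaluation stay the same as in ordinary fine-tuning, which is what the `base-sift` and `xxlarge-v2-sift` cases in `mnli.sh` rely on.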