From 793d31fc6f7626f54bbc776f42ca5d9b2b2edd41 Mon Sep 17 00:00:00 2001
From: Pengcheng He
Date: Mon, 3 May 2021 20:50:38 -0400
Subject: [PATCH] Add example and document for SiFT

---
 DeBERTa/sift/README.md   | 31 +++++++++++++++++++++++++++++++
 experiments/glue/mnli.sh | 21 +++++++++++++++++++++
 2 files changed, 52 insertions(+)
 create mode 100644 DeBERTa/sift/README.md

diff --git a/DeBERTa/sift/README.md b/DeBERTa/sift/README.md
new file mode 100644
index 0000000..adf46e4
--- /dev/null
+++ b/DeBERTa/sift/README.md
@@ -0,0 +1,31 @@
+# SiFT (Scale Invariant Fine-Tuning)
+
+## Usage
+
+To try SiFT with DeBERTa, run `experiments/glue/mnli.sh base-sift` or `experiments/glue/mnli.sh xxlarge-v2-sift`.
+
+
+Here is an example of using SiFT in your existing training code:
+
+```python
+# Create the DeBERTa model first, then hook the SiFT perturbation layers onto it
+adv_modules = hook_sift_layer(model, hidden_size=768)
+adv = AdversarialLearner(model, adv_modules)
+def logits_fn(model, *wargs, **kwargs):
+    logits, _ = model(*wargs, **kwargs)
+    return logits
+logits, loss = model(**data)
+
+# Add the SiFT adversarial loss to the task loss
+loss = loss + adv.loss(logits, logits_fn, **data)
+# The remaining steps are the same as in regular training.
+```
+
+## Ablation study results
+
+
+| Model                     | MNLI-m/mm   | SST-2 | QNLI | CoLA | RTE    | MRPC   | QQP    | STS-B |
+|---------------------------|-------------|-------|------|------|--------|--------|--------|-------|
+|                           | Acc         | Acc   | Acc  | MCC  | Acc    | Acc/F1 | Acc/F1 | P/S   |
+|**[DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge)**| 91.7/91.9 | 97.2 | 96.0 | 72.0 | 93.5 | **93.1/94.9** | 92.7/90.3 | 93.2/93.1 |
+|**[DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge) + SiFT**| **92.0/92.1** | 97.5 | **96.5** | **73.5** | **96.5** | - | **93.0/90.7** | - |
diff --git a/experiments/glue/mnli.sh b/experiments/glue/mnli.sh
index 74d8a1f..7af6b23 100755
--- a/experiments/glue/mnli.sh
+++ b/experiments/glue/mnli.sh
@@ -21,6 +21,15 @@ init=$1
 tag=$init
 case ${init,,} in
     base)
+    parameters=" --num_train_epochs 3 \
+    --fp16 True \
+    --warmup 1000 \
+    --learning_rate 2e-5 \
+    --train_batch_size 64 \
+    --cls_drop_out 0.1 "
+        ;;
+    base-sift)
+    init=base
     parameters=" --num_train_epochs 6 \
     --vat_lambda 5 \
     --vat_learning_rate 1e-4 \
     --vat_init_perturbation 1e-2 \
@@ -61,6 +70,18 @@ case ${init,,} in
     --learning_rate 3e-6 \
     --train_batch_size 64 \
     --cls_drop_out 0.3 \
+    --fp16 True "
+        ;;
+    xxlarge-v2-sift)
+    init=xxlarge-v2
+    parameters=" --num_train_epochs 6 \
+    --warmup 1000 \
+    --vat_lambda 5 \
+    --vat_learning_rate 1e-4 \
+    --vat_init_perturbation 1e-2 \
+    --learning_rate 3e-6 \
+    --train_batch_size 64 \
+    --cls_drop_out 0.3 \
     --fp16 True "
         ;;
     *)
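Below the patch, a minimal sketch of how the README snippet above fits into a complete training step. It is illustrative only and not part of this patch: the import path `DeBERTa.sift`, the model returning `(logits, loss)`, and the `make_sift_step`/`train_step` wrappers with their optimizer plumbing are assumptions; only the `hook_sift_layer`, `AdversarialLearner`, and `adv.loss(logits, logits_fn, **data)` calls come from the README example itself.

```python
# Minimal SiFT training-step sketch (illustrative; not part of this patch).
# Assumption: hook_sift_layer and AdversarialLearner are importable from DeBERTa.sift.
from DeBERTa.sift import hook_sift_layer, AdversarialLearner


def make_sift_step(model, optimizer, hidden_size=768):
    # Hook SiFT perturbation modules onto the model and build the adversarial
    # learner, mirroring the README snippet added by this patch.
    adv_modules = hook_sift_layer(model, hidden_size=hidden_size)
    adv = AdversarialLearner(model, adv_modules)

    def logits_fn(model, *wargs, **kwargs):
        logits, _ = model(*wargs, **kwargs)
        return logits

    def train_step(data):
        # Task loss on the clean inputs (model is assumed to return (logits, loss)).
        logits, loss = model(**data)
        # Add the SiFT adversarial regularization term.
        loss = loss + adv.loss(logits, logits_fn, **data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.detach()

    return train_step
```

A trainer would call `train_step(batch)` once per batch; data loading, the learning-rate schedule, and evaluation stay the same as in ordinary fine-tuning, which is what the `base-sift` and `xxlarge-v2-sift` cases in `mnli.sh` rely on.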