Add example and documentation for SiFT

2021-05-03 20:50:38 -04:00 · 2021-05-03 20:50:38 -04:00 · 793d31fc6f
--- a/DeBERTa/sift/README.md
+++ b/DeBERTa/sift/README.md
@@ -0,0 +1,31 @@
+# SiFT (Scale-Invariant Fine-Tuning)
+
+## Usage
+
+To try SiFT with DeBERTa, see the `base-sift` and `xxlarge-v2-sift` configurations in `experiments/glue/mnli.sh`; for example, run `experiments/glue/mnli.sh base-sift` or `experiments/glue/mnli.sh xxlarge-v2-sift`.
+
+
+Here is an example of how to add SiFT to your existing training code:
+
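+  ```python
+  from DeBERTa.sift import AdversarialLearner, hook_sift_layer
+  # `model` is your DeBERTa model and `data` is a batch of its inputs.
+  adv_modules = hook_sift_layer(model, hidden_size=768)  # 768 = hidden size of DeBERTa-base
+  adv = AdversarialLearner(model, adv_modules)
+  def logits_fn(model, *wargs, **kwargs):
+    logits, _ = model(*wargs, **kwargs)
+    return logits
+  logits, loss = model(**data)
+  # Add the SiFT adversarial loss to the task loss.
+  loss = loss + adv.loss(logits, logits_fn, **data)
+  # The remaining steps (backward pass, optimizer step) are the same as in regular training.
+  ```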
+
+## Ablation study results
+
+
+| Model                     |  MNLI-m/mm   | SST-2 | QNLI | CoLA | RTE    | MRPC  | QQP   |STS-B |
+|---------------------------|-------------|-------|------|------|--------|-------|-------|------|
+|                           |  Acc         | Acc   | Acc  | MCC  | Acc    |Acc/F1 |Acc/F1 |Pearson/Spearman |
+|**[DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge)<sup>1,2</sup>**|91.7/91.9|97.2|96.0|72.0| 93.5| **93.1/94.9**|92.7/90.3 |93.2/93.1 |
+|**[DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge)<sup>1,2</sup> + SiFT**|**92.0/92.1**|**97.5**|**96.5**|**73.5**| **96.5**| - |**93.0/90.7** | - |
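The README usage snippet relies on the repository's `hook_sift_layer` and `AdversarialLearner`. To illustrate the idea behind them, here is a minimal, self-contained sketch in plain Python of a SiFT-style adversarial loss for a single example. It assumes, following the description of SiFT in the DeBERTa paper, that the perturbation is applied to layer-normalized word embeddings rather than to the raw embeddings, which is what makes the perturbation size independent of the embedding scale. This is not the code in `DeBERTa/sift`. The toy linear classifier, the finite-difference gradient (standing in for autograd), the max-entry gradient rescaling, and all function names are hypothetical choices made for this sketch. The default values only mirror the `--vat_lambda 5`, `--vat_learning_rate 1e-4` and `--vat_init_perturbation 1e-2` flags in the `*-sift` configurations; how the repository actually uses those flags is an assumption.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def symmetric_kl(p_logits, q_logits):
    """KL(p||q) + KL(q||p) for the softmax distributions of two logit vectors."""
    p, q = softmax(p_logits), softmax(q_logits)
    # Equivalent to sum(p*log(p/q)) + sum(q*log(q/p)); every term is >= 0.
    return sum((pi - qi) * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


def linear(weights, h):
    """Hypothetical toy classifier head: one weight row per class, no bias."""
    return [sum(w * v for w, v in zip(row, h)) for row in weights]


def sift_adv_loss(x, weights, vat_lambda=5.0, init_perturbation=1e-2,
                  learning_rate=1e-4, seed=0):
    """Hypothetical sketch of a SiFT-style adversarial loss for one embedding `x`."""
    rng = random.Random(seed)
    h = layer_norm(x)            # perturb the *normalized* embedding, not x itself
    clean = linear(weights, h)   # clean logits, used as a fixed target
    delta = [rng.gauss(0.0, init_perturbation) for _ in h]

    def loss_at(d):
        perturbed = [a + b for a, b in zip(h, d)]
        return symmetric_kl(clean, linear(weights, perturbed))

    # Central finite differences stand in for autograd in this sketch.
    step = 1e-6
    grad = []
    for i in range(len(delta)):
        up, down = list(delta), list(delta)
        up[i] += step
        down[i] -= step
        grad.append((loss_at(up) - loss_at(down)) / (2 * step))

    # One ascent step on delta. Rescaling the gradient by its largest entry is an
    # assumption of this sketch; it lets learning_rate set the step size directly.
    scale = max(abs(g) for g in grad) + 1e-12
    delta = [d + learning_rate * g / scale for d, g in zip(delta, grad)]
    return vat_lambda * loss_at(delta)


x = [0.5, -1.2, 3.0, 0.7]
W = [[0.2, -0.5, 0.1, 0.9], [-0.3, 0.8, 0.4, -0.1], [0.6, 0.1, -0.7, 0.3]]
print(sift_adv_loss(x, W))
print(sift_adv_loss([10 * t for t in x], W))  # nearly identical: the scale of x is normalized away
```

Both calls print a small non-negative loss of almost the same value: because `delta` is added after `layer_norm`, multiplying the raw embedding by 10 barely changes the adversarial loss, whereas a fixed-size perturbation added to the raw embedding would matter less and less as the embedding grows.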
--- a/experiments/glue/mnli.sh
+++ b/experiments/glue/mnli.sh
@@ -21,6 +21,15 @@ init=$1
 tag=$init
 case ${init,,} in
 	base)
+	parameters=" --num_train_epochs 3 \
+	--fp16 True \
+	--warmup 1000 \
+	--learning_rate 2e-5 \
+	--train_batch_size 64 \
+	--cls_drop_out 0.1 "
+		;;
+	base-sift)
+	init=base
 	parameters=" --num_train_epochs 6 \
 	--vat_lambda 5 \
 	--vat_learning_rate 1e-4 \
@@ -61,6 +70,18 @@ case ${init,,} in
 	--learning_rate 3e-6 \
 	--train_batch_size 64 \
 	--cls_drop_out 0.3 \
+	--fp16 True "
+		;;
+	xxlarge-v2-sift)
+	init=xxlarge-v2
+	parameters=" --num_train_epochs 6 \
+	--warmup 1000 \
+	--vat_lambda 5 \
+	--vat_learning_rate 1e-4 \
+	--vat_init_perturbation 1e-2 \
+	--learning_rate 3e-6 \
+	--train_batch_size 64 \
+	--cls_drop_out 0.3 \
 	--fp16 True "
 		;;
 	*)