Mirror of https://github.com/microsoft/DeBERTa.git
Add example and document for SiFT
Parent 14bb78d123
Commit 793d31fc6f
@@ -0,0 +1,31 @@
# SiFT (Scale Invariant Fine-Tuning)

## Usage

To try SiFT with DeBERTa, run `experiments/glue/mnli.sh base-sift` or `experiments/glue/mnli.sh xxlarge-v2-sift`.

The following example shows how to use SiFT in your existing code:

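```python
# Assumed import path (not shown in this commit); adjust it to your checkout:
# from DeBERTa.sift import AdversarialLearner, hook_sift_layer

# Create the DeBERTa model first, then hook the SiFT layers into it.
adv_modules = hook_sift_layer(model, hidden_size=768)
adv = AdversarialLearner(model, adv_modules)

def logits_fn(model, *wargs, **kwargs):
    # Return only the logits; the model's second output is discarded.
    logits, _ = model(*wargs, **kwargs)
    return logits

# Regular forward pass that yields the logits and the task loss.
logits, loss = model(**data)

# Add the SiFT adversarial loss to the task loss.
loss = loss + adv.loss(logits, logits_fn, **data)
# The other steps are the same as in general training.
```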
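
To make the role of the `adv.loss(...)` term more concrete, here is a minimal, dependency-free sketch of the general idea behind scale-invariant adversarial fine-tuning: normalize the embedding first, perturb the normalized vector, and penalize the divergence between the clean and the perturbed predictions. This is not the repository's implementation. The helper names (`layer_norm`, `symmetric_kl`, `toy_logits`), the toy linear head, the fixed perturbation `delta`, and the choice of a symmetric KL divergence are all assumptions made for illustration.

```python
import math

def layer_norm(vec, eps=1e-7):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def symmetric_kl(logits_p, logits_q):
    """KL(p||q) + KL(q||p) between the two softmax distributions."""
    p, q = softmax(logits_p), softmax(logits_q)
    kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return kl_pq + kl_qp

def toy_logits(embedding, weights):
    """Hypothetical stand-in for a model head: a single linear layer."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]

# Toy inputs, made up for illustration only.
embedding = [0.5, -1.0, 2.0, 0.0]
weights = [[0.1, 0.2, -0.3, 0.4], [-0.2, 0.1, 0.5, 0.0]]
delta = [0.01, -0.01, 0.01, -0.01]  # fixed perturbation; a real learner would optimize it

# Perturb the normalized embedding rather than the raw one.
normalized = layer_norm(embedding)
perturbed = [x + d for x, d in zip(normalized, delta)]

clean_logits = toy_logits(normalized, weights)
adv_logits = toy_logits(perturbed, weights)
adv_loss = symmetric_kl(clean_logits, adv_logits)

# Rescaling the raw embedding barely changes the normalized vector,
# so the same perturbation size stays meaningful at any embedding scale.
scaled = layer_norm([10 * x for x in embedding])
```

In this sketch `adv_loss` plays the part of the extra term added to the task loss, and `scaled` shows why the perturbation is applied after normalization.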

## Ablation study results

| Model | MNLI-m/mm (Acc) | SST-2 (Acc) | QNLI (Acc) | CoLA (MCC) | RTE (Acc) | MRPC (Acc/F1) | QQP (Acc/F1) | STS-B (P/S) |
|---------------------------|-------------|-------|------|------|--------|-------|-------|------|
|**[DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge)<sup>1,2</sup>**|91.7/91.9|97.2|96.0|72.0|93.5|**93.1/94.9**|92.7/90.3|93.2/93.1|
|**[DeBERTa-V2-XXLarge](https://huggingface.co/microsoft/deberta-v2-xxlarge)<sup>1,2</sup>**|**92.0/92.1**|97.5|**96.5**|**73.5**|**96.5**| - |**93.0/90.7**| - |
@@ -21,6 +21,15 @@ init=$1
tag=$init
case ${init,,} in
    base)
    parameters=" --num_train_epochs 3 \
    --fp16 True \
    --warmup 1000 \
    --learning_rate 2e-5 \
    --train_batch_size 64 \
    --cls_drop_out 0.1 "
        ;;
    base-sift)
    init=base
    parameters=" --num_train_epochs 6 \
    --vat_lambda 5 \
    --vat_learning_rate 1e-4 \
@@ -61,6 +70,18 @@ case ${init,,} in
    --learning_rate 3e-6 \
    --train_batch_size 64 \
    --cls_drop_out 0.3 \
    --fp16 True "
        ;;
    xxlarge-v2-sift)
    init=xxlarge-v2
    parameters=" --num_train_epochs 6 \
    --warmup 1000 \
    --vat_lambda 5 \
    --vat_learning_rate 1e-4 \
    --vat_init_perturbation 1e-2 \
    --learning_rate 3e-6 \
    --train_batch_size 64 \
    --cls_drop_out 0.3 \
    --fp16 True "
        ;;
    *)
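
As a quick way to inspect a preset like `xxlarge-v2-sift`, the flag string can be split into name/value pairs. The sketch below copies the flags from that case verbatim. The `parse_flags` helper is hypothetical, it assumes every flag takes exactly one value, and grouping flags by the `vat_` prefix is only a reading of the names, not something the script states.

```python
import shlex

# Flag string copied from the `xxlarge-v2-sift` case in the hunk.
parameters = (
    " --num_train_epochs 6 "
    "--warmup 1000 "
    "--vat_lambda 5 "
    "--vat_learning_rate 1e-4 "
    "--vat_init_perturbation 1e-2 "
    "--learning_rate 3e-6 "
    "--train_batch_size 64 "
    "--cls_drop_out 0.3 "
    "--fp16 True "
)

def parse_flags(flag_string):
    """Turn '--name value' pairs into a dict of strings (hypothetical helper)."""
    tokens = shlex.split(flag_string)
    if len(tokens) % 2 != 0:
        raise ValueError("expected every flag to be followed by one value")
    flags = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if not name.startswith("--"):
            raise ValueError(f"unexpected token where a flag was expected: {name}")
        flags[name[2:]] = value
    return flags

flags = parse_flags(parameters)

# Flags whose names start with `vat_`, presumably the SiFT-related ones.
sift_flags = {name: value for name, value in flags.items() if name.startswith("vat_")}
print(sift_flags)
```

Values are kept as strings so that nothing is lost or reinterpreted; convert them only where a numeric comparison is needed.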