Multi-Task Deep Neural Networks for Natural Language Understanding
This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding, as described in:
Xiaodong Liu*, Pengcheng He*, Weizhu Chen and Jianfeng Gao
Multi-Task Deep Neural Networks for Natural Language Understanding
ACL 2019
*: Equal contribution
Xiaodong Liu, Pengcheng He, Weizhu Chen and Jianfeng Gao
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
arXiv version
Pengcheng He, Xiaodong Liu, Weizhu Chen and Jianfeng Gao
Hybrid Neural Network Model for Commonsense Reasoning
arXiv version
Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Jiawei Han
On the Variance of the Adaptive Learning Rate and Beyond
arXiv version
Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Tuo Zhao
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
arXiv version
Pip install package
A setup.py file is provided to simplify the installation of this package.

- To install the package, run the command below from the directory root:

  ```bash
  pip install -e .
  ```

  This tells pip to install the `mt-dnn` package from source in development mode, which means any updates to the `mt-dnn` source directory are immediately reflected in the installed package without needing to reinstall; a very useful practice for a package under constant development.

- It is also possible to install directly from GitHub, which is the best way to use the package in external projects (while still reflecting updates to the source, since it is installed as an editable `-e` package):

  ```bash
  pip install -e git+git@github.com:microsoft/mt-dnn.git@master#egg=mtdnn
  ```

Either command makes `mt-dnn` available in your conda virtual environment. You can verify that it was properly installed by running:

```bash
pip list | grep mtdnn
```
How To Use
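The steps below assume the relevant classes have already been imported. A minimal sketch is shown here; the module paths are assumptions based on the package layout, so adjust them to match your installed version:

```python
# Assumed import paths; adjust to match the layout of your installed mtdnn package.
from mtdnn.configuration_mtdnn import MTDNNConfig
from mtdnn.modeling_mtdnn import MTDNNModel
from mtdnn.process_mtdnn import MTDNNDataProcess
from mtdnn.tasks.config import MTDNNTaskDefs
```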
- Create a model configuration object, `MTDNNConfig`, with the necessary parameters to initialize the MT-DNN model. Initializing it without any parameters defaults to a configuration similar to that used to initialize a BERT model. The configuration object can be initialized with training and learning parameters such as `batch_size` and `learning_rate`. Please consult the class implementation for all parameters.

  ```python
  BATCH_SIZE = 16
  config = MTDNNConfig(batch_size=BATCH_SIZE)
  ```
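  For instance, a configuration that sets both of the parameters mentioned above might look like this (the values are only illustrative):

  ```python
  # Illustrative values only; tune batch_size and learning_rate for your own setup.
  config = MTDNNConfig(batch_size=16, learning_rate=5e-5)
  ```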
- Define the task parameters to train for and initialize an `MTDNNTaskDefs` object.

  ```python
  tasks_params = {
      "mnli": {
          "data_format": "PremiseAndOneHypothesis",
          "encoder_type": "BERT",
          "dropout_p": 0.3,
          "enable_san": True,
          "labels": ["contradiction", "neutral", "entailment"],
          "metric_meta": ["ACC"],
          "loss": "CeCriterion",
          "kd_loss": "MseCriterion",
          "n_class": 3,
          "split_names": [
              "train",
              "matched_dev",
              "mismatched_dev",
              "matched_test",
              "mismatched_test",
          ],
          "task_type": "Classification",
      },
  }
  task_defs = MTDNNTaskDefs(tasks_params)
  ```
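  Multi-task training is configured by adding more entries to the same dictionary, one per task. A hypothetical sketch adding an SNLI-style task alongside MNLI is shown below; the field values are illustrative, not canonical, so take them from the task's actual configuration:

  ```python
  # Illustrative second task; consult the task's canonical config for real values.
  tasks_params["snli"] = {
      "data_format": "PremiseAndOneHypothesis",
      "encoder_type": "BERT",
      "dropout_p": 0.1,
      "enable_san": False,  # SciTail/SNLI use a simple linear projection (see FAQ)
      "labels": ["contradiction", "neutral", "entailment"],
      "metric_meta": ["ACC"],
      "loss": "CeCriterion",
      "kd_loss": "MseCriterion",
      "n_class": 3,
      "split_names": ["train", "dev", "test"],
      "task_type": "Classification",
  }
  task_defs = MTDNNTaskDefs(tasks_params)
  ```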
- Create a data preprocessing object, `MTDNNDataProcess`. This creates the training, test and development PyTorch dataloaders needed for training and testing. We also need to retrieve the training options required to initialize the model correctly, for all tasks.

  ```python
  data_processor = MTDNNDataProcess(
      config=config,
      task_defs=task_defs,
      data_dir="/home/useradmin/sources/mt-dnn/data/canonical_data/bert_uncased_lower",
      train_datasets_list=["mnli"],
      test_datasets_list=["mnli_mismatched", "mnli_matched"],
  )

  # Retrieve the multi-task train, dev and test dataloaders
  multitask_train_dataloader = data_processor.get_train_dataloader()
  dev_dataloaders_list = data_processor.get_dev_dataloaders()
  test_dataloaders_list = data_processor.get_test_dataloaders()

  # Get training options to initialize the model
  decoder_opts = data_processor.get_decoder_options_list()
  task_types = data_processor.get_task_types_list()
  dropout_list = data_processor.get_tasks_dropout_prob_list()
  loss_types = data_processor.get_loss_types_list()
  kd_loss_types = data_processor.get_kd_loss_types_list()
  tasks_nclass_list = data_processor.get_task_nclass_list()
  num_all_batches = data_processor.get_num_all_batches()
  ```
- Now we can create an `MTDNNModel`.

  ```python
  model = MTDNNModel(
      config,
      task_defs,
      pretrained_model_name="bert-base-uncased",
      num_train_step=num_all_batches,
      decoder_opts=decoder_opts,
      task_types=task_types,
      dropout_list=dropout_list,
      loss_types=loss_types,
      kd_loss_types=kd_loss_types,
      tasks_nclass_list=tasks_nclass_list,
      multitask_train_dataloader=multitask_train_dataloader,
      dev_dataloaders_list=dev_dataloaders_list,
      test_dataloaders_list=test_dataloaders_list,
  )
  ```
- At this point we can fit the MT-DNN model and create predictions. The fit function takes an optional `epochs` parameter that overrides the epochs set in the `MTDNNConfig` object.

  ```python
  model.fit()
  model.predict()
  ```
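  To override the number of epochs from the configuration, pass it to fit directly; the value below is only illustrative:

  ```python
  # Override the configured number of epochs; the value here is illustrative.
  model.fit(epochs=2)
  ```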
- The predict function can take an optional checkpoint, `trained_model_chckpt`, which can be used for inference and for running evaluations on an already-trained PyTorch MT-DNN model.

  ```python
  # Predict using a previously trained PyTorch model checkpoint
  checkpt = "./model_0.pt"
  model.predict(trained_model_chckpt=checkpt)
  ```
Pre-process your data in the correct format
Depending on the `data_format` you have set in the configuration object `MTDNNConfig`, please follow the detailed data format below to prepare your data (a short example follows the list):
- `PremiseOnly`: single text, i.e. the premise. Data format is "id" \t "label" \t "premise".
- `PremiseAndOneHypothesis`: two texts, i.e. one premise and one hypothesis. Data format is "id" \t "label" \t "premise" \t "hypothesis".
- `PremiseAndMultiHypothesis`: one text as the premise and multiple candidate texts as hypotheses. Data format is "id" \t "label" \t "premise" \t "hypothesis_1" \t "hypothesis_2" \t ... \t "hypothesis_n".
- `Sequence`: sequence tagging. Data format is "id" \t "label" \t "premise".
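As an example, a `PremiseAndOneHypothesis` file is a tab-separated file with one example per line. The sketch below writes two purely illustrative rows (the file name and contents are hypothetical):

```python
# Purely illustrative rows in the PremiseAndOneHypothesis format:
# "id" \t "label" \t "premise" \t "hypothesis"
rows = [
    "id-1\tentailment\tA soccer game with multiple males playing.\tSome men are playing a sport.",
    "id-2\tcontradiction\tA man inspects a uniform.\tThe man is sleeping.",
]
with open("mnli_train.tsv", "w", encoding="utf-8") as f:
    f.write("\n".join(rows) + "\n")
```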
FAQ
Did you share the pretrained mt-dnn models?
Yes, we released the pretrained shared embeddings trained via MTL, which are aligned to the BERT base/large models: `mt_dnn_base.pt` and `mt_dnn_large.pt`.
How can we obtain the data and pre-trained models to try out?
We have provided a download script to assist with this.
Why do SciTail/SNLI not enable SAN?
For the SciTail/SNLI tasks, the purpose is to test the generalization of the learned embeddings and how easily they can be adapted to a new domain, rather than complicated model structures, for a direct comparison with BERT. Thus, we use a linear projection in all the domain adaptation settings.
What is the difference between V1 and V2?
The difference is in the QNLI dataset. Please refer to the official GLUE homepage for more details. If you want to formulate QNLI as a pair-wise ranking task, as in our paper, make sure that you use the old QNLI data, then run the prepro script with the extra flag:

```bash
sh experiments/glue/prepro.sh --old_glue
```

If you have issues accessing the old version of the data, please contact the GLUE team.
Did you fine-tune on single tasks for your GLUE leaderboard submission?
We can use the multi-task refinement model to run the prediction and produce a reasonable result. But to achieve a better result, fine-tuning on each task is required. It is worth noting that the paper on arXiv is a little outdated and based on the old GLUE dataset; we will update it.
Notes and Acknowledgments
The PyTorch BERT implementation is from: https://github.com/huggingface/pytorch-pretrained-BERT
BERT: https://github.com/google-research/bert
We also used some code from: https://github.com/kevinduh/san_mrc
Related Projects/Codebase
- Pretrained UniLM: https://github.com/microsoft/unilm
- Pretrained Response Generation Model: https://github.com/microsoft/DialoGPT
- Internal MT-DNN repo: https://github.com/microsoft/mt-dnn
How do I cite MT-DNN?
@inproceedings{liu2019mt-dnn,
title = "Multi-Task Deep Neural Networks for Natural Language Understanding",
author = "Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1441",
pages = "4487--4496"
}
@article{liu2019mt-dnn-kd,
title={Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding},
author={Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng},
journal={arXiv preprint arXiv:1904.09482},
year={2019}
}
@article{he2019hnn,
title={A Hybrid Neural Network Model for Commonsense Reasoning},
author={He, Pengcheng and Liu, Xiaodong and Chen, Weizhu and Gao, Jianfeng},
journal={arXiv preprint arXiv:1907.11983},
year={2019}
}
@article{liu2019radam,
title={On the Variance of the Adaptive Learning Rate and Beyond},
author={Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
journal={arXiv preprint arXiv:1908.03265},
year={2019}
}
@article{jiang2019smart,
title={SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization},
author={Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Zhao, Tuo},
journal={arXiv preprint arXiv:1911.03437},
year={2019}
}
Contact Information
For help or issues using MT-DNN, please submit a GitHub issue.
For personal communication related to this package, please contact Xiaodong Liu (xiaodl@microsoft.com), Yu Wang (yuwan@microsoft.com), Pengcheng He (penhe@microsoft.com), Weizhu Chen (wzchen@microsoft.com), Jianshu Ji (jianshuj@microsoft.com), Emmanuel Awa (Emmanuel.Awa@microsoft.com) or Jianfeng Gao (jfgao@microsoft.com).
Contributing
This project welcomes contributions and suggestions. For more details, please check the complete steps for contributing to this repo here.