# Microsoft Research NLP Projects

This is a list of open-source projects that the [Microsoft Research NLP Group](https://www.microsoft.com/en-us/research/group/natural-language-processing) has been involved in, listed in time order.

## Datasets
| Title | Description | Related projects |
| :------------- | :----------- | :----------- |
| [Dialogue Feedback Dataset](https://github.com/golsun/DialogRPT) | 100+ million dialogues with corresponding human feedback, used to learn which responses receive better feedback | [DialogRPT](https://github.com/golsun/DialogRPT) |
| [Grounded Dialogue Dataset](https://github.com/mgalley/DSTC7-End-to-End-Conversation-Modeling) | Dialogues with information grounded in external knowledge, e.g. Wikipedia pages | [DSTC7](https://github.com/mgalley/DSTC7-End-to-End-Conversation-Modeling), [CMR](https://github.com/qkaren/converse_reading_cmr) |
| [Reddit Dialogue Dataset](https://github.com/microsoft/DialoGPT) | 147M conversation-like exchanges extracted from Reddit comment chains from 2005 through 2017 | [DialoGPT](https://github.com/microsoft/DialoGPT) |

## Papers
| Title | Links | Notes | Tags |
| :------------- | :-----------: | :-----------: | :-----------: |
| [Dialogue Response Ranking Training with Large-Scale Human Feedback Data](https://arxiv.org/abs/2009.06978) | [code/model/data](https://github.com/golsun/DialogRPT), [demo](https://colab.research.google.com/drive/1jQXzTYsgdZIQjJKrX4g3CP0_PGCeVU3C?usp=sharing) | EMNLP 2020 | `dialog` `ranking` |
| [POINTER: Constrained Text Generation via Insertion-based Generative Pre-training](https://arxiv.org/abs/2005.00558) | [code](https://github.com/dreasysnail/POINTER), [demo](http://52.247.25.3:8900/) | EMNLP 2020 | `generation` |
| [Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space](https://arxiv.org/abs/2004.04092) | [code](https://github.com/ChunyuanLI/Optimus), [demo](http://40.71.23.172:8899/) | EMNLP 2020 | `generation` |
| [RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers](https://arxiv.org/abs/1911.04942) | [code](https://github.com/microsoft/rat-sql) | ACL 2020 | `parsing` `sql` |
| [A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks](https://arxiv.org/pdf/2005.09606.pdf) | [code](https://github.com/microsoft/multimodal-aligned-recipe-corpus) | ACL 2020 | `multimodal` |
| [INSET: Sentence Infilling with INter-SEntential Transformer](https://arxiv.org/abs/1911.03892) | [code/demo](https://github.com/dreasysnail/INSET) | ACL 2020 | `generation` |
| [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) | [code/model/data](https://github.com/microsoft/DialoGPT) | ACL 2020 | `dialog` `generation` |
| [MixingBoard: a Knowledgeable Stylized Integrated Text Generation Platform](https://arxiv.org/abs/2005.08365) | [code](https://github.com/microsoft/MixingBoard) | ACL 2020 | `dialog` `generation` `framework` `knowledge` `style` |
| [Vision-based Navigation with Language-based Assistance via Imitation Learning with Indirect Intervention](https://arxiv.org/abs/1812.04155) | [code/data](https://github.com/debadeepta/vnla) | CVPR 2019 | `navigation` `imitation learning` |
| [Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading](https://www.aclweb.org/anthology/P19-1539/) | [code/model/data](https://github.com/qkaren/converse_reading_cmr) | ACL 2019 | `knowledge` `dialog` `generation` |
| [Microsoft Icecaps: An Open-Source Toolkit for Conversation Modeling](https://www.aclweb.org/anthology/P19-3021.pdf) | [code](https://github.com/microsoft/icecaps) | ACL 2019 | `dialog` `generation` `framework` |
| [Structuring Latent Spaces for Stylized Response Generation](https://arxiv.org/abs/1909.05361) | [code/data](https://github.com/golsun/StyleFusion) | EMNLP 2019 | `style` `dialog` `generation` |
| [Jointly Optimizing Diversity and Relevance in Neural Response Generation](https://arxiv.org/abs/1902.11205) | [code/data](https://github.com/golsun/SpaceFusion) | NAACL 2019 | `dialog` `generation` |
| [Towards Content Transfer through Grounded Text Generation](https://arxiv.org/abs/1905.05293) | [code/data](https://github.com/shrimai/Towards-Content-Transfer-through-Grounded-Text-Generation) | NAACL 2019 | `generation` `knowledge` |

# Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

# Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
[LICENSE-CODE](LICENSE-CODE) file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation
may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries.
The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks.
Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
or trademarks, whether by implication, estoppel or otherwise.