Natural Language Processing Best Practices & Examples

azure-ml best-practices deep-learning machine-learning mlflow natural-language natural-language-inference natural-language-processing natural-language-understanding nli nlp nlu pretrained-models sota text text-classification transfomer


Yijing Chen bd2b60d0bc update with the changes in staging branch		2019-08-13 18:10:56 +00:00
.github	issue template	2019-05-14 12:21:40 +01:00
docs	Intial commit to put the receipe template in	2019-04-05 13:55:58 -04:00
scenarios	update with the changes in staging branch	2019-08-13 18:10:56 +00:00
tests	update with the changes in staging branch	2019-08-13 18:10:56 +00:00
tools	Merge pull request #223 from microsoft/daden/er	2019-08-08 11:23:51 -05:00
utils_nlp	update with the changes in staging branch	2019-08-13 18:10:56 +00:00
.amlignore	Added AML Ignore	2019-06-17 16:57:08 -04:00
.bumpversion.cfg	feat: Configure NLP Utils Semantic Versioning with setuptools_scm and bumpversion	2019-08-09 14:56:28 +00:00
.flake8	change line length	2019-06-21 15:22:15 -04:00
.gitignore	gitignore 💥	2019-08-07 14:58:01 +00:00
.pre-commit-config.yaml	Changed python version in pre-commit-config back to 3.6	2019-06-13 14:46:57 -04:00
CONTRIBUTING.md	Rijai reposetup (#1 )	2019-04-05 19:01:56 -04:00
LICENSE	Intial commit to put the receipe template in	2019-04-05 13:55:58 -04:00
MANIFEST.in	Make utils pip installable using setup.py	2019-07-24 14:22:22 -04:00
NOTICE.txt	update with the changes in staging branch	2019-08-13 18:10:56 +00:00
README.md	Merge pull request #266 from kehuangms/origin/kehuan	2019-08-12 13:27:14 -04:00
SETUP.md	A few minor doc fixes.	2019-07-31 17:16:50 -04:00
pyproject.toml	There was a proposal to update this to 100	2019-07-30 14:35:19 -04:00
setup.py	feat: Configure NLP Utils Semantic Versioning with setuptools_scm and bumpversion	2019-08-09 14:56:28 +00:00

README.md

NLP Best Practices

This repository contains examples and best practices for building natural language processing (NLP) systems, provided as Jupyter notebooks and utility functions. The focus of the repository is on state-of-the-art methods and common scenarios that are popular among researchers and practitioners working on problems involving text and language.

Overview

The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in NLP algorithms, neural architectures, and distributed machine learning systems. The content is based on our past and potential future engagements with customers as well as collaboration with partners, researchers, and the open source community.

We hope these tools will significantly reduce the time it takes to go from a business problem or research idea to a full implementation of a system. In addition, the example notebooks serve as guidelines that showcase best practices and usage of the tools.

In an era of transfer learning, transformers, and deep architectures, we believe that pretrained models provide a unified solution to many real-world problems and make it easy to handle different tasks and languages. We will therefore prioritize such models, as they achieve state-of-the-art results on several NLP benchmarks and can be used in applications ranging from simple text classification to sophisticated chatbots.

GLUE Leaderboard
SQuAD Leaderboard

Content

The following is a summary of the scenarios covered in the repository. Each scenario is demonstrated in one or more Jupyter notebook examples that make use of the core code base of models and utilities.

Scenario	Applications	Models
Text Classification	Topic Classification	BERT
Named Entity Recognition	Wikipedia NER	BERT
Entailment	MultiNLI Natural Language Inference	BERT
Question Answering	SQuAD	BiDAF, BERT
Sentence Similarity	STS Benchmark	Representations: TF-IDF, Word Embeddings, Doc Embeddings; Metrics: Cosine Similarity, Word Mover's Distance
Embeddings	Custom Embeddings Training	Word2Vec, fastText, GloVe
Annotation	Text annotation	Tutorial
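
As a rough illustration of the Sentence Similarity row above, the following is a minimal sketch of a TF-IDF representation combined with a cosine similarity metric. It uses only the Python standard library and is not the repository's own code: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`), the whitespace tokenizer, the smoothed IDF formula, and the sample sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict of term -> weight) per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n_docs = len(docs)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumption for this sketch) so every term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up sentences, not taken from the STS Benchmark.
sentences = [
    "a man is playing a guitar",
    "a man is playing a violin",
    "the stock market fell sharply today",
]
vecs = tfidf_vectors(sentences)
sim_close = cosine_similarity(vecs[0], vecs[1])  # sentences share most terms
sim_far = cosine_similarity(vecs[0], vecs[2])    # sentences share no terms
print(f"close pair: {sim_close:.3f}, distant pair: {sim_far:.3f}")
```

Because this representation is purely lexical, two sentences with no words in common score zero even if they mean the same thing, which is one reason the table also lists embedding-based representations and Word Mover's Distance.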

Getting Started

To get started, navigate to the Setup Guide, where you'll find instructions on how to set up your environment and dependencies.

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

Build Status

Build Type	Branch	Status		Branch	Status
Linux CPU	master			staging
Linux GPU	master			staging
