Natural Language Processing Best Practices & Examples
Yijing Chen bd2b60d0bc update with the changes in staging branch 2019-08-13 18:10:56 +00:00
.github issue template 2019-05-14 12:21:40 +01:00
docs Intial commit to put the receipe template in 2019-04-05 13:55:58 -04:00
scenarios update with the changes in staging branch 2019-08-13 18:10:56 +00:00
tests update with the changes in staging branch 2019-08-13 18:10:56 +00:00
tools Merge pull request #223 from microsoft/daden/er 2019-08-08 11:23:51 -05:00
utils_nlp update with the changes in staging branch 2019-08-13 18:10:56 +00:00
.amlignore Added AML Ignore 2019-06-17 16:57:08 -04:00
.bumpversion.cfg feat: Configure NLP Utils Semantic Versioning with setuptools_scm and bumpversion 2019-08-09 14:56:28 +00:00
.flake8 change line length 2019-06-21 15:22:15 -04:00
.gitignore gitignore 💥 2019-08-07 14:58:01 +00:00
.pre-commit-config.yaml Changed python version in pre-commit-config back to 3.6 2019-06-13 14:46:57 -04:00
CONTRIBUTING.md Rijai reposetup (#1) 2019-04-05 19:01:56 -04:00
LICENSE Intial commit to put the receipe template in 2019-04-05 13:55:58 -04:00
MANIFEST.in Make utils pip installable using setup.py 2019-07-24 14:22:22 -04:00
NOTICE.txt update with the changes in staging branch 2019-08-13 18:10:56 +00:00
README.md Merge pull request #266 from kehuangms/origin/kehuan 2019-08-12 13:27:14 -04:00
SETUP.md A few minor doc fixes. 2019-07-31 17:16:50 -04:00
pyproject.toml There was a proposal to update this to 100 2019-07-30 14:35:19 -04:00
setup.py feat: Configure NLP Utils Semantic Versioning with setuptools_scm and bumpversion 2019-08-09 14:56:28 +00:00

README.md

NLP Best Practices

This repository contains examples and best practices for building natural language processing (NLP) systems, provided as Jupyter notebooks and utility functions. The focus of the repository is on state-of-the-art methods and common scenarios that are popular among researchers and practitioners working on problems involving text and language.

Overview

The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in NLP algorithms, neural architectures, and distributed machine learning systems. The content is based on our past and potential future engagements with customers as well as collaboration with partners, researchers, and the open source community.

We hope that these tools will significantly reduce the time it takes to go from a business problem or a research idea to a full implementation of a system. In addition, the example notebooks serve as guidelines and showcase best practices and usage of the tools.

In an era of transfer learning, transformers, and deep architectures, we believe that pretrained models provide a unified solution to many real-world problems and allow handling different tasks and languages easily. We will, therefore, prioritize such models, as they achieve state-of-the-art results on several NLP benchmarks and can be used in a number of applications ranging from simple text classification to sophisticated intelligent chat bots.

GLUE Leaderboard
SQuAD Leaderboard

Content

The following is a summary of the scenarios covered in the repository. Each scenario is demonstrated in one or more Jupyter notebook examples that make use of the core code base of models and utilities.

| Scenario | Applications | Models |
|---|---|---|
| Text Classification | Topic Classification | BERT |
| Named Entity Recognition | Wikipedia NER | BERT |
| Entailment | MultiNLI Natural Language Inference | BERT |
| Question Answering | SQuAD | BiDAF, BERT |
| Sentence Similarity | STS Benchmark | Representation: TF-IDF, Word Embeddings, Doc Embeddings. Metrics: Cosine Similarity, Word Mover's Distance |
| Embeddings | Custom Embeddings Training | Word2Vec, fastText, GloVe |
| Annotation | Text annotation | Tutorial |
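
To make the Sentence Similarity row more concrete, here is a minimal sketch of a TF-IDF representation scored with cosine similarity. It is not the repository's implementation and does not use `utils_nlp`. The function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`), the whitespace tokenizer, the smoothed IDF formula, and the sample sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {
                # Term frequency times a smoothed inverse document frequency
                # (the smoothing variant is an assumption of this sketch).
                term: (count / total)
                * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
                for term, count in counts.items()
            }
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_u * norm_v)


# Hypothetical sentence pairs in the style of a similarity benchmark.
sentences = [
    "a man is playing a guitar",
    "a man plays the guitar",
    "a woman is slicing an onion",
]
vectors = tfidf_vectors(sentences)
related = cosine_similarity(vectors[0], vectors[1])
unrelated = cosine_similarity(vectors[0], vectors[2])
print(f"related pair: {related:.3f}, unrelated pair: {unrelated:.3f}")
```

The related pair scores higher than the unrelated one because it shares more of the rarer terms. A bag-of-words baseline like this ignores word order and synonyms, which is where the embedding-based representations listed in the same row tend to help.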

Getting Started

To get started, navigate to the Setup Guide, where you'll find instructions on how to set up your environment and dependencies.

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

Build Status

| Build Type | Branch | Status | Branch | Status |
|---|---|---|---|---|
| Linux CPU | master | Build Status | staging | Build Status |
| Linux GPU | master | Build Status | staging | Build Status |