Natural Language Processing Best Practices & Examples
Said Bleik ed4e09b9b3 edits to loader test 2019-06-21 16:33:37 -04:00
.ci source activate 2019-06-11 13:56:57 -04:00
.github issue template 2019-05-14 12:21:40 +01:00
benchmarks Intial commit to put the receipe template in 2019-04-05 13:55:58 -04:00
docs Intial commit to put the receipe template in 2019-04-05 13:55:58 -04:00
scenarios move all the imports to global settings 2019-06-19 11:24:39 -04:00
tests edits to loader test 2019-06-21 16:33:37 -04:00
tools add dask dependency 2019-06-19 17:03:37 -04:00
utils_nlp add sequential loader 2019-06-19 16:12:20 -04:00
.amlignore Added AML Ignore 2019-06-17 16:57:08 -04:00
.flake8 Merging staging branch to master (#3) 2019-04-05 20:18:15 -04:00
.gitignore gitignore and conda file 2019-06-04 14:51:35 +01:00
.pre-commit-config.yaml Changed python version in pre-commit-config back to 3.6 2019-06-13 14:46:57 -04:00
AUTHORS.md Intial commit to put the receipe template in 2019-04-05 13:55:58 -04:00
CONTRIBUTING.md Merging staging branch to master (#3) 2019-04-05 20:18:15 -04:00
LICENSE Intial commit to put the receipe template in 2019-04-05 13:55:58 -04:00
README.md documentation additions 2019-06-14 16:29:32 -04:00
SETUP.md documentation additions 2019-06-14 16:29:32 -04:00
pyproject.toml Merging staging branch to master (#3) 2019-04-05 20:18:15 -04:00

README.md

| Branch | Status | Branch | Status |
| --- | --- | --- | --- |
| master | Build Status | staging | Build Status |

NLP Best Practices

This repository contains examples and best practices for building NLP systems, provided as Jupyter notebooks and utility functions. It focuses on state-of-the-art methods and common scenarios that are popular among researchers and practitioners working on text and language problems.

The following section lists the available scenarios. Each scenario is demonstrated in one or more Jupyter notebooks that use the core code base of models and utilities.

Scenarios

| Scenario | Applications | Languages | Models |
| --- | --- | --- | --- |
| Text Classification | Topic Classification | en, zh, ar | BERT |
| Named Entity Recognition | Wikipedia NER | en, zh | BERT |
| Sentence Similarity | STS Benchmark | en | Representation: TF-IDF, Word Embeddings, Doc Embeddings. Metrics: Cosine Similarity, Word Mover's Distance |
| Embeddings | Custom Embeddings Training | en | Word2Vec, fastText, GloVe |
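
As a rough illustration of the Sentence Similarity scenario, the sketch below pairs a TF-IDF representation with cosine similarity using only the Python standard library. It is not the repository's `utils_nlp` code: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`), the smoothed IDF formula, and the example sentences are all assumptions made for this example, and the sentences are invented rather than taken from the STS Benchmark.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict of term -> weight) per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n_docs = len(docs)
    # Document frequency: number of sentences in which each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumption for this sketch): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented example sentences (not drawn from any dataset).
sentences = [
    "A man is playing a guitar.",
    "A man is playing the guitar.",
    "A woman is slicing an onion.",
]
vectors = tfidf_vectors(sentences)
sim_close = cosine_similarity(vectors[0], vectors[1])
sim_far = cosine_similarity(vectors[0], vectors[2])
print(f"similar pair:    {sim_close:.3f}")
print(f"dissimilar pair: {sim_far:.3f}")
```

The near-paraphrase pair should score higher than the unrelated pair. Because TF-IDF only matches surface tokens, it can miss paraphrases that share few words; embedding-based representations are one way to address that.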

Planning

All feature planning is done via projects, milestones, and issues in this repository.

Getting Started

To get started, navigate to the Setup Guide, where you'll find instructions on how to set up your environment and dependencies.

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.