NLP Best Practices
This repository contains examples and best practices for building NLP systems, provided as Jupyter notebooks and utility functions. The focus of the repository is on state-of-the-art methods and common scenarios that are popular among researchers and practitioners working on problems involving text and language.
The following section includes a list of the available scenarios. Each scenario is demonstrated in one or more Jupyter notebook examples that make use of the core code base of models and utilities.
Scenarios
Scenario | Applications | Languages | Models |
---|---|---|---|
Text Classification | Topic Classification | en, zh, ar | BERT |
Named Entity Recognition | Wikipedia NER | en, zh | BERT |
Sentence Similarity | STS Benchmark | en | Representation: TF-IDF, Word Embeddings, Doc Embeddings; Metrics: Cosine Similarity, Word Mover's Distance |
Embeddings | Custom Embeddings Training | en | Word2Vec, fastText, GloVe |
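As a rough illustration of the Sentence Similarity row, the sketch below scores sentence pairs with TF-IDF vectors and cosine similarity using only the Python standard library. It is not taken from `utils_nlp` or the notebooks; the helper names `tfidf_vectors` and `cosine_similarity`, the whitespace tokenizer, the smoothed IDF formula, and the sample sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Naive lowercase whitespace tokenization; real pipelines use a proper tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) so terms found in every document keep a nonzero weight.
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sentences for illustration, not drawn from the STS Benchmark.
sentences = [
    "a man is playing a guitar",
    "a man is playing a violin",
    "the stock market fell sharply today",
]
vecs = tfidf_vectors(sentences)
sim_close = cosine_similarity(vecs[0], vecs[1])
sim_far = cosine_similarity(vecs[0], vecs[2])
print(f"similar pair: {sim_close:.3f}, unrelated pair: {sim_far:.3f}")
```

The first two sentences share most of their terms and should score well above the unrelated pair, which shares none. Purely lexical overlap like this misses paraphrases, which is one reason the table also lists embedding-based representations and Word Mover's Distance.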
Planning
All feature planning is done via projects, milestones, and issues in this repository.
Getting Started
To get started, see the Setup Guide, which explains how to set up your environment and dependencies.
Contributing
This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.