This commit is contained in:
Said Bleik 2019-07-03 15:46:11 -04:00
Parent 5e2130238d
Commit 88c724303b
4 changed files: 14 additions and 16 deletions

View file

@@ -5,10 +5,7 @@
# NLP Best Practices
-This repository contains examples and best practices for building NLP systems, provided as Jupyter notebooks and utility functions. The focus of the repository is on state-of-the-art methods and common scenarios that are popular among researchers and practitioners working on problems involving text and language.
-## Planning
-All feature planning is done via projects, milestones, and issues in this repository.
+This repository contains examples and best practices for building NLP systems, provided as [Jupyter notebooks](scenarios) and [utility functions](utils_nlp). The focus of the repository is on state-of-the-art methods and common scenarios that are popular among researchers and practitioners working on problems involving text and language.
## Getting Started
To get started, navigate to the [Setup Guide](SETUP.md), where you'll find instructions on how to setup your environment and dependencies.

View file

@@ -4,15 +4,15 @@ This folder contains examples and best practices, written in Jupyter notebooks,
## Summary
-The following summarizes each scenario of the best practice notebooks. Each scenario is demonstrated in one or more Jupyter notebook examples that make use of the core code base of models and utilities.
+The following is a summary of the scenarios covered in the best practice notebooks. Each scenario is demonstrated in one or more Jupyter notebook examples that make use of the core code base of models and utilities.
-| Scenario | Applications | Languages | Models |
-|---| ------------------------ | -------------------------------------------- | ------------------- |
-|[Text Classification](scenarios/text_classification) |Topic Classification|en, zh, ar|BERT|
-|[Named Entity Recognition](scenarios/named_entity_recognition) |Wikipedia NER | en, zh |BERT|
-|[Question Answering](scenarios/question_answering) |SQuAD | en |BiDAF|
-|[Sentence Similarity](scenarios/sentence_similarity) |STS Benchmark |en|Representation: TF-IDF, Word Embeddings, Doc Embeddings<br>Metrics: Cosine Similarity, Word Mover's Distance|
-|[Embeddings](scenarios/embeddings)| Custom Embeddings Training|en|Word2Vec<br>fastText<br>GloVe|
+| Scenario | Applications | Models |
+|---| ------------------------ | ------------------- |
+|[Text Classification](scenarios/text_classification) |Topic Classification|BERT|
+|[Named Entity Recognition](scenarios/named_entity_recognition) |Wikipedia NER |BERT|
+|[Question Answering](scenarios/question_answering) |SQuAD | BiDAF|
+|[Sentence Similarity](scenarios/sentence_similarity) |STS Benchmark |Representation: TF-IDF, Word Embeddings, Doc Embeddings<br>Metrics: Cosine Similarity, Word Mover's Distance|
+|[Embeddings](scenarios/embeddings)| Custom Embeddings Training|Word2Vec<br>fastText<br>GloVe|
## Azure-enhanced notebooks

View file

@@ -5,4 +5,4 @@ names, locations, organizations, etc. The state-of-the art NER methods include
combining Long Short-Term Memory neural network with Conditional Random Field
(LSTM-CRF) and pretrained language models like BERT. NER can be used for
information extraction and filtering. It also plays an important role in other
-NLP tasks like question answering and texts summarization.
+NLP tasks like question answering and text summarization.

View file

@@ -1,16 +1,17 @@
# Sentence Similarity
-This folder contains examples and best practices, written in Jupyter notebooks, for building sentence similarity models. The scores can be used in a wide variety of applications, such as search/retrieval, nearest-neighbor or kernel-based classification methods, recommendation, and ranking tasks.
+This folder contains examples and best practices, written in Jupyter notebooks, for building sentence similarity models. The scores can be used in a wide variety of applications, such as search/retrieval, nearest-neighbor or kernel-based classification methods, recommendations, and ranking tasks.
## What is sentence similarity
-Sentence similarity or semantic textual similarity is to determine how similar two pieces of texts are and a measure of the degree to which two pieces of text express the same meaning. This can take the form of assigning a score from 1 to 5. Related tasks are paraphrase or duplicate identification. The common methods used for text similarity range from simple word-vector dot products to pairwise classification, and more recently, Siamese recurrent/convolutional neural networks with triplet loss functions.
+Sentence similarity or semantic textual similarity is a measure of how similar two pieces of text are, or to what degree they express the same meaning. Related tasks include paraphrase or duplicate identification, search, and matching applications. The common methods used for text similarity range from simple word-vector dot products to pairwise classification, and more recently, deep neural networks.
Sentence similarity is normally calculated by the following two steps:
1. obtaining the embeddings of the sentences
-2. taking the cosine similarity between them as shown in the following figure([Source](https://tfhub.dev/google/universal-sentence-encoder/1)):
+2. taking the cosine similarity between them as shown in the following figure([source](https://tfhub.dev/google/universal-sentence-encoder/1)):
![Sentence Similarity](https://nlpbp.blob.core.windows.net/images/example-similarity.png)
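
The two steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's implementation: `embed` is a hypothetical stand-in that uses bag-of-words counts in place of a real sentence encoder, and the example sentences are made up. Only the cosine similarity step mirrors what the text describes.

```python
import math
from collections import Counter


def embed(sentence):
    """Hypothetical stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(sentence.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as token -> count maps."""
    # Only tokens present in both vectors contribute to the dot product.
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty input
    return dot / (norm_u * norm_v)


# Step 1: obtain the embeddings of the sentences (toy vectors, by assumption).
a = embed("the cat sat on the mat")
b = embed("the cat lay on the mat")
c = embed("stock prices fell sharply")

# Step 2: take the cosine similarity between them.
print(cosine_similarity(a, b))  # high: the sentences share most tokens
print(cosine_similarity(a, c))  # zero: no tokens in common
```

With a real encoder, `embed` would return dense vectors and the same cosine computation would apply to them; the count-based version here captures only word overlap, not meaning.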
## Summary