This package provides data-science tooling for developing new recognizers for Presidio. It is used to evaluate the entire system, as well as specific PII recognizers or PII detection models.
Updated 2024-10-27 15:51:02 +03:00
This tool helps automatically generate grammatically valid synthetic code-mixed data by using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
Updated 2024-07-31 00:01:52 +03:00
Azure Search Cognitive Skill to extract technical and business skills from text
Updated 2024-04-25 08:03:40 +03:00
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Updated 2023-07-25 17:21:55 +03:00
Multi-Task Deep Neural Networks for Natural Language Understanding
Updated 2023-06-13 00:28:35 +03:00
Cookiecutter API for creating Custom Skills for Azure Search using Python and Docker
Updated 2022-11-28 22:10:04 +03:00
This is a list of open-source projects at the Microsoft Research NLP Group
Updated 2020-09-30 01:11:02 +03:00
This code provides a word-level language identification tool that identifies the language of individual words in code-mixed text, e.g. text that mixes English with words from another language, such as Hindi written in Roman script.
Updated 2020-08-12 02:05:32 +03:00
Unsupervised factor-based text tokenizer for natural-language processing applications
Updated 2020-07-24 22:30:59 +03:00