Context-aware, pluggable, and customizable data protection and de-identification SDK for text and images (a usage sketch follows this entry).
hacktoberfest
microsoft
python
privacy
transformers
anonymization
anonymization-service
data-anonymization
data-loss-prevention
data-masking
data-protection
de-identification
dlp
pii
pii-anonymization
pii-anonymization-service
presidio
privacy-protection
text-anonymization
Updated 2024-11-06 16:21:45 +03:00
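A minimal usage sketch of the SDK described in this entry, assuming the presidio-analyzer and presidio-anonymizer packages with their default recognizers; treat it as an illustration rather than the project's canonical example:

```python
# Sketch: detect and anonymize PII in free text (assumes presidio-analyzer
# and presidio-anonymizer are installed).
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "My name is Jane Doe and my phone number is 212-555-0199."

# Detect PII entities (person names, phone numbers, ...) in English text.
analyzer = AnalyzerEngine()
results = analyzer.analyze(text=text, language="en")

# Replace the detected spans with entity-type placeholders.
anonymizer = AnonymizerEngine()
anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
print(anonymized.text)  # e.g. "My name is <PERSON> and my phone number is <PHONE_NUMBER>."
```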
This package features data-science-related tasks for developing new recognizers for Presidio. It is used to evaluate the entire system as well as specific PII recognizers or PII detection models (an evaluation sketch follows this entry).
machine-learning
deep-learning
nlp
natural-language-processing
privacy
ner
transformers
pii
named-entity-recognition
spacy
flair
Updated 2024-10-27 15:51:02 +03:00
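The recognizer evaluation this package is built for can be illustrated with a small sketch that uses only the presidio-analyzer API plus a hand-labeled (hypothetical) example and a manual character-level precision/recall computation; the package itself provides richer evaluation tooling than what is shown here:

```python
# Sketch: score a PII detector against one hand-labeled example (illustrative only).
from presidio_analyzer import AnalyzerEngine

text = "Contact Alice Smith at alice@example.com"
# Hypothetical gold annotations: (start, end, entity_type)
gold = [(8, 19, "PERSON"), (23, 40, "EMAIL_ADDRESS")]

analyzer = AnalyzerEngine()
predicted = [(r.start, r.end, r.entity_type)
             for r in analyzer.analyze(text=text, language="en")]

def to_char_labels(spans, length):
    """Map entity spans to one label per character offset ('O' = no entity)."""
    labels = ["O"] * length
    for start, end, entity in spans:
        labels[start:end] = [entity] * (end - start)
    return labels

gold_labels = to_char_labels(gold, len(text))
pred_labels = to_char_labels(predicted, len(text))

tp = sum(g == p != "O" for g, p in zip(gold_labels, pred_labels))
fp = sum(p != "O" and g != p for g, p in zip(gold_labels, pred_labels))
fn = sum(g != "O" and g != p for g, p in zip(gold_labels, pred_labels))

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")
```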
A fast algorithm to optimally compose privacy guarantees of differentially private (DP) mechanisms to arbitrary accuracy.
Updated 2024-02-15 16:24:04 +03:00
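To see why a tighter accountant matters, the textbook basic and advanced composition bounds for k repetitions of an (ε, δ)-DP mechanism can be compared numerically. The sketch below shows only those standard bounds, not this repository's algorithm, which computes tighter guarantees than either:

```python
# Sketch: textbook composition bounds for k repeated (eps, delta)-DP steps.
# Basic:    (k*eps, k*delta)
# Advanced: (eps*sqrt(2*k*ln(1/delta')) + k*eps*(e^eps - 1), k*delta + delta')
import math

def basic_composition(eps, delta, k):
    return k * eps, k * delta

def advanced_composition(eps, delta, k, delta_prime=1e-6):
    eps_total = (eps * math.sqrt(2 * k * math.log(1 / delta_prime))
                 + k * eps * (math.exp(eps) - 1))
    return eps_total, k * delta + delta_prime

eps, delta, k = 0.1, 1e-7, 1000
print("basic   :", basic_composition(eps, delta, k))     # eps grows linearly in k
print("advanced:", advanced_composition(eps, delta, k))  # roughly sqrt(k) scaling
```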
Federated Learning Utilities and Tools for Experimentation (a federated-averaging sketch follows this entry).
machine-learning
pytorch
simulation
gloo
nccl
personalization
privacy-tools
transformers-models
distributed-learning
federated-learning
Updated 2024-01-11 22:20:09 +03:00
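Federated learning, the setting this toolkit targets, centers on a server aggregating locally trained client models. A minimal federated-averaging (FedAvg) sketch in plain PyTorch, independent of this repository's own APIs, shows the core aggregation step; the model and client weights are hypothetical:

```python
# Sketch: federated averaging of client model weights (plain PyTorch,
# not this repository's API). Clients train locally; the server averages
# their state_dicts, weighted by local dataset size.
import torch

def fedavg(client_states, client_weights):
    """Weighted average of client state_dicts into a new global state_dict."""
    total = float(sum(client_weights))
    global_state = {}
    for key in client_states[0]:
        global_state[key] = sum(
            (w / total) * state[key].float()
            for state, w in zip(client_states, client_weights)
        )
    return global_state

# Hypothetical usage: three clients sharing one architecture, weighted by data size.
server_model = torch.nn.Linear(4, 2)
client_states = [torch.nn.Linear(4, 2).state_dict() for _ in range(3)]
server_model.load_state_dict(fedavg(client_states, client_weights=[100, 50, 25]))
```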
Toolkit for building machine learning models that generalize to unseen domains and are robust to privacy and other attacks.
machine-learning
artificial-intelligence
causality
domain-generalization
privacy-preserving-machine-learning
Updated 2023-10-03 07:31:52 +03:00
A Python module implementing the ElectionGuard specification. This implementation can be used to conduct end-to-end verifiable elections as well as privacy-enhanced risk-limiting audits.
Updated 2023-08-02 03:24:27 +03:00
Truly conversational search is the next logical step in the journey toward intelligent and useful AI. Researchers have long wanted to study how people actually converse with search engines, but producing a comprehensive dataset of such interactions has been difficult: the parties that hold this data (search engines) have a responsibility to protect their users' privacy and cannot share the data publicly without undermining the trust users place in them. Balancing these two forces, we believe we have a dataset and paradigm that meet both sets of needs: an artificial public dataset that approximates the true data, together with the ability to evaluate model performance on real user behavior. Concretely, we release a public dataset generated by creating artificial sessions using embedding similarity, and we test on the original data. To restate: we are not releasing any private user data; we are releasing what we believe to be a good representation of true user interactions.
Updated 2023-06-12 21:21:58 +03:00