Presidio - Data Protection and Anonymization API
Context aware, pluggable and customizable PII anonymization service for text and images.
What is Presidio
Presidio (from the Latin praesidium, ‘protection, garrison’) helps ensure that sensitive text is properly managed and governed. It provides fast analytics and anonymization for sensitive text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers and financial data. Presidio analyzes the text using predefined or custom recognizers to identify entities, patterns, formats, and checksums with relevant context. Presidio leverages Docker and Kubernetes for workloads at scale.
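To make the idea of combining a pattern, a checksum and nearby context more concrete, here is a minimal, self-contained sketch in plain Python. It is not Presidio's implementation or API: the names `CONTEXT_WORDS`, `luhn_valid` and `find_credit_cards`, the 40-character context window and the score values are all assumptions chosen for illustration.

```python
import re
from typing import Dict, List

# Hypothetical context words; seeing one shortly before a match raises the score.
CONTEXT_WORDS = {"credit", "card", "visa", "mastercard", "amex"}

# Candidate pattern: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_credit_cards(text: str) -> List[Dict[str, object]]:
    """Find candidate card numbers, drop checksum failures, boost on context."""
    results = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if not luhn_valid(digits):
            continue  # the format matched, but the checksum did not
        # Look at the 40 characters before the match for context words.
        window = text[max(0, match.start() - 40):match.start()].lower()
        has_context = bool(set(re.findall(r"[a-z]+", window)) & CONTEXT_WORDS)
        results.append({
            "entity": "CREDIT_CARD",
            "start": match.start(),
            "end": match.end(),
            "score": 0.9 if has_context else 0.6,  # illustrative values only
        })
    return results
```

The checksum step filters out digit runs that merely look like card numbers, and the context step separates a strong match ("credit card is ...") from a weaker one that has no supporting words nearby.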
Presidio can be integrated into any data pipeline for intelligent PII scrubbing. It is open-source, transparent and scalable. Additionally, PII anonymization use cases often require a different set of PII entities to be detected, some of which are domain or business specific. Presidio allows you to customize or add new PII recognizers via API or code to best fit your anonymization needs.
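As a rough illustration of what adding a domain-specific recognizer can look like, the sketch below registers a regex-based recognizer for a made-up employee ID format next to a generic phone number recognizer. Everything in it is hypothetical: `RegexRecognizer`, `SimpleRegistry`, the `EMPLOYEE_ID` entity and the `EMP-` format are stand-ins for illustration, not Presidio classes or entities. See the Presidio documentation for the actual recognizer interfaces.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (entity, start, end, score)
Finding = Tuple[str, int, int, float]


@dataclass
class RegexRecognizer:
    """Hypothetical recognizer: one entity name, one regex, one fixed score."""
    entity: str
    pattern: str
    score: float

    def analyze(self, text: str) -> List[Finding]:
        return [(self.entity, m.start(), m.end(), self.score)
                for m in re.finditer(self.pattern, text)]


class SimpleRegistry:
    """Hypothetical container that runs every registered recognizer."""

    def __init__(self) -> None:
        self._recognizers: List[RegexRecognizer] = []

    def add(self, recognizer: RegexRecognizer) -> None:
        self._recognizers.append(recognizer)

    def analyze(self, text: str,
                entities: Optional[List[str]] = None) -> List[Finding]:
        findings: List[Finding] = []
        for recognizer in self._recognizers:
            # When an entity filter is given, skip recognizers outside it.
            if entities is not None and recognizer.entity not in entities:
                continue
            findings.extend(recognizer.analyze(text))
        return sorted(findings, key=lambda f: f[1])  # order by start offset


registry = SimpleRegistry()
registry.add(RegexRecognizer("PHONE_NUMBER", r"\b\d{3}-\d{3}-\d{4}\b", 0.7))
# Business-specific entity: an assumed internal ID format such as EMP-004217.
registry.add(RegexRecognizer("EMPLOYEE_ID", r"\bEMP-\d{6}\b", 0.8))
```

Calling `registry.analyze(text)` runs all registered recognizers, while `registry.analyze(text, entities=["EMPLOYEE_ID"])` restricts detection to the custom entity, which mirrors how different use cases need different sets of PII entities.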
⚠️ Presidio can help identify sensitive/PII data in structured and unstructured text. However, because Presidio uses trained ML models, there is no guarantee that it will find all sensitive information. Consequently, additional systems and protections should be employed.
Demo
Try Presidio with your own data
Overview
Presidio API
API Spec - available APIs, request and response formats.
Presidio REST API Open API Spec
API Samples
- Simple Text Analysis
- Create Reusable Templates
- Detect Specific Entities
- Custom Anonymization
- Add Custom PII Entity Recognizer
- Image Anonymization
Learn more
More information can be found in the Presidio documentation:
- Supported field types
- Database and storage scanner
- Adding new PII recognizers
- Generating Swagger file
- Evaluating Presidio
- Proto packages for Presidio API
Deploying Presidio on a Kubernetes Cluster
Follow the Deployment Guidelines for details:
- Single click deployment on a Kubernetes Cluster
- Step by Step Deployment with customizable parameters on a Kubernetes Cluster
Developing Presidio
- Setting Up a Development Environment
- Adding Custom Fields
- Recognizers Development - Best Practices and Considerations
- Using the Analyzer Service
- Calling the different services
- Connector Developer Guide
Deploy Presidio for Test and Dev
- Deploy locally using Docker
- Deploy locally using KIND
- Presidio-Analyzer as a standalone Python package
Current input/output components status
Module | Feature | Status |
---|---|---|
API | HTTP input | ✅ |
Scanner | MySQL | ❌ |
Scanner | MSSQL | ❌ |
Scanner | PostgreSQL | ❌ |
Scanner | Oracle | ❌ |
Scanner | Azure Blob Storage | ✅ |
Scanner | S3 | ✅ |
Scanner | Google Cloud Storage | ❌ |
Streams | Kafka | ✅ |
Streams | Azure Event Hub | ✅ |
Datasink (output) | MySQL | ✅ |
Datasink (output) | MSSQL | ✅ |
Datasink (output) | Oracle | ❌ |
Datasink (output) | PostgreSQL | ✅ |
Datasink (output) | Kafka | ✅ |
Datasink (output) | Azure Event Hub | ✅ |
Datasink (output) | Azure Blob Storage | ✅ |
Datasink (output) | S3 | ✅ |
Datasink (output) | Google Cloud Storage | ❌ |
- ✅ - Working
- 🔶 - Partially supported (alpha)
- ❌ - Not supported yet
How to contact us?
If you have a usage question, have found a bug or have a suggestion for improvement, please file a GitHub issue. For other matters, please email presidio@microsoft.com.
❗ Note: As we are in the process of defining the roadmap for Presidio, we will only accept PRs with bug fixes and no new features in the upcoming months.
Contributing
For details on contributing to this repository, see the contributing guide.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.