* Adding in methods for compression and modifying existing unit test for adding redact boxes
* Adding unit tests
* Linting fixes
* Fixing exception type
* Adding in python-gdcm dependency
* Adding python-gdcm to Pipfile.lock
* Updating Pipfile.lock
* Switching from gdcm license to python-gdcm license
* fix amend
* comment out new tests
* revert
* Incorporating bug fix that was merged in
* Linting fix
* Temporarily commenting out one integration test to see effect on build pipeline hangup
* Adding _strip_score back in
* Commenting out entire integration test for DICOM Image PII verify engine
* Trying alternate assertion for test_eval_dicom_correctly
* Removing all assertions in test_eval_dicom_correctly
* Explicitly including argument names and adding some temporary print statements
* Using deepcopy of passed-in mock DICOM verify results
* Using deepcopy for the passed-in example instance as well
* Renaming any variables that may overlap with existing variables in the method being tested
* Testing effect of mocking verify_dicom_instance in eval_dicom_instance
* Mocking all calls except get_bboxes_from_ocr_results in test_eval_dicom_correctly
* Removing call to method and assertions
* Commenting out test_verify_correctly
* Adding test_verify_correctly back and reducing ambiguity as much as possible
* Removing ambiguity even more
* Changing _strip_score back to not returning but keeping input arg name change
* Commenting out assertions for test_verify_correctly
* Comment out test_verify_correctly assertions and full test_eval_dicom_correctly test
* Adding assertions back into test_verify_correctly
* Commenting out last assertion in test_verify_correctly
* Reformatting to avoid use of all in assert
* Using frozen set comparison for final assert in test_verify_correctly
* Removing final assertion from test_verify_correctly while keeping test_eval_dicom_correctly in
* Commenting out act and assert in test_verify_correctly while keeping test_eval_dicom_correctly in
* Using module-level mock variables and condensing act call to fewer lines in test_verify_correctly
* Removing all assertions but keeping other sections for both tests
* Adding back in first assertions for each test
* Removing all assertions except first image type assertion in verify test
* Removing all assertions except image type assertion in test_eval_dicom_correctly
* Remove image type assertions but keep all other assertions in
* Remove image type assertions and all assertions for eval test
* Only keep final assertion in verify test
* Only keep the second assertion in verify and nothing else
* Only keep final assertion in eval test
* Enable all assertions and move helper methods above test methods
* Simplifying assertions
* Removing unused code now that we have simplified assertions
* Reverting back but simplifying asserts another way
* Removing eval integration test since unit test for that covers functionality
* Removing analyzer results assertion
* Removing unused import
---------
Co-authored-by: Sharon Hart <sharonh.dev@gmail.com>
* Adding DicomImageRedactorEngine class without test data for now
* Updating and putting placeholder in package doc
* Updating docs
* Moving presidio-dicom-image-redactor module into presidio-image-redactor module as an extension
* Updating docs to reflect refactor from presidio-dicom-image-redactor module to extension of presidio-image-redactor module
* Docs fixes continued
* Reflecting move from new module to updating existing module
* Updating requirements
* Linting fixes
* Linting fixes. Function and variable name casing
* Fixing getting started code blocks
* Updating image redactor version
* Fix minor typo: phi -> PHI in comment
* Update presidio-image-redactor/presidio_image_redactor/utils/dicom_image_redact_utils.py
Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>
* Update presidio-image-redactor/presidio_image_redactor/utils/dicom_image_redact_utils.py
Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>
* Fixing new typing added from PR comment commit
* Moving all utils into the DicomImageRedactorEngine class
* Fixing integration test now that utils have been moved
* Moving to ad-hoc recognizer approach
* Updating redact to have similar format as in ImageRedactorEngine. Moved existing logic for redact into redact_from_file
* Updating requirements
* Updating docs with new getting started code
* Versioning and changelog updates
* Adding in data with citations
* Linting fixes
* Minor change: from multiple list append to list extend
* Adding pytest-mock to requirements
* Fixing paths to test data
* Making file extension search more flexible and work with both Windows and Unix systems
* Changing docstring style to match rest of repo
* Making statement about Tesseract version more general in one other doc (was caught in another earlier)
* Changing box_color_setting to fill for consistency
* Updating changelog based on PR comments
* Update docs/image-redactor/index.md
Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>
* Updating docs
* Adding **kwargs support and addressing minor comments
* Adjusting with clause duration
* Adding note about custom ImageAnalyzerEngine support
* Changing minor comment in docstring
* Splitting redact_from_file into redact_from_file and redact_from_directory
Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>
* Multiregional Phone Number Recognizer
* lock, add notice, add to setup
* black with python3
* refactor + docstring
* Add exp.
* Add default regions
* Move international to default
* Make recognizer nondefault
* Rebase main
* Move to class fields
* lint
* non fields
* space
* Add SUPPORTED_REGIONS check
* check international
* add international number support flag
Co-authored-by: sharon <sharon.hart@microsoft.com>
Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>
* Removed all previous dependencies from NOTICE
* 1. Removed pyre2
2. Removed the GRPC app.py and pb2 files
3. Added type hints
4. Removed recognizer store API calls
5. Added batch calls to the analyzer engine (simple implementation)
6. Used black to format all files
7. SpacyNlpEngine now returns spacy tokens and not just the text tokens
8. Fixed bug in US SSN recognizer
9. Remove duplicates is now per recognizer (if the recognizer identifies the same text as an entity through multiple different logics, e.g. several regex patterns, then only the match with the highest confidence is kept)
10. Fixed tests to support 1-9
11. Demo test now compares actual anonymized text with expected anonymized text (more representative of the real demo website)
* updated setup.py
* fixed logging across, new README
* minor fixes
* Set up CI with Azure Pipelines
* removed analyzer_batch and added analyzer.get_supported_entities
* Update azure-pipelines.yml for Azure Pipelines
* removed setup.cfg for now
* re-adding setup.cfg with flake8 max-line-length
* removed duplicate word
* Added internal link to customization
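
The per-recognizer duplicate removal described in item 9 of the numbered notes above could look roughly like the sketch below. It is illustrative only: the `Result` class and `remove_duplicates` function are hypothetical names, not the project's actual API, and it assumes "the same text" means an identical entity type and character span.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for one recognizer match (not the project's class)."""

    entity_type: str
    start: int
    end: int
    score: float


def remove_duplicates(results: List[Result]) -> List[Result]:
    """Keep only the highest-scoring match per (entity_type, start, end)."""
    best: Dict[Tuple[str, int, int], Result] = {}
    for result in results:
        key = (result.entity_type, result.start, result.end)
        # Replace the stored match only if this one is more confident.
        if key not in best or result.score > best[key].score:
            best[key] = result
    # Return matches in text order for stable downstream processing.
    return sorted(best.values(), key=lambda r: (r.start, r.end))


# Two patterns of one recognizer hit the same span; a third match is unrelated.
matches = [
    Result("US_SSN", 10, 21, 0.3),
    Result("US_SSN", 10, 21, 0.85),
    Result("PHONE_NUMBER", 30, 42, 0.6),
]
deduplicated = remove_duplicates(matches)
```

Keying on the span plus entity type means two different entity types over the same text are both kept; whether that matches the real behavior is an assumption of this sketch.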