Commit graph

17 commits

Author SHA1 Message Date
Omri Mendels 56fb84d353
Add missing dependencies for image-redactor (#1257) 2024-01-22 09:24:25 +02:00
Omri Mendels 3a7b8f67f5
Integrating spacy-huggingface-pipelines and refactoring NlpEngine logic (#1159) 2023-10-19 13:58:04 +03:00
Gord Lueck 818c80f978
Msft document intelligence ocr (#1184) 2023-10-18 21:37:58 +03:00
ayabel 6cf5f18cb4
added image processing class to preprocess the image before running OCR (#1166) 2023-10-04 14:42:41 +03:00
Nile Wilson 67833d5be3
DICOM redactor improvement: Enabling compatibility with compressed images (#1105)
* Adding in methods for compression and modifying existing unit test for adding redact boxes

* Adding unit tests

* Linting fixes

* Fixing exception type

* Adding in python-gdcm dependency

* Adding python-gdcm to piplock

* Updating pipfile.lock

* Switching from gdcm license to python-gdcm license

* Adding in methods for compression and modifying existing unit test for adding redact boxes

* Adding unit tests

* Linting fixes

* Fixing exception type

* Adding in python-gdcm dependency

* Switching from gdcm license to python-gdcm license

* fix ammend

* comment out new tests

* revert

* Incorporating bug fix that was merged in

* Linting fix

* Temporarily commenting out one integration test to see effect on build pipeline hangup

* Adding _strip_score back in

* Commenting out entire integration test for DICOM Image PII verify engine

* Trying alternate assertion for test_eval_dicom_correctly

* Removing all assertions in test_eval_dicom_correctly

* Explicitly including argument names and adding some temporary print statements

* Using deepcopy of passed-in mock DICOM verify results

* Using deepcopy for the passed-in example instance as well

* Renaming any variables that may have overlap with existing variables in the method being test

* Testing effect of mocking verify_dicom_instance in eval_dicom_instance

* Mocking all calls except get_bboxes_from_ocr_results in test_eval_dicom_correctly

* Removing call to method and assertions

* Commenting out test_verify_correctly

* Adding test_verify_correctly back and reducing ambiguity as much as possible

* Removing ambiguity even more

* Changing _strip_score back to not returning but keeping input arg name change

* Commenting out assertions for test_verify_correctly

* Comment out test_verify_correctly assertions and full test_eval_dicom_correctly test

* Adding assertions back into test_verify_correctly

* Commenting out last assertion in test_verify_correctly

* Reformatting to avoid use of all in assert

* Using frozen set comparison for final assert in test_verify_correctly

* Removing final assertion from test_verify_correctly while keeping test_eval_dicom_correctly in

* Commenting out act and assert in test_verify_correctly while keeping test_eval_dicom_correctly in

* Using module-level mock variables and moving act call to less lines in test_verify_correctly

* Removing all assertions but keeping other sections for both tests

* Adding back in first assertions for each test

* Removing all assertions except fist image type assertion in verify test

* Removing all assertions except image type assertion in test_eval_dicom_correctly

* Remove image type assertions but keep all other assertions in

* Remove image type assertions and all assertions for eval test

* Only keep final assertion in verify test

* Only keep the second assertion in verify and nothing else

* Only keep final assertion in eval test

* Enable all assertions and move helper methods above test methods

* Simplifying assertions

* Removing unused code now that we have simplified assertions

* Reverting back but simplifying asserts another way

* Removing eval integration test since unit test for that covers functionality

* Remvoing analyzer results assertion

* Removing unused import

---------

Co-authored-by: Sharon Hart <sharonh.dev@gmail.com>
2023-07-13 16:00:45 -04:00
Nile Wilson 2730c6c00b
Adding DICOM image redacting capability to presidio-image-redactor module (#960)
* Adding DicomImageRedactorEngine class without test data for now

* Updating and putting placeholder in package doc

* Updating docs

* Moving presidio-dicom-image-redactor module into presidio-image-redactor module as an extension

* Updating docs to reflect refactor from presidio-dicom-image-redactor module to extension of presidio-image-redactor module

* Docs fixes continued

* Reflecting move from new module to updating existing module

* Updating requirements

* Linting fixes

* Linting fixes. Function and variable name casing

* Fixing getting started code blocks

* Updating image redactor version

* Fix minor typo: phi -> PHI in comment

* Update presidio-image-redactor/presidio_image_redactor/utils/dicom_image_redact_utils.py

Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>

* Update presidio-image-redactor/presidio_image_redactor/utils/dicom_image_redact_utils.py

Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>

* Fixing new typing adding added from PR comment commit

* Moving all utils into the DicomImageRedactorEngine class

* Fixing integration test now that utils have been moved

* Moving to ad-hoc recognizer approach

* Updating redact to have similar format as in ImageRedactorEngine. Moved existing logic for redact into redact_from_file

* Updating requirements

* Updating docs with new getting started code

* Versioning and changelog updates

* Adding in data with citations

* Linting fixes

* Minor change: from multiple list append to list extend

* Adding pytest-mock to requirements

* Fixing paths to test data

* Making file extension search more flexible and work with both Windows and Unix systems

* Changing docstring style to match rest of repo

* Making statement about Tesseract version more general in one other doc (was caught in another earlier)

* Changing box_color_setting to fill for consistency

* Updating changelog based on PR comments

* Update docs/image-redactor/index.md

Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>

* Updating docs

* Adding **kwargs support and addressing minor comments

* Adjusting with clause duration

* Adding note about custom ImageAnalyzerEngine support

* Changing minor comment in docstring

* Splitting redact_from_file into redact_from_file and redact_from_directory

Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>
2022-12-13 08:47:30 -05:00
Sharon Hart d9b4de78b0
Multiregional Phone Number Recognizer (#722)
* Multiregional Phone Number Recognizer

* lock, add notice, add to setup

* black with python3

* refactor + docstring

* Add exp.

* Add default regions

* Move international to default

* Make recognizer nondefault

* Rebase main

* Move to class fields

* lint

* non fields

* space

* Add SUPPORTED_REGIONS check

* check international

* add international number support flag

Co-authored-by: sharon <sharon.hart@microsoft.com>
Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>
2021-06-13 12:10:13 +03:00
Sharon Hart 05b568681d
Bump matplotlib in image-redactor (#689)
* Test matplotlib

* bump in setup

* Add to notice

Co-authored-by: sharon <sharon.hart@microsoft.com>
2021-05-05 09:02:48 +03:00
Omri Mendels 22016f4ebd
Added image-redactor 3rd party OSS to NOTICE (#623) 2021-03-21 14:15:13 +02:00
Sharon Hart 5ba46c2bea
Anonymizer - Add Encryption Logics (#595)
* Add AES Encryption Logics

* Update NOTICE

* Key as method argument

Co-authored-by: sharon <sharon.hart@microsoft.com>
2021-03-08 11:00:10 +02:00
Sharon Hart 5bc5b2977b
Add Flasgger + helathcheck for all services (#521)
* Add flasgger to all services

* Add to NOTICE

* Lock verison

* Change health payload + welcome message

Co-authored-by: sharon <sharon.hart@microsoft.com>
2021-02-16 16:12:02 +02:00
omri374 2d241407a2 added PyYaml to Pipfile and Pipfile.lock 2021-01-28 12:30:30 +02:00
Itye cb12bf221f notice with flask 2021-01-20 11:53:50 +02:00
Omri Mendels f46778b134
Omri374/analyzer v2 (#392)
* Removed all previous dependencies from NOTICE

* 1. Removed pyre2

2. Removed the GRPC app.py and pb2 files
3. Added type hints
4. Removed recognizer store API calls
5. Added batch calls to the analyzer engine (simple implementation)
6. Used black to format all files
7. SpacyNlpEngine now returns spacy tokens and not just the text tokens
8. Fixed bug in us SSN recognizer
9. Remove duplicates is now per recognizer (if the recognizer identifies the same text as entity by multiple different logics, e.g. regex patterns then only the one with the highest confidence is kept.
9. Fixed tests to support 1-9
10. Demo test now compares actual anonymized text with expected anonymized text (more representative of the real demo website)

* updated setup.py

* fixed logging across, new README

* minor fixes

* Set up CI with Azure Pipelines

* removed analyzer_batch and added analyzer.get_supported_entities

* Update azure-pipelines.yml for Azure Pipelines

* Removed all previous dependencies from NOTICE

* 1. Removed pyre2

2. Removed the GRPC app.py and pb2 files
3. Added type hints
4. Removed recognizer store API calls
5. Added batch calls to the analyzer engine (simple implementation)
6. Used black to format all files
7. SpacyNlpEngine now returns spacy tokens and not just the text tokens
8. Fixed bug in us SSN recognizer
9. Remove duplicates is now per recognizer (if the recognizer identifies the same text as entity by multiple different logics, e.g. regex patterns then only the one with the highest confidence is kept.
9. Fixed tests to support 1-9
10. Demo test now compares actual anonymized text with expected anonymized text (more representative of the real demo website)

* updated setup.py

* fixed logging across, new README

* minor fixes

* removed analyzer_batch and added analyzer.get_supported_entities

* removed setup.cfg for now

* readding setup.cfg with flake8 max-line-width

* removed duplicate word

* Added internal link to customization
2021-01-13 12:38:30 +02:00
Tomer Rosenthal 8b642e20a2
Version 0.1.1 (#82)
* Docs (#80)

* Fixed Anonymizer duplicates bug (#81)
2018-12-19 21:37:43 +02:00
Tomer Rosenthal b7aaab002d
Remove kinesis support (#74)
* Initial presidio version

* Fixed CPU bug

* Removed kinesis
2018-11-10 08:46:49 +02:00
torosent ea4c45bd1e Initial presidio version 2018-10-10 23:51:43 +03:00