* IN_PAN pattern recognizer
Added India PAN (Permanent Account Number) recognizer
* refined IN_PAN regex
refined the regex for better recognition and enhanced the test cases accordingly
* Update recognizer_registry.py
Fixed lint error that was missed earlier.
* Fixed Lint errors
Added test cases, verification and context data
* Added more test cases in test_in_pan_recognizer.py
Added negative test cases per review comments.
* added IN_AADHAAR recognizer
* Update in_aadhaar_recognizer.py
linted code
* Update in_aadhaar_recognizer.py
update pattern recognizer value per suggestion in review
* added utility function class
added PresidioAnalyzerUtils class with generic functions; removed usage of stdnum
* Create test_analyzer_utils.py
added test cases for analyzer_utils.py in prescribed format
* Update test_recognizer_registry.py
added to the count of predefined recognizers
* added predefined recognizer : IN_VEHICLE_REGISTRATION
Added India specific predefined pattern recognizer for vehicle registration number
* review comments incorporated
reinstated python 3.9 compatibility, reorganized code
* review comments incorporated
Logic reverted from analyzer_utils to the recognizer class file
* added null/min vehicle number size
added min size check to avoid failures per review comment
* incorporated review comments
---------
Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>
* presidio-structured
changelog
Static analysis
docstrings, types
preliminary tests engine
static analysis
isort
Minor refactorings
Update README.md
Fix late binding issues and example
removal of old samples
Refactoring, adding example
pre-clean-break-commit
broken commit, fixing TabularConfigBuilder
Rename TabularConfig
pre-breaking replace commit
removal of some old experimental files
rename tabular to structured
restructuring presidio tabular - pre del commit
Add project TODOs
testing dump presidio tabular
* Add unit tests
* rename engine, add buildfile
* Update setup.py
* lint-build-test
* Update lint-build-test.yml
* Add packages to setup.py
* Update presidio-structured to alpha version
* Update Presidio structured README.md
* Add logging configuration to presidio-structured module
* Refactor AnalysisBuilder constructor to accept an optional AnalyzerEngine parameter
* Fix entity mapping in JsonAnalysisBuilder
* Drop type in docstring in analysis builder classes
* Refactor TabularAnalysisBuilder to use BatchAnalyzerEngine for all columns
* Update data_reader.py with type hints for file paths
* Update data_reader.py to include additional keyword arguments in read() method
* Update Transformer to Processor term in StructuredEngine
* Add PandasDataProcessor as default to StructuredEngine init
* Move structured sample files to the docs
* Add Presidio Structured Notebook to samples index
* Remove unnecessary imports in structured sample
* Update to processors in structured __init__ files
* Add explanation for structured table sample
* Delete unnecessary __init__s in structured test
* Fix bug in JsonAnalysisBuilder entity mapping
* pr comments, nits, minor tests
* README
* Add TabularAnalysisBuilder
* Some basic logging
* linting
* Fix typo in logger variable name
* Refactor analysis builder to include score threshold
* Linting, continued
* Update Pipfile
* Refactor JsonAnalysisBuilder to support language parameter
* Fix non-camel-case naming in TabularAnalysisBuilder
* Add score_threshold parameter to AnalysisBuilder
* Refactor JSON analysis builder to gain consistency
* Remove low score results in JsonAnalysisBuilder
* Add tests to json analysis with score threshold
* Fix bug in JSON analysis to update map with nested_mappings
* Fix bug in JSON analysis to take only entity types
* Fix typos in test anl json names and assert values
* Update build-structured.yml
* Create __init__.py
* Type hint fix for Python <3.10, logger typo
* Update setup.py
* PR comments variety
* further pr comments
* readme, refactor score, refactor tabular analysis
* Update test_analysis_builder.py
* lint
---------
Co-authored-by: Omri Mendels <omri374@users.noreply.github.com>
Co-authored-by: Sharon Hart <sharonh.dev@gmail.com>
Co-authored-by: enrique.botia <enrique.botia@netzima.com>