4308339b4d
* [structure] structuring main repo * [milestone1][msread][cli tool] temp commit * fix autofac * fix storage service * remove extra files * parser not working * fix Parsing * Working version * updated configs file schema * [cli tool][ms-read] restructuring * [cli-tool] Added support for exception handling * [cli tool] restructuring as cli app * [cli tool][commands] integrated CommandLineUtil cli framework * [cli-tool] fix MsReadUnauthorizedException * [cli tool] restructured config file and added some config commands * [cli tool] Add remaining config commands * [cli tool] Add Parsing logs * [cli tool] refactored storage factory * [cli tool] abstracted logging * [cli tool] Make parser options required * [cli tool] created universal exception handler * [cli tool] remove extra files * [cli tool] fixed configs file dir, renamed project * [cli tool] namespace refactoring, renamed tool * [cli tool] cli commands -> added descriptions * [cli tool] restructured cli tool directory * [cli tool] added readme * [cli tool] initialize configs.json if doesn't exist * [cli tool] making cli config keys consistent with configs file * [cli tool] [chunker] Extended architecture to add chunker * fix formatting * [cli tool][chunker] fix bug in ParserServiceController and add PageChunker basic implementation * [cli tool] Add support for MsReadChunker (by page and character limit) * [cli tool][prediction] added support for prediction configs and command * [cli tool][prediction] updated controllers structure * [cli tool][prediction] added support for prediction command (pending prediction service) * [cli tool][prediction] added support for CustomText prediction service * [cli tool] [refactor] restructure models * [cli tool] [refactor] prediction http models * [cli tool] [refactor] namespace refactoring * [cli tool][prediction] added format for prediction result * [cli tool][tests] added tests project to solution * [cli tool][tests] added MSRead connection test * [cli tool][tests] added 
MSRead connection test * [cli-tool][tests] Add Storage Service tests * [cli-tool][chunker] Added support for character limit in chunking by page * [cli-tool][tests] Add Chunker Service tests * [cli-tool] Add exception codes * [cli-tool][tests] Add Utilities class * [cli-tool][tests] Update storage tests with new exception checking * [cli-tool][tests] Add Storage Factory tests * [cli-tool][tests] fix type check in Utilities.AssertThrows * [cli-tool][tests] added MSRead parser tests * [cli-tool][tests] Remove exception codes and utilities class * [cli-tool][tests] minor fixes to storage tests * [cli-tool][chunker] Add chunk command * [cli-tool] Updated cli tool commands in readme * [cli-tool][tests] Added ConfigServiceControllerTest * [cli-tool][tests] added msread parser tests * [cli-tool][prediction] Refactor prediction service and add exception handling * [cli tool][PR reviews] complied with reviews 1 * [cli tool][PR reviews] Use Tasks instead of Parallel.ForEach in ChunkerServiceController * [cli-tool][PR reviews] Create ChunkInfoHelper class * [cli-tool][PR reviews] Update UnauthorizedFileAccessException to use string interpolation instead of concatenation Co-authored-by: Mohamed Abbas <moaba@microsoft.com> * [cli-tool][PR reviews] throw NotSupportedException instead of returning null * [cli-tool][PR reviews] Change Custom Text Response Status to snake case * [cli-tool][PR reviews] Move chunk summary creation logic to ChunkInfoHelper * [cli-tool][PR reviews] MsReadChunker handled special cases, added comments and updated tests * [cli-tool][PR reviews] Add lock to LogParsingResult function * [cli-tool][PR reviews] Create a new blob container for each test * [cli-tool][PR reviews] remove json enum converter * [cli-tool][PR reviews] add failure reason for each file when logging operation result * [cli-tool][PR reviews] move public classes to separate files * [cli-tool][PR reviews] remove values from configs.json * [cli-tool][PR review] Formatted files to comply 
with conventions * [cli-tool][PR review] Fix code styling and conventions * [cli-tool][devops] add azure pipelines yaml file to run tests with PR * [cli-tool][PR reviews] write enum values in separate lines * [cli-tool][PR reviews] reword error message and remove commented code * [cli-tool][PR reviews] use concurrent data structures in service controllers * [cli-tool][PR reviews] add prediction failure reason to exception * [cli-tool][PR reviews] change storage service methods to async * [PR reviews] fix gitignore * [PR reviews] fix Env variables names * [PR reviews] change methods to be async * [PR reviews] refactor populating headers in httpHandler * [PR reviews] add message to NotSupportedException * [PR reviews] add timeout for CustomTextPrediction * [PR reviews] remove properties/ from gitignore * [cli-tool][pipelines] fix azure-pipelines.yml to run tests correctly * [PR reviews] code formatting * [PR reviews] add await to async method in LocalStorageServiceTest * [PR reviews] use CalculateMaxLineLength function in MsReadChunkerService * [PR reviews] Add a separate method for boundingBox logic in MsReadChunkerService * [cli-tool][tests] added tests * [milestone1][msread][cli tool] temp commit * fix autofac * fix storage service * remove extra files * parser not working * fix Parsing * Working version * updated configs file schema * [cli tool][ms-read] restructuring * [cli-tool] Added support for exception handling * [cli tool] restructuring as cli app * [cli tool][commands] integrated CommandLineUtil cli framework * [cli-tool] fix MsReadUnauthorizedException * [cli tool] restructured config file and added some config commands * [cli tool] Add remaining config commands * [cli tool] Add Parsing logs * [cli tool] refactored storage factory * [cli tool] abstracted logging * [cli tool] Make parser options required * [cli tool] created universal exception handler * [cli tool] remove extra files * [cli tool] fixed configs file dir, renamed project * [cli tool] 
namespace refactoring, renamed tool * [cli tool] cli commands -> added descriptions * [cli tool] restructured cli tool directory * [cli tool] added readme * [cli tool] initialize configs.json if doesn't exist * [cli tool] making cli config keys consistent with configs file * [cli tool] [chunker] Extended architecture to add chunker * fix formatting * [cli tool][chunker] fix bug in ParserServiceController and add PageChunker basic implementation * [cli tool] Add support for MsReadChunker (by page and character limit) * [cli tool][prediction] added support for prediction configs and command * [cli tool][prediction] updated controllers structure * [cli tool][prediction] added support for prediction command (pending prediction service) * [cli tool][prediction] added support for CustomText prediction service * [cli tool] [refactor] restructure models * [cli tool] [refactor] prediction http models * [cli tool] [refactor] namespace refactoring * [cli tool][prediction] added format for prediction result * [cli tool][tests] added tests project to solution * [cli tool][tests] added MSRead connection test * [cli tool][tests] added MSRead connection test * [cli-tool][tests] Add Storage Service tests * [cli-tool][chunker] Added support for character limit in chunking by page * [cli-tool][tests] Add Chunker Service tests * [cli-tool] Add exception codes * [cli-tool][tests] Add Utilities class * [cli-tool][tests] Update storage tests with new exception checking * [cli-tool][tests] Add Storage Factory tests * [cli-tool][tests] fix type check in Utilities.AssertThrows * [cli-tool][tests] added MSRead parser tests * [cli-tool][tests] Remove exception codes and utilities class * [cli-tool][tests] minor fixes to storage tests * [cli-tool][chunker] Add chunk command * [cli-tool] Updated cli tool commands in readme * [cli-tool][tests] Added ConfigServiceControllerTest * [cli-tool][tests] added msread parser tests * [cli-tool][prediction] Refactor prediction service and add exception 
handling * [cli tool][PR reviews] complied with reviews 1 * [cli tool][PR reviews] Use Tasks instead of Parallel.ForEach in ChunkerServiceController * [cli-tool][PR reviews] Create ChunkInfoHelper class * [cli-tool][PR reviews] Update UnauthorizedFileAccessException to use string interpolation instead of concatenation Co-authored-by: Mohamed Abbas <moaba@microsoft.com> * [cli-tool][PR reviews] throw NotSupportedException instead of returning null * [cli-tool][PR reviews] Change Custom Text Response Status to snake case * [cli-tool][PR reviews] Move chunk summary creation logic to ChunkInfoHelper * [cli-tool][PR reviews] MsReadChunker handled special cases, added comments and updated tests * [cli-tool][PR reviews] Add lock to LogParsingResult function * [cli-tool][PR reviews] Create a new blob container for each test * [cli-tool][PR reviews] remove json enum converter * [cli-tool][PR reviews] add failure reason for each file when logging operation result * [cli-tool][PR reviews] move public classes to separate files * [cli-tool][PR reviews] remove values from configs.json * [cli-tool][PR review] Formatted files to comply with conventions * [cli-tool][PR review] Fix code styling and conventions * [cli-tool][devops] add azure pipelines yaml file to run tests with PR * [cli-tool][PR reviews] write enum values in separate lines * [cli-tool][PR reviews] reword error message and remove commented code * [cli-tool][PR reviews] use concurrent data structures in service controllers * [cli-tool][PR reviews] add prediction failure reason to exception * [cli-tool][PR reviews] change storage service methods to async * [cli tool][testing] added prediction service integration tests * [cli-tool][tests] finished unit tests for prediction service * [cli-tool] change framework version from 3.1 to 2.1 to run azure pipelines * [cli-tool][tests] Renamed env variables * [cli-tool] change dotnet version to 3.1 * [cli-tool] change configuration file location to use the current directory * 
[cli-tool] change dotnet version in tests project to 3.1 * [cli-tool] change netcore version from 2.1 to 3.1 * Azure pipelines (#5) Update azure-pipelines.yml for Azure Pipelines * Revert "Azure pipelines (#5)" This reverts commit 5cccf4b4662fc07776c07bf58f065a91b3846ad4. * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Revert "[cli-tool] change configuration file location to use the current directory" This reverts commit 35e33adbd6d1da33078d9a8902fd365c69b702bb. * [PR reviews] fix gitignore * [PR reviews] fix Env variables names * [PR reviews] change methods to be async * [PR reviews] refactor populating headers in httpHandler * [PR reviews] add message to NotSupportedException * [PR reviews] add timeout for CustomTextPrediction * [PR reviews] remove properties/ from gitignore * [cli-tool][pipelines] fix azure-pipelines.yml to run tests correctly Co-authored-by: Mohamed Shaban <a-moshab@microsoft.com> Co-authored-by: NourEldin Yasser <noursalem95@gmail.com> Co-authored-by: Mohamed Abbas <moaba@microsoft.com> * Cli tool load configs file (#11) * [milestone1][msread][cli tool] temp commit * fix autofac * fix storage service * remove extra files * parser not working * fix Parsing * Working version * updated configs file schema * [cli tool][ms-read] restructuring * [cli-tool] Added support for exception handling * [cli tool] restructuring as cli app * [cli tool][commands] integrated CommandLineUtil cli framework * [cli-tool] fix MsReadUnauthorizedException * [cli tool] restructured config file and added some config commands * [cli tool] Add remaining config commands * [cli tool] Add Parsing logs * [cli tool] refactored storage factory * [cli tool] abstracted logging * [cli tool] Make parser options required * [cli tool] 
created universal exception handler * [cli tool] remove extra files * [cli tool] fixed configs file dir, renamed project * [cli tool] namespace refactoring, renamed tool * [cli tool] cli commands -> added descriptions * [cli tool] restructured cli tool directory * [cli tool] added readme * [cli tool] initialize configs.json if doesn't exist * [cli tool] making cli config keys consistent with configs file * [cli tool] [chunker] Extended architecture to add chunker * fix formatting * [cli tool][chunker] fix bug in ParserServiceController and add PageChunker basic implementation * [cli tool] Add support for MsReadChunker (by page and character limit) * [cli tool][prediction] added support for prediction configs and command * [cli tool][prediction] updated controllers structure * [cli tool][prediction] added support for prediction command (pending prediction service) * [cli tool][prediction] added support for CustomText prediction service * [cli tool] [refactor] restructure models * [cli tool] [refactor] prediction http models * [cli tool] [refactor] namespace refactoring * [cli tool][prediction] added format for prediction result * [cli tool][tests] added tests project to solution * [cli tool][tests] added MSRead connection test * [cli tool][tests] added MSRead connection test * [cli-tool][tests] Add Storage Service tests * [cli-tool][chunker] Added support for character limit in chunking by page * [cli-tool][tests] Add Chunker Service tests * [cli-tool] Add exception codes * [cli-tool][tests] Add Utilities class * [cli-tool][tests] Update storage tests with new exception checking * [cli-tool][tests] Add Storage Factory tests * [cli-tool][tests] fix type check in Utilities.AssertThrows * [cli-tool][tests] added MSRead parser tests * [cli-tool][tests] Remove exception codes and utilities class * [cli-tool][tests] minor fixes to storage tests * [cli-tool][chunker] Add chunk command * [cli-tool] Updated cli tool commands in readme * [cli-tool][tests] Added 
ConfigServiceControllerTest * [cli-tool][tests] added msread parser tests * [cli-tool][prediction] Refactor prediction service and add exception handling * [cli tool][PR reviews] complied with reviews 1 * [cli tool][PR reviews] Use Tasks instead of Parallel.ForEach in ChunkerServiceController * [cli-tool][PR reviews] Create ChunkInfoHelper class * [cli-tool][PR reviews] Update UnauthorizedFileAccessException to use string interpolation instead of concatenation Co-authored-by: Mohamed Abbas <moaba@microsoft.com> * [cli-tool][PR reviews] throw NotSupportedException instead of returning null * [cli-tool][PR reviews] Change Custom Text Response Status to snake case * [cli-tool][PR reviews] Move chunk summary creation logic to ChunkInfoHelper * [cli-tool][PR reviews] MsReadChunker handled special cases, added comments and updated tests * [cli-tool][PR reviews] Add lock to LogParsingResult function * [cli-tool][PR reviews] Create a new blob container for each test * [cli-tool][PR reviews] remove json enum converter * [cli-tool][PR reviews] add failure reason for each file when logging operation result * [cli-tool][PR reviews] move public classes to separate files * [cli-tool][PR reviews] remove values from configs.json * [cli-tool][PR review] Formatted files to comply with conventions * [cli-tool][PR review] Fix code styling and conventions * [cli-tool][devops] add azure pipelines yaml file to run tests with PR * [cli-tool][PR reviews] write enum values in separate lines * [cli-tool][PR reviews] reword error message and remove commented code * [cli-tool][PR reviews] use concurrent data structures in service controllers * [cli-tool][PR reviews] add prediction failure reason to exception * [cli-tool][PR reviews] change storage service methods to async * [cli tool][testing] added prediction service integration tests * [cli-tool][tests] finished unit tests for prediction service * [cli-tool] change framework version from 3.1 to 2.1 to run azure pipelines * [cli-tool][tests] 
Renamed env variables * [cli-tool] change dotnet version to 3.1 * [cli-tool] change configuration file location to use the current directory * [cli-tool] change dotnet version in tests project to 3.1 * [cli-tool] change netcore version from 2.1 to 3.1 * Azure pipelines (#5) Update azure-pipelines.yml for Azure Pipelines * Revert "Azure pipelines (#5)" This reverts commit 5cccf4b4662fc07776c07bf58f065a91b3846ad4. * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Revert "[cli-tool] change configuration file location to use the current directory" This reverts commit 35e33adbd6d1da33078d9a8902fd365c69b702bb. * [cli-tool] change framework version from 3.1 to 2.1 to run azure pipelines * [cli-tool] change netcore version from 2.1 to 3.1 * Azure pipelines (#5) Update azure-pipelines.yml for Azure Pipelines * Revert "Azure pipelines (#5)" This reverts commit 5cccf4b4662fc07776c07bf58f065a91b3846ad4. 
* [cli-tool] add command to load configs from file * [cli-tool] add FileNotFoundException * [cli-tool][tests] add tests for loading configs from file * [cli-tool][configs] rebased latest PR * [PR reviews] fix gitignore * [PR reviews] fix Env variables names * [PR reviews] change methods to be async * [PR reviews] refactor populating headers in httpHandler * [PR reviews] add message to NotSupportedException * [PR reviews] add timeout for CustomTextPrediction * [PR reviews] remove properties/ from gitignore * [cli-tool][pipelines] fix azure-pipelines.yml to run tests correctly * [cli-tool] make config load command async * Update azure-pipelines.yml for Azure Pipelines (#12) * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * [cli tool][load configs][pr reviews] making FileExists method async Co-authored-by: Mohamed Shaban <a-moshab@microsoft.com> Co-authored-by: NourEldin Yasser <noursalem95@gmail.com> Co-authored-by: Mohamed Abbas <moaba@microsoft.com> * Cli tool text analytics (#10) * [milestone1][msread][cli tool] temp commit * fix autofac * fix storage service * remove extra files * parser not working * fix Parsing * Working version * updated configs file schema * [cli tool][ms-read] restructuring * [cli-tool] Added support for exception handling * [cli tool] restructuring as cli app * [cli tool][commands] integrated CommandLineUtil cli framework * [cli-tool] fix MsReadUnauthorizedException * [cli tool] restructured config file and added some config commands * [cli tool] Add remaining config commands * [cli tool] Add Parsing logs * [cli tool] refactored storage factory * [cli tool] abstracted logging * [cli tool] Make parser options required * [cli tool] created universal exception handler * [cli tool] remove extra files * [cli tool] fixed configs file dir, renamed project * [cli tool] namespace refactoring, renamed tool * [cli tool] cli commands -> added descriptions * [cli tool] restructured cli tool directory * 
[cli tool] added readme * [cli tool] initialize configs.json if doesn't exist * [cli tool] making cli config keys consistent with configs file * [cli tool] [chunker] Extended architecture to add chunker * fix formatting * [cli tool][chunker] fix bug in ParserServiceController and add PageChunker basic implementation * [cli tool] Add support for MsReadChunker (by page and character limit) * [cli tool][prediction] added support for prediction configs and command * [cli tool][prediction] updated controllers structure * [cli tool][prediction] added support for prediction command (pending prediction service) * [cli tool][prediction] added support for CustomText prediction service * [cli tool] [refactor] restructure models * [cli tool] [refactor] prediction http models * [cli tool] [refactor] namespace refactoring * [cli tool][prediction] added format for prediction result * [cli tool][tests] added tests project to solution * [cli tool][tests] added MSRead connection test * [cli tool][tests] added MSRead connection test * [cli-tool][tests] Add Storage Service tests * [cli-tool][chunker] Added support for character limit in chunking by page * [cli-tool][tests] Add Chunker Service tests * [cli-tool] Add exception codes * [cli-tool][tests] Add Utilities class * [cli-tool][tests] Update storage tests with new exception checking * [cli-tool][tests] Add Storage Factory tests * [cli-tool][tests] fix type check in Utilities.AssertThrows * [cli-tool][tests] added MSRead parser tests * [cli-tool][tests] Remove exception codes and utilities class * [cli-tool][tests] minor fixes to storage tests * [cli-tool][chunker] Add chunk command * [cli-tool] Updated cli tool commands in readme * [cli-tool][tests] Added ConfigServiceControllerTest * [cli-tool][tests] added msread parser tests * [cli-tool][prediction] Refactor prediction service and add exception handling * [cli tool][PR reviews] complied with reviews 1 * [cli tool][PR reviews] Use Tasks instead of Parallel.ForEach in 
ChunkerServiceController * [cli-tool][PR reviews] Create ChunkInfoHelper class * [cli-tool][PR reviews] Update UnauthorizedFileAccessException to use string interpolation instead of concatenation Co-authored-by: Mohamed Abbas <moaba@microsoft.com> * [cli-tool][PR reviews] throw NotSupportedException instead of returning null * [cli-tool][PR reviews] Change Custom Text Response Status to snake case * [cli-tool][PR reviews] Move chunk summary creation logic to ChunkInfoHelper * [cli-tool][PR reviews] MsReadChunker handled special cases, added comments and updated tests * [cli-tool][PR reviews] Add lock to LogParsingResult function * [cli-tool][PR reviews] Create a new blob container for each test * [cli-tool][PR reviews] remove json enum converter * [cli-tool][PR reviews] add failure reason for each file when logging operation result * [cli-tool][PR reviews] move public classes to separate files * [cli-tool][PR reviews] remove values from configs.json * [cli-tool][PR review] Formatted files to comply with conventions * [cli-tool][PR review] Fix code styling and conventions * [cli-tool][devops] add azure pipelines yaml file to run tests with PR * [cli-tool][PR reviews] write enum values in separate lines * [cli-tool][PR reviews] reword error message and remove commented code * [cli-tool][PR reviews] use concurrent data structures in service controllers * [cli-tool][PR reviews] add prediction failure reason to exception * [cli-tool][PR reviews] change storage service methods to async * [cli tool][testing] added prediction service integration tests * [cli-tool][tests] finished unit tests for prediction service * [cli-tool] change framework version from 3.1 to 2.1 to run azure pipelines * [cli-tool][tests] Renamed env variables * [cli-tool] change dotnet version to 3.1 * [cli-tool] change configuration file location to use the current directory * [cli-tool] change dotnet version in tests project to 3.1 * [cli-tool] change netcore version from 2.1 to 3.1 * Azure pipelines 
(#5) Update azure-pipelines.yml for Azure Pipelines * Revert "Azure pipelines (#5)" This reverts commit 5cccf4b4662fc07776c07bf58f065a91b3846ad4. * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * Revert "[cli-tool] change configuration file location to use the current directory" This reverts commit 35e33adbd6d1da33078d9a8902fd365c69b702bb. * [cli-tool] change framework version from 3.1 to 2.1 to run azure pipelines * [cli-tool] change netcore version from 2.1 to 3.1 * Azure pipelines (#5) Update azure-pipelines.yml for Azure Pipelines * Revert "Azure pipelines (#5)" This reverts commit 5cccf4b4662fc07776c07bf58f065a91b3846ad4. * [cli-tool] add command to load configs from file * [cli-tool] add FileNotFoundException * [cli-tool][tests] add tests for loading configs from file * [cli-tool][configs] rebased latest PR * [PR reviews] fix gitignore * [PR reviews] fix Env variables names * [PR reviews] change methods to be async * [PR reviews] refactor populating headers in httpHandler * [PR reviews] add message to NotSupportedException * [PR reviews] add timeout for CustomTextPrediction * [PR reviews] remove properties/ from gitignore * [cli-tool][pipelines] fix azure-pipelines.yml to run tests correctly * [cli-tool] make config load command async * Update azure-pipelines.yml for Azure Pipelines (#12) * Update azure-pipelines.yml for Azure Pipelines * Update azure-pipelines.yml for Azure Pipelines * [cli tool][load configs][pr reviews] making FileExists method async * [cli-tool][text analytics] added support for text analytics configs * [cli tool][text analytics] added text analytics command in view layer * [cli tool][text analytics] added text analytics commands, controller, service * [cli-tool][text analytics] text analytics 
command working * [cli-tool][text analytics] Use config keys for json property names * [cli-tool][tests] add integration tests for TextAnalyticsPredictionService * [cli tool][text analytics] modified response object * [cli tool][text analytics] handled exceeding prediction limits * [cli-tool][chunker] Add charLimit check and fix pageCount bug * [cli-tool][chunker] refactor chunking logic to smaller functions * [cli-tool][text analytics] Handle text analytics exceptions * [cli-tool] update readme to include textanalytics * [cli-tool][text analytics] add TA env variables for testing to azure pipeline * [cli-tool][testing] remove versionId from config test Co-authored-by: Mohamed Shaban <a-moshab@microsoft.com> Co-authored-by: NourEldin Yasser <noursalem95@gmail.com> Co-authored-by: Mohamed Abbas <moaba@microsoft.com> * Cli tool project restructure (#13) * [milestone1][msread][cli tool] temp commit * fix autofac * fix storage service * remove extra files * parser not working * fix Parsing * Working version * updated configs file schema * [cli tool][ms-read] restructuring * [cli-tool] Added support for exception handling * [cli tool] restructuring as cli app * [cli tool][commands] integrated CommandLineUtil cli framework * [cli-tool] fix MsReadUnauthorizedException * [cli tool] restructured config file and added some config commands * [cli tool] Add remaining config commands * [cli tool] Add Parsing logs * [cli tool] refactored storage factory * [cli tool] abstracted logging * [cli tool] Make parser options required * [cli tool] created universal exception handler * [cli tool] remove extra files * [cli tool] fixed configs file dir, renamed project * [cli tool] namespace refactoring, renamed tool * [cli tool] cli commands -> added descriptions * [cli tool] restructured cli tool directory * [cli tool] added readme * [cli tool] initialize configs.json if doesn't exist * [cli tool] making cli config keys consistent with configs file * [cli tool] [chunker] Extended 
architecture to add chunker * fix formatting * [cli tool][chunker] fix bug in ParserServiceController and add PageChunker basic implementation * [cli tool] Add support for MsReadChunker (by page and character limit) * [cli tool][prediction] added support for prediction configs and command * [cli tool][prediction] updated controllers structure * [cli tool][prediction] added support for prediction command (pending prediction service) * [cli tool][prediction] added support for CustomText prediction service * [cli tool] [refactor] restructure models * [cli tool] [refactor] prediction http models * [cli tool] [refactor] namespace refactoring * [cli tool][prediction] added format for prediction result * [cli tool][tests] added tests project to solution * [cli tool][tests] added MSRead connection test * [cli tool][tests] added MSRead connection test * [cli-tool][tests] Add Storage Service tests * [cli-tool][chunker] Added support for character limit in chunking by page * [cli-tool][tests] Add Chunker Service tests * [cli-tool] Add exception codes * [cli-tool][tests] Add Utilities class * [cli-tool][tests] Update storage tests with new exception checking * [cli-tool][tests] Add Storage Factory tests * [cli-tool][tests] fix type check in Utilities.AssertThrows * [cli-tool][tests] added MSRead parser tests * [cli-tool][tests] Remove exception codes and utilities class * [cli-tool][tests] minor fixes to storage tests * [cli-tool][chunker] Add chunk command * [cli-tool] Updated cli tool commands in readme * [cli-tool][tests] Added ConfigServiceControllerTest * [cli-tool][tests] added msread parser tests * [cli-tool][prediction] Refactor prediction service and add exception handling * [cli tool][PR reviews] complied with reviews 1 * [cli tool][PR reviews] Use Tasks instead of Parallel.ForEach in ChunkerServiceController * [cli-tool][PR reviews] Create ChunkInfoHelper class * [cli-tool][PR reviews] Update UnauthorizedFileAccessException to use string interpolation instead of 
CogSLanguageUtils
The Cognitive Services Language Utilities is a CLI tool that provides core functionality to simplify working with some of the Cognitive Language Services (mainly Custom Text).
Installation
You can download the executable here. If you would like to use the tool system-wide, add it to your PATH environment variable. Run the tool to check your installation.
$ clutils
Available Commands
The following commands are currently available:
Overview
The tool currently supports three main use cases:
Document Parsing
- The "parse" command makes app training easier, because Text Analytics and Custom Text only accept plain text input.
- The "parse" command extracts plain text from your documents in any supported format.
- It can also chunk your document into smaller segments using different methods. This keeps the output within the services' character limits and lets you control how your document is broken down into smaller parts.
- The tool also uploads the converted text files to your blob storage, which is a prerequisite for training your Custom Text application.
Prediction
- The "predict" command simplifies the prediction pipeline, because the user must submit the document as plain text and abide by the character limit set by the service (for example, Custom Text only accepts text documents of up to 25k tokens).
- The "predict" command integrates this pipeline (parsing, chunking, calling the prediction APIs, and aggregating the different chunks of the same document) into a single command.
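The flow that "predict" wraps can be pictured with a minimal Python sketch. This is not the tool's implementation: `predict_document`, `fake_chunker`, and `fake_predict` are hypothetical names, the chunker and prediction call are stand-ins, and the merged result format is an assumption made only for illustration.

```python
from typing import Callable, Dict, List


def predict_document(
    text: str,
    chunker: Callable[[str], List[str]],
    predict_chunk: Callable[[str], Dict[str, List[str]]],
) -> Dict[str, List[str]]:
    """Chunk a plain-text document, predict each chunk, and merge the results."""
    merged: Dict[str, List[str]] = {}
    for chunk in chunker(text):
        # Each chunk is sent to the service separately (here: a callable).
        for label, values in predict_chunk(chunk).items():
            # Aggregate results of all chunks of the same document per label.
            merged.setdefault(label, []).extend(values)
    return merged


# Hypothetical stand-ins for the real chunker and prediction service.
def fake_chunker(text: str) -> List[str]:
    return text.split("\n\n")


def fake_predict(chunk: str) -> Dict[str, List[str]]:
    # Pretend every capitalized word is a "Name" entity.
    return {"Name": [w for w in chunk.split() if w.istitle()]}


result = predict_document("Alice met bob\n\nthen Carol left", fake_chunker, fake_predict)
```

Passing the chunker and the prediction call as parameters keeps the sketch independent of any particular service or SDK.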
Model Evaluation
- The "evaluate" command lets users test app performance, because Text Analytics and Custom Text currently do not provide any means of testing application performance.
- It integrates the testing pipeline (reading labeled examples, calling the prediction APIs, evaluating model performance, and so on) into a single command.
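As a rough idea of what "evaluating model performance" can involve, the sketch below computes per-class precision, recall, and F1 from labeled and predicted class sets. The metric choice, the function names, and the data layout are assumptions for illustration; the tool's actual evaluation output may differ.

```python
from typing import Dict, List, Set


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(
    labeled: List[Set[str]], predicted: List[Set[str]]
) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall and F1 over parallel lists of class sets."""
    classes = set().union(*labeled, *predicted)
    stats: Dict[str, Dict[str, float]] = {}
    for cls in sorted(classes):
        tp = sum(1 for g, p in zip(labeled, predicted) if cls in g and cls in p)
        fp = sum(1 for g, p in zip(labeled, predicted) if cls not in g and cls in p)
        fn = sum(1 for g, p in zip(labeled, predicted) if cls in g and cls not in p)
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)
        f1 = safe_div(2 * precision * recall, precision + recall)
        stats[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return stats


# Three documents, each with a set of expected and predicted classes.
stats = evaluate(
    labeled=[{"a"}, {"b"}, {"a", "b"}],
    predicted=[{"a"}, {"a"}, {"b"}],
)
```

Using sets per document lets the same code cover single-class and multi-class examples.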
Supported files
The following file formats are currently supported:
- txt
- docx
- Scanned documents and Images (jpeg, bmp, png)
Chunking Methods
The tool supports different types of chunking. All chunking methods respect paragraph endings so that chunks do not start or end mid-paragraph. If paragraph endings are not available for the document being parsed, sentences marked by '.' are used as the building block of a chunk.
- Chunk by character limit
- Chunk by page
- Chunk by section
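To illustrate the idea behind chunking by character limit, here is a minimal Python sketch that packs whole paragraphs into chunks and falls back to '.'-terminated sentences when no paragraph breaks are found. It is an assumption-laden illustration, not the tool's algorithm: the function names, the blank-line paragraph detection, and the handling of oversized units are all hypothetical.

```python
import re
from typing import List, Tuple


def split_units(text: str) -> Tuple[List[str], str]:
    """Return building blocks and the separator used to re-join them."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if len(paragraphs) > 1:
        return paragraphs, "\n"
    # No paragraph breaks found: use sentences ending in '.' as building blocks.
    sentences = [s.strip() for s in re.findall(r"[^.]+\.?", text) if s.strip()]
    return sentences, " "


def chunk_by_char_limit(text: str, char_limit: int) -> List[str]:
    """Greedily pack whole units into chunks of at most char_limit characters.

    A single unit longer than char_limit is kept whole in its own chunk
    (an assumption of this sketch).
    """
    units, sep = split_units(text)
    chunks: List[str] = []
    current = ""
    for unit in units:
        candidate = unit if not current else current + sep + unit
        if current and len(candidate) > char_limit:
            # Adding this unit would exceed the limit: close the current chunk.
            chunks.append(current)
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


by_paragraph = chunk_by_char_limit("aaaa\n\nbbbb\n\ncccc", 9)
by_sentence = chunk_by_char_limit("One. Two. Three.", 10)
```

Because units are never split, no chunk starts or ends in the middle of a paragraph (or of a sentence, in the fallback case).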
Commands Overview
- config
  - load from file
    - clutils config load --path <ABSOLUTE_PATH>
- parsing
  - clutils parse --source <BLOB/LOCAL> --destination <BLOB/LOCAL> [ --chunk-type <PAGE/CHAR> ]
- chunking
  - clutils chunk --source <BLOB/LOCAL> --destination <BLOB/LOCAL>
- prediction
  - clutils predict --cognitive-service <customtext/textanalytics/both> --source <BLOB/LOCAL> --destination <BLOB/LOCAL> [ --chunk-type <PAGE/CHAR> ]
- evaluation
  - clutils evaluate --cognitive-service <customtext/textanalytics/both> --source <BLOB/LOCAL> --destination <BLOB/LOCAL>