CogSLanguageUtilities/CustomTextAnalytics.FileFor...
Mo Shaban ab4941de32
AML JSONL hotfix (#205) (#206)
* limit application to only JSONL -> NER conversion

* added support to configure doc language as command line parameter

* modify 'location' parameter in document

* cleanup

* modify exception message

* create release package
2022-02-28 11:15:21 +02:00

File Format Converter

This CLI tool converts files between Azure ML (AML) labeling formats, such as JSONL, CoNLL, and TSV, and Custom Text (CT) labeling formats, such as custom_entities.json and custom_classifiers.json.

Supported File Conversions

  • AML Jsonl entities -> CT Json entities
  • AML CoNLL entities -> CT Json entities

How to use

  • First, download this package and unzip it to extract the CLI tool.
  • Second, open your terminal (cmd or PowerShell) in the same directory as the exe file.
    (Alternatively, add that directory to your PATH environment variable to run the tool from anywhere.)
  • Then, run the following command:
    convert -sp <source_file_path> -st <source_file_type> [-tp <target_file_path>] -tt <target_file_type> [-l <language>]

Example

    convert -sp "./entities.jsonl" -st "jsonl" -tp "./target.json" -tt "ct_entities"

File types as arguments

  • AML Jsonl entities -> jsonl
  • AML CoNLL entities -> conll
  • CT Json entities -> ct_entities
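
For scripted use, the command line described above can be assembled programmatically. The following is a minimal Python sketch and not part of the tool: the helper name `build_convert_args` is hypothetical, and it assumes that `convert` is reachable on the PATH and that the optional `-tp` and `-l` flags can simply be left out. The accepted values for `-l` are not documented here, so the sketch passes through whatever string the caller supplies.

```python
from typing import List, Optional

# File type arguments listed in this readme.
KNOWN_FILE_TYPES = {"jsonl", "conll", "ct_entities"}


def build_convert_args(
    source_path: str,
    source_type: str,
    target_type: str,
    target_path: Optional[str] = None,
    language: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for the `convert` command (hypothetical helper)."""
    for file_type in (source_type, target_type):
        if file_type not in KNOWN_FILE_TYPES:
            raise ValueError(f"unknown file type: {file_type!r}")

    args = ["convert", "-sp", source_path, "-st", source_type]
    if target_path is not None:  # -tp is optional
        args += ["-tp", target_path]
    args += ["-tt", target_type]
    if language is not None:  # -l is optional
        args += ["-l", language]
    return args
```

The returned list could then be handed to `subprocess.run(args, check=True)` to invoke the tool.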

Notes

  1. Location Parameter
    • When converting from the AML-JSONL format to the CT-Entities format, the tool trims each document location down to its file name.
    • Example:
      • AML-JSONL document
      "image_url": "AmlDatastore://textcontainer/conll_2003_ner/file1513.txt"
      
      • CT-Entities document
      "location": "file1513.txt"
      
    • Rationale: AML supports multiple storage options. Previously, the tool copied the location from the JSONL file as is, but most of these locations (those outside the storage account) are not accessible when used that way. The tool therefore keeps only the file name in the location field. Users need to upload the corresponding files to their own container.
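
The trimming described in this note can be illustrated with a short Python sketch. This is not the tool's actual implementation; the function name `trim_location` is hypothetical, and the sketch assumes that locations use `/` as the path separator, as in the example above.

```python
def trim_location(image_url: str) -> str:
    """Return only the file name from an AML location string (illustrative helper)."""
    # Drop any trailing slash, then keep the text after the last remaining "/".
    return image_url.rstrip("/").rsplit("/", 1)[-1]


aml_document = {"image_url": "AmlDatastore://textcontainer/conll_2003_ner/file1513.txt"}
ct_document = {"location": trim_location(aml_document["image_url"])}
print(ct_document)  # {'location': 'file1513.txt'}
```

Because only the file name survives, two source files with the same name in different AML folders would map to the same location, which is worth checking before uploading files to the container.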