File Format Converter
This CLI tool is meant to convert labeling files from Azure ML (AML) labeling formats, such as JSONL, CoNLL, and TSV, to Custom Text (CT) labeling file formats, such as custom_entities.json and custom_classifiers.json, and vice versa.
Supported File Conversions
- AML JSONL entities -> CT JSON entities
- AML CoNLL entities -> CT JSON entities
How to use
- First, download this package and unzip it to extract the CLI tool.
- Second, open your terminal (cmd or PowerShell) in the same directory as the .exe file (or add that directory to your PATH environment variable to run the tool from anywhere).
- Then, run the following command:
convert -sp <source_file_path> -st <source_file_type> [-tp <target_file_path>] -tt <target_file_type> [-l <language>]
Arguments in square brackets are optional; -l sets the language of the documents.
Example
convert -sp "./entities.jsonl" -st "jsonl" -tp "./target.json" -tt "ct_entities"
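If you drive the tool from a script, the command above can be assembled as an argument list so that the optional -tp and -l flags are only passed when you have a value for them. This is a minimal sketch, assuming the executable is reachable as `convert` (current directory or PATH); the helper `build_convert_args` is hypothetical and not part of the tool.

```python
from typing import List, Optional


def build_convert_args(
    source_path: str,
    source_type: str,
    target_type: str,
    target_path: Optional[str] = None,
    language: Optional[str] = None,
) -> List[str]:
    """Assemble the converter's command line (hypothetical helper, not part of the tool)."""
    # Required: source path and source type.
    args = ["convert", "-sp", source_path, "-st", source_type]
    # Optional: target path is only added when given.
    if target_path is not None:
        args += ["-tp", target_path]
    # Required: target type.
    args += ["-tt", target_type]
    # Optional: document language is only added when given.
    if language is not None:
        args += ["-l", language]
    return args


print(" ".join(build_convert_args("./entities.jsonl", "jsonl", "ct_entities", target_path="./target.json")))
```

The resulting list can be passed directly to `subprocess.run(args, check=True)`, which avoids shell quoting issues with paths that contain spaces.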
File types as arguments
- AML JSONL entities -> jsonl
- AML CoNLL entities -> conll
- CT JSON entities -> ct_entities
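Before running a conversion, it can help to check that the -st/-tt values and the source file agree with the lists above. The sketch below mirrors the "Supported File Conversions" and "File types as arguments" lists; the expected file extensions are an assumption for illustration, and the tool itself may accept or reject inputs differently.

```python
from pathlib import Path
from typing import List

# File type arguments from "File types as arguments".
# ASSUMPTION: the expected extensions are illustrative, not enforced by the tool.
EXPECTED_EXTENSION = {
    "jsonl": ".jsonl",
    "conll": ".conll",
    "ct_entities": ".json",
}

# Directions listed under "Supported File Conversions".
SUPPORTED_CONVERSIONS = {
    ("jsonl", "ct_entities"),
    ("conll", "ct_entities"),
}


def check_arguments(source_path: str, source_type: str, target_type: str) -> List[str]:
    """Return human-readable problems; an empty list means the arguments look consistent."""
    problems = []
    for role, file_type in (("source", source_type), ("target", target_type)):
        if file_type not in EXPECTED_EXTENSION:
            problems.append(f"unknown {role} file type: {file_type!r}")
    if (source_type, target_type) not in SUPPORTED_CONVERSIONS:
        problems.append(f"unsupported conversion: {source_type} -> {target_type}")
    # Flag a source file whose extension does not match its declared type.
    expected = EXPECTED_EXTENSION.get(source_type)
    if expected and Path(source_path).suffix.lower() != expected:
        problems.append(f"source file {source_path!r} does not end in {expected}")
    return problems


print(check_arguments("./entities.conll", "jsonl", "ct_entities"))
```

For example, a path ending in .conll passed with -st "jsonl" is reported as an extension mismatch, while "./entities.jsonl" with "jsonl" and "ct_entities" produces no problems.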
Notes
- Location parameter
  - When converting from AML JSONL format to CT entities format, the tool trims each document's location down to its file name.
  - Example
    - AML JSONL document:
      "image_url": "AmlDatastore://textcontainer/conll_2003_ner/file1513.txt"
    - CT entities document:
      "location": "file1513.txt"
  - Rationale: AML supports multiple storage options. Previously, the tool copied the location from the JSONL file as is, but most of these locations are not accessible when used directly (for example, those not in a storage account). The tool therefore keeps only the file name in the location field. Make sure to upload the files to your own container.
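The trimming described above amounts to keeping everything after the last "/" of the AML `image_url`. The sketch below reproduces the documented example; it is not the tool's actual implementation, and `to_ct_location` is a hypothetical helper that builds only the "location" field of a CT document (labels and other fields are omitted).

```python
import json


def trim_location(image_url: str) -> str:
    """Return only the file name from an AML datastore URL (sketch of the documented behavior)."""
    # Everything after the last "/" is the file name; a value without "/" is returned unchanged.
    return image_url.rsplit("/", 1)[-1]


def to_ct_location(aml_line: str) -> dict:
    """Map one AML JSONL line to a partial CT document holding only "location".

    Hypothetical helper: the real converter also carries over labels and other fields.
    """
    record = json.loads(aml_line)
    return {"location": trim_location(record["image_url"])}


line = '{"image_url": "AmlDatastore://textcontainer/conll_2003_ner/file1513.txt"}'
print(to_ct_location(line))  # {'location': 'file1513.txt'}
```

Because only the file name survives, two source files with the same name in different folders would map to the same location, so keep file names unique in the container you upload to.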