28de0c8c2d
* Add a converter for Chinese
* Convert imported datasets to Simplified Chinese
* Add augmentation modifier for CJK
* Update tests
* Move constants to the beginning of the file
* Output tokenized text from alignments step
* Detokenize text in Tags modifier
* Add CJK OpusTrainer configs
* Update Taskcluster kinds to use tokenized text and CJK configs
* Test training for Chinese
* Update docs
* Reduce chunk size for alignments
* Add python path env
* Fix comment
* Change character coverage for CJK
* Use larger vocab for CJK
* Use all items from the provided vocabulary
* Revert "Change character coverage for CJK"
This reverts commit
| Name |
|---|
| tasks |
| build-mono-nllb.py |
| config_generator.py |
| download_hplt.py |
| find_corpus.py |
| marian_client.py |
| preflight_check.py |
| run_model.py |
| taskcluster_downloader.py |
| tb_log_parser.py |
| trigger_training.py |