firefox-translations-training

History

Evgeny Pavlov 28de0c8c2d Configure vocab for CJK (#906)	2024-11-06 20:45:55 -08:00
* Add a converter for Chinese
* Convert imported datasets to simplified
* Add augmentation modifier for cjk
* Update tests
* Move constants to the beginning of the file
* Output tokenized text from alignments step
* Detokenize text in Tags modifier
* Add CJK OpusTrainer configs
* Update taskcluster kinds to use tokenized text and cjk configs
* Test training for Chinese
* Update docs
* Reduce chunk size for alignments
* Add python path env
* Fix comment
* Change character coverage for CJK
* Use larger vocab for CJK
* Use all items from the provided vocabulary
* Revert "Change character coverage for CJK"
  This reverts commit `a6c35bfe73`.
* Use default sentencepiece character coverage
* Relock poetry
* Run linter
tasks	Add --run-as-user flag to docker-run.py (#919)	2024-11-06 13:57:10 -06:00
build-mono-nllb.py	Remove max_words filtering from data importers (#901)	2024-11-06 14:44:42 -08:00
config_generator.py	Configure vocab for CJK (#906)	2024-11-06 20:45:55 -08:00
download_hplt.py	Add HPLT mono bulk importer (#645)	2024-05-29 14:25:08 -07:00
find_corpus.py	Switch bestbleu to chrF (#908)	2024-11-04 13:49:49 -08:00
marian_client.py	Add Marian server for model testing (#492)	2024-03-28 15:53:16 -07:00
preflight_check.py	Rename repo (#914)	2024-11-01 10:21:28 -05:00
run_model.py	Remove the Makefile and replace it with a Taskfile (#510)	2024-04-09 16:11:13 -05:00
taskcluster_downloader.py	Remove the Makefile and replace it with a Taskfile (#510)	2024-04-09 16:11:13 -05:00
tb_log_parser.py	Add ruff and black linting to the CI (#187)	2023-09-08 09:50:24 -05:00
trigger_training.py	Rename repo (#914)	2024-11-01 10:21:28 -05:00