* Add tests for alignments
* Use toolchain in CI tests
* Compile toolchain locally under Docker
* Add commands to build and run docker locally
* Trigger CI
* Install missing libs
* Clarify exporting variables for ARM processors
* Add a workaround for poetry not seeing python
* Clarify the reason for not installing packages in docker image
* Drive-by: Fix the logging of parameters in the downloads
* Update utility to provide a default output path, and list all download modes
* Add a taskcluster downloader for models
* Allow group traversal from training task dependencies during Taskcluster task group publication
* Fix lint + Apply Valentin's suggestions
* Comment to clarify the usage of a shared set variable
* Do not publish empty groups
* Base taskcluster task group publication
* Move tag parser to utils module
* Support metrics
* Support multiple teacher training
* Fix parsing for evaluation folder
* Generic group logs parser
* Parse extra evaluation tasks and publish group_logs fake run
* Publish Marian config on runs
* Publish Marian config on runs instead of experiment config
* Rebase vrigal:publish-experiment-config
* Publish experiment config on group_logs
* Fix linting pythonpath for tracking
* Add pythonpath to the rest of the commands
* Remove pythonpath
* Update lockfile
* Fix wandb directory in tests
* Add the ability to run starting from a specific task (fixes #227)
A couple of example runs with this:
* https://firefox-ci-tc.services.mozilla.com/tasks/groups/YHAr0HzwSSe4pe5Yh9dIlg uses https://firefox-ci-tc.services.mozilla.com/tasks/groups/JjNp3KcyTUObUtOA9BgK5g as its `previous-group-id` with `start-stage: train-backwards` and `target-stage: train-teacher` - and ends up running `train-backwards`, `translate-mono-trg`, `collect-mono-trg`, and `train-teacher`.
* https://firefox-ci-tc.services.mozilla.com/tasks/groups/Sm0YV_8LQP-EOE8Nz6G5Lw uses the above group as its `previous-group-id` with `start-stage: train-teacher` and `target-stage: all`. Note that it ended up depending on tasks from both the above group and the one that it was based on, and scheduled `train-teacher` and everything after it (I didn't bother letting them all run - I think the scheduling is enough to verify this).
Big thanks to @gabrielBusta for suggesting this implementation!
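For reference, a minimal sketch of how those three parameters fit together. This is an illustration only: the key names come from the examples above, but their placement and nesting in the training config is an assumption, not the actual schema.

```python
# Hypothetical sketch of the parameters discussed above; the real config
# schema and nesting may differ.
resume_parameters = {
    # Existing task group whose completed tasks are reused rather than re-run.
    "previous-group-id": "JjNp3KcyTUObUtOA9BgK5g",
    # First stage that is actually scheduled in the new group; everything
    # upstream of it is pulled from the previous group(s).
    "start-stage": "train-backwards",
    # Last stage to schedule ("all" schedules every remaining stage).
    "target-stage": "train-teacher",
}
```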
* Update poetry dependencies to pull in newer taskgraph version
At the moment we end up with files that sit beside the output directory rather than in it. In CI, this means that we don't get the diffs uploaded as artifacts.
* Use zst in the data importer test
* Add a test for news crawl importer
* Add a tree printing method
* Drive-by: Output stderr in run_task
* Temporarily disable newscrawl test
* Merge the two tests
Dependencies and fetches are unused, so they should be removed.
Cache digests should _only_ be influenced by the pretrained teacher parameters (file resources and other parameters are unused - so they do not influence the outcome of the task).
Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>
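To illustrate that constraint, a rough sketch of a digest that only sees the pretrained teacher parameters - this is not the actual taskgraph cache code, and `pretrained-models` is an assumed key name:

```python
import hashlib
import json


def pretrained_teacher_digest(parameters: dict) -> str:
    """Sketch: hash only the pretrained teacher settings, so changes to
    unrelated parameters or file resources never invalidate the cache."""
    relevant = parameters.get("pretrained-models", {})  # assumed key name
    return hashlib.sha256(
        json.dumps(relevant, sort_keys=True).encode("utf-8")
    ).hexdigest()
```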
* Convert the evaluation script to python
* Remove the old eval scripts
* Simplify evaluate-teacher-ensemble
* Simplify evaluate-quantized
* Simplify taskcluster/kinds/evaluate/kind.yml
* Add assertions for the json evaluation metric
* Use the sacrebleu python api
* Change ca to ru in the test assertions
* Use a test defined config file
This fixes an issue where modifying tc.prod.yml would break the
taskcluster tests. It also lets tests share the same config between
runs without leaving a dirty artifacts folder.
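A rough sketch of the idea, assuming a hypothetical fixture name and file layout rather than the actual test code:

```python
import shutil
from pathlib import Path

import pytest

# Assumed location of a config owned by the test suite, kept separate from
# the production tc.prod.yml so that editing tc.prod.yml cannot break tests.
TEST_CONFIG = Path(__file__).parent / "fixtures" / "test_config.yml"


@pytest.fixture
def test_config(tmp_path: Path) -> Path:
    """Copy the shared test config into a per-run directory so repeated
    runs never start from a dirty artifacts folder."""
    config = tmp_path / "config.yml"
    shutil.copy(TEST_CONFIG, config)
    return config
```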
* Add a run_task test utility that exercises the full taskgraph
* Parameterize the eval test
* Publish metrics at the end
* Add missing steps
* Publish all metrics to a group table
* Support more metrics formats
* Update tests
* Publish runs for extra metrics
* Prefix metrics with group
* Improve metrics publication
* Remove unused Metric.model_name
* Update tests