firefox-translations-training

Commit graph

Author	SHA1	Message	Date
Valentin Rigal	d1d1efc441	Multiply comet score by 100 in online mode (#868 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-10-01 11:27:11 -07:00
Greg Tatum	9d355d82fe	Rewrite train.sh to train.py (#842 ) * Add a run_pipeline utility * Add more tests for training * Rewrite train.sh into train.py * Add the pipeline to the PYTHONPATH * Ensure that the W&B tracker throws errors in CI * Add the Taskcluster environment variables so test-fast works on the train test * Address review comments	2024-09-18 09:04:48 -05:00
Valentin Rigal	241f16831e	Suffix W&B runs with task group ID for offline Taskcluster publication from GCP (#799 ) * Use task group ID as suffix for offline Taskcluster publication from GCP * Fix group_logs publication * Add a mode to support GCP experiments from Taskcluster in a generic way * Fix metrics path for GCP experiments that ran on Taskcluster * Ignore old snakemake metrics that cannot be parsed * Update tests * Do not parse metrics name for new GCP experiments (taskcluster) * Add tests for metrics filename parser * Add a parser for GCP metrics filename support * Support Taskcluster metrics structure in WandB.publish_group_logs * Patch model name in group_logs * Patch model suffix in group_logs * Add details to value error exceptions * Do continue on unsupported filename (Snakemake) * Preserve legacy metrics dir for snakemake experiments * Rework the GCP file structure browsing * Fixes * Include quantized metrics * Update tests	2024-09-13 13:05:04 -07:00
Valentin Rigal	70eff55b82	Publish Marian/OpusTrainer configuration YAMLs and dataset statistics (#720 ) * Publish Marian, OpusTrainer configs and datasets statistics * Update tests * Fixes * Fix tests in CI context * Nit * Store extra config files as new keys of the main config * Plot datasets in a custom chart * Fixes * Support extra-args for offline publication * Fix tests * Suggestions * TRASHME Test publication from CI * Revert "TRASHME Test publication from CI" This reverts commit `2da4a9a3cd`. * Suggestion * Trigger CI * Fix training and model key detection * TRASHME: Trigger publication from CI * Revert "TRASHME: Trigger publication from CI" This reverts commit `ad4a3b7368`. --------- Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com> Co-authored-by: Evgeny Pavlov <pavlov89@gmail.com>	2024-09-03 11:39:10 -07:00
Valentin Rigal	f7247a60a0	Pass W&B suffix to publish_group_logs (offline experiments) (#818 ) * Pass W&B suffix to publish_group_logs (offline experiments) * Nit --------- Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-08-30 16:50:46 -07:00
Valentin Rigal	676cb4fa45	Add group ID suffix to group_logs metrics published from online evaluation tasks (#820 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-08-30 14:51:27 -07:00
Valentin Rigal	9db196cc30	Publish experiments to W&B from the CI (#817 ) * Publish experiments from the CI * Disable cache for CI runs * Revert "Disable cache for CI runs" This reverts commit ca4593a39846a1a5cddf5ebf41a02fc698e23bea.	2024-08-29 09:13:43 -07:00
Valentin Rigal	d5b94fe422	Group logs online evals (#708 ) * Upgrade group_logs metrics from online evaluation tasks * Support incrementing group_logs metrics table * Use real run name * Remove useless indent & fixes * Nits * Support disabled publication through WANDB_PUBLICATION * Fix linting --------- Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-08-08 11:19:55 -07:00
Bastien Abadie	28aa45d437	Multiply comet metric by 100 before publication (#754 )	2024-07-24 11:28:12 -07:00
Valentin Rigal	2027f4e99b	Use unique run names in Weight & Biases (#727 ) * Use unique names for W&B runs * Update tests * Fix online group_logs publication * Use task group ID suffix in offline publication from Taskcluster * Use task group ID suffix in offline publication from GCP old Snakemake experiments * Fix * TRASHME Test publication from CI * Revert "TRASHME Test publication from CI" This reverts commit `4e15ed4eb4`.	2024-07-12 11:46:26 -07:00
Valentin Rigal	794bdb2240	Rebase on main @5d35e4a3 (#696 )	2024-07-02 12:04:49 -07:00
Bastien Abadie	5d35e4a30c	Expose Takscluster task owner as author for Weight & Biases publication (#704 ) * Expose Takscluster task owner as author for Weight & Biases publication * Publish author tag	2024-07-01 10:07:26 -07:00
Valentin Rigal	5ccc2c396c	Update the table published on group_logs (#660 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-06-28 16:03:16 -07:00
Evgeny Pavlov	61a2704711	Fix poetry lock (#706 ) * Revert "Use pip-compile for tracking dependencies (#695)" This reverts commit `24748d0608`. * Fix numpy issue	2024-06-27 10:34:19 -07:00
Bastien Abadie	24748d0608	Use pip-compile for tracking dependencies (#695 )	2024-06-26 16:38:23 -07:00
Valentin Rigal	50d5507202	Fix offline group publication (#638 ) * Skip unrelated tasks in taskcluster group publication * Fix typo --------- Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-05-30 17:01:29 -07:00
Valentin Rigal	b20c6247c0	Parse stalled validation data (#637 ) * Add missing validation metrics * Allow validation entries missing stalled value * Update tests * Support learning rate * Update test fixtures for W&B * Suggestion --------- Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-05-30 10:45:50 -07:00
Valentin Rigal	6745aba3f2	Support unexisting project in group_logs publication (#644 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-05-29 15:50:35 -07:00
Valentin Rigal	c6c10a521f	Avoid publishing experiment config when W&B publication is disabled (#627 ) * Avoid publishing experiment config when W&B publication is disabled * TRASHME Reproduce issue with train-backwards * Revert "TRASHME Reproduce issue with train-backwards" This reverts commit `ad2b7a73e5`.	2024-05-23 13:00:14 -04:00
Valentin Rigal	c2a6e7f8c8	Publish experiment config from taskcluster training task (group_logs) (#602 ) * Configure evaluation tasks * Extract w&b code into module * Do not check taskcluwter when publication is disabled * Publish evaluation metrics to W&B * Fix running eval tracking on CI * Use args.wandb_run_name instead of default teacher * Remove duplicated arguments * Retrieve dataset from Taskcluster directly * Add missing calls to publisher and logging * Allow publishing metrics as a table on existing runs (i.e. previous trainings) * Update regex to parse labels ending with '-1' * Generic support for train/eval different naming * Update tests * Support disabled publication * Publish group_logs from taskcluster * Update tests * Refactor group_log publication between online and offline taskcluster * Restore missing input-file argument * Rebase and fixes * TRASHME test parameters to trigger train in CI * Fix metrics_tasks default value * Fix import * Run linter * Publish config first * Revert "TRASHME test parameters to trigger train in CI" This reverts commit `ede4245786`. --------- Co-authored-by: Bastien Abadie <bastien@nextcairn.com> Co-authored-by: Bastien Abadie <abadie@teklia.com> Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-05-22 17:00:29 -07:00
Valentin Rigal	419ff93c6d	Publish comet metrics (#621 ) * Configure evaluation tasks * Extract w&b code into module * Do not check taskcluwter when publication is disabled * Publish evaluation metrics to W&B * Fix running eval tracking on CI * Use args.wandb_run_name instead of default teacher * Remove duplicated arguments * Retrieve dataset from Taskcluster directly * Add missing calls to publisher and logging * Allow publishing metrics as a table on existing runs (i.e. previous trainings) * Update regex to parse labels ending with '-1' * Generic support for train/eval different naming * Update tests * Support disabled publication * Support COMET metric in online publication * Enable publication * Run linter * Revert "Enable publication" This reverts commit `a1ef893173`. --------- Co-authored-by: Bastien Abadie <bastien@nextcairn.com> Co-authored-by: Bastien Abadie <abadie@teklia.com> Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com> Co-authored-by: Evgeny Pavlov <pavlov89@gmail.com>	2024-05-22 14:07:28 -07:00
Valentin Rigal	8a1d8ef2c3	Publish evaluation metrics (#598 ) * Configure evaluation tasks * Extract w&b code into module * Do not check taskcluwter when publication is disabled * Publish evaluation metrics to W&B * Fix running eval tracking on CI * Use args.wandb_run_name instead of default teacher * Remove duplicated arguments * Retrieve dataset from Taskcluster directly * Add missing calls to publisher and logging * Allow publishing metrics as a table on existing runs (i.e. previous trainings) * Update regex to parse labels ending with '-1' * Generic support for train/eval different naming * Update tests * Support disabled publication --------- Co-authored-by: Bastien Abadie <bastien@nextcairn.com> Co-authored-by: Bastien Abadie <abadie@teklia.com> Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-05-22 11:35:29 -07:00
Greg Tatum	f2d796e8d1	Ensure all of the task labels can be parsed in the task graph (#612 ) * Show the stack trace when there is an error in tracking * Rename parse_tag to parse_task_label * Document the regexes * Turn the parsed label into a NamedTuple for better typing hints * Get the tag parsing working with the full task graph * Allow for 3 letter language codes * Temporarily disable an evaluate task that is failing * Update code docs a bit * Fix tests for finetune-student	2024-05-20 13:03:11 -05:00
Valentin Rigal	f66f1844f7	Parse tasks with label finetune-student (#609 ) * Add test case * Update regex	2024-05-17 10:53:21 -07:00
Valentin Rigal	c3ad3a4837	Generic Taskcluster task naming (#589 ) * Add test * Update name detection from tasks * Update other tests * Support renaming for old experiments and quantized run for consistency * Update tests * Suggestions * Fixes	2024-05-16 09:49:36 -07:00
Valentin Rigal	ea95bc0cb1	Override W&B data on a resumed training (#595 ) * Override W&B data on a resumed training * Suggestions	2024-05-15 11:35:54 -07:00
Evgeny Pavlov	adb890e14c	Fix W&B publication setting (#585 ) * Add W&B publication setting * Switch to boolean * Add wandb setting to the train action * Run linter * Change type name * Make bool values lower case	2024-05-14 11:09:26 -07:00
Evgeny Pavlov	77e95bfcd4	Fix config parsing (#583 ) * Fix config parsing * Trigger CI * Fix condition	2024-05-13 13:23:37 -07:00
Valentin Rigal	c4b0d12198	Parse evaluation data from .metrics artifacts in taskcluster (#565 ) * Support parsing metrics directly from .metrics artifacts * Update tests * Rebase --------- Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-05-07 11:54:18 -07:00
Bastien Abadie	459cf20e68	Replace print by sys.stdout.buffer.write (#560 )	2024-05-07 11:37:12 -07:00
Bastien Abadie	f6451baf46	Add missing wandb_publication parameter on finetune-student task. (#555 ) * Add missing wandb_publication parameter on finetune-student task. * Apply on env * Disable publication by default	2024-05-03 11:29:08 -07:00
Bastien Abadie	658d6c2d88	Taskcluster publication (#501 ) * Configure Taskcluster secret for w&b through taskgraph transform * Direct secret usage * Support recursive join on tags * Do not use taskcluster filtering with --from-stream * Enable verbose mode * More debugging lines * Redirect opustrainer stderr to stdout * Log marian lines in verbose output * Fix test * Log raw lines & add prefix to our own logging * Correct project name * Convert Taskcluster trigger from transform to to kind * Fix lint * Setup tracking on all training tasks * Set WandB group & run name * Skip publication on unit tests * Make perplexity optional * Update test fixture * Add training parameter to control publication * Move trigger control to python * Use task config to get wandb names * Bashism * Use taskcluster group logic to build task name * Run on WANDB_PUBLICATION=false, but do not publish * Expose weight & biases tags	2024-04-29 10:32:23 -07:00
Valentin Rigal	48176f6a90	Prevent duplicating group_logs from experiments entrypoint (#511 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-04-03 12:01:07 -07:00
Valentin Rigal	f49e9c35be	Parse logs with Marian 1.12 (#512 ) * Add test data * Update parser * Add tests	2024-04-03 10:35:37 -07:00
Valentin Rigal	55ab1fc486	Add argument to override runs (#498 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-03-28 11:51:06 -07:00
Valentin Rigal	418a1e4d55	Consistent dataset names (#494 ) * Consistent evaluation tags parsing * Add test * Support backwards training task label * Support evaluation task with suffix * Support suffixes with form -1/2 --------- Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-03-28 11:46:14 -07:00
Valentin Rigal	8c7bff6f00	Skip checking existing runs for new projects (#496 ) * Skip checking existing runs for new projects Follow up of #484 Otherwise a value Error was raised by the wandb client * Suggestion --------- Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-03-26 11:34:47 -07:00
EvaBardou	5e99eb34f9	Allow to use taskcluster.Secrets service to retrieve W&B secret API Key (#493 ) * Allow to use taskcluster.Secrets service to retrieve W&B secret API Key * Properly parse TC secret	2024-03-26 10:30:42 -07:00
Valentin Rigal	fb9531f0b5	Avoid duplicated runs during W&B publication (#484 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-03-19 12:37:42 -07:00
Valentin Rigal	8646fc6b02	Avoid duplicated backward runs publication (#483 )	2024-03-19 12:32:31 -07:00
Valentin Rigal	97f7d00fbd	Support failures retrieving task artifacts (#482 )	2024-03-18 14:27:20 -07:00
EvaBardou	62a74a9241	Allow group traversal from training tasks dependencies during Taskcluster task group publication (#461 ) * Allow group traversal from training tasks dependencies during Taskcluster task group publication * Fix lint + Apply Valentin's suggestions * Comment to clarify the usage of a shared set variable * Do not publish empty groups	2024-02-29 09:04:17 -08:00
Valentin Rigal	3706913c88	Link metrics from labels in addition to TC dependencies (#465 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-02-26 14:58:53 -08:00
Valentin Rigal	3f135aa115	Taskcluster task group publication (#406 ) * Base taskcluster task group publication * Move tag parser to utils module * Support metrics * Support multiple teacher training * Fix parsing for evaluation folder * Generic group logs parser * Parse extra evaluation tasks and publish group_logs fake run * Publish Marian config on runs * Publish marian config on runs instead of experiment config * Rebase vrigal:publish-experiment-config * Publish experiment config on group_logs	2024-02-16 09:05:01 -08:00
Valentin Rigal	9ebfd13903	Publish YAML configuration to group_logs run (#386 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-02-12 09:16:36 -08:00
Valentin Rigal	d5f4291f16	Parse missing evaluation results (#374 ) * Publish metrics at the end * Add missing steps * Publish all metrics to a group table * Support more metrics formats * Update tests * Publish runs for extra metrics * Prefix metrics with group * Improve metrics publication * Remove unused Metric.model_name * Update tests	2024-01-23 11:19:09 -08:00
Valentin Rigal	d35f28e542	Add publication package (#309 ) * Add documentation * Move publication parser prototype From https://github.com/mozilla/translations-experiment-tracking/pull/4 Commit a06886e0 * Update parser package for translations main repo * Remove pre-commit rules * Apply black * Update parser code * Remove package and pin requirements * Nits/Fixes * Fix taskcluster naming * Move parser to 'tracking' root folder * Switch to pyproject.toml + pinned dependencies * Add a sample for experiments structure * Update metrics parser * Add speed metrics * Only publish metrics in a bar chart * Publish fake run at last * Linting and small fixes * Merge .gitignore * Handle pushing metrics when no logs are available * Add tests * Fix tests for CI job * rename Taskcluster sample file * Suggestions * Add type hints + parser refactoring * Improve typing + run static checker (Mypy) * Suggestions * Update tests * Invert metrics data order (bleu_detok, chrf) * Update CI tests task * Fix lint * Update poetry.lock * Fix tests in CI * Fix hardcoded path * Add missing experiments/logs folder (ignored by git) * Group experiments to analyze by alphabetic order --------- Co-authored-by: Bastien Abadie <bastien@nextcairn.com> Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-01-11 13:25:53 -08:00

47 commits