firefox-translations-training

Commit graph

Author	SHA1	Message	Date
Greg Tatum	157fc5e8ae	Update the training continuation docs (#540 )	2024-04-29 14:36:14 -05:00
Bastien Abadie	658d6c2d88	Taskcluster publication (#501 ) * Configure Taskcluster secret for w&b through taskgraph transform * Direct secret usage * Support recursive join on tags * Do not use taskcluster filtering with --from-stream * Enable verbose mode * More debugging lines * Redirect opustrainer stderr to stdout * Log marian lines in verbose output * Fix test * Log raw lines & add prefix to our own logging * Correct project name * Convert Taskcluster trigger from transform to to kind * Fix lint * Setup tracking on all training tasks * Set WandB group & run name * Skip publication on unit tests * Make perplexity optional * Update test fixture * Add training parameter to control publication * Move trigger control to python * Use task config to get wandb names * Bashism * Use taskcluster group logic to build task name * Run on WANDB_PUBLICATION=false, but do not publish * Expose weight & biases tags	2024-04-29 10:32:23 -07:00
Ben Hearsum (he/him)	327509c0ff	Revert change to generic-worker for CPU tasks (#536 ) We're hitting some odd issues with caches that need to be worked out. Eg: error: cache /builds/worker/checkouts is not empty and is missing a .cacherequires file; the cache names for this task are likely mis-configured or TASKCLUSTER_CACHES is not set properly (from https://firefox-ci-tc.services.mozilla.com/tasks/IvbeCQBuRuKIOaeOIGEfHg/runs/7)	2024-04-25 15:43:36 -04:00
Ben Hearsum (he/him)	f8fb37637e	Revert unnecessary change to docker image (#535 ) I made this in #533 to ensure I got a full test run in that PR. However, there was no need for this to make it to main. Let's back this out to avoid changing cache digests there.	2024-04-25 13:49:30 -04:00
Ben Hearsum (he/him)	d68edc08f3	Switch CPU tasks to generic-worker/d2g images (fixes #473 ) (#533 ) * Switch CPU tasks to generic-worker/d2g images (fixes #473) This switches us from the deprecated docker-worker to generic-worker. generic-worker provides a translation layer for docker-worker tasks that avoids the need to change any payloads. (It will download specified images and run payload commands in them, rather than on the host machine.) Upgrading to this new image will give us memory monitoring capabilities on the CPU workers because the new image has the GCP Ops Agent installed on it. * Invalidate docker image cache to force rebuilds of all docker tasks on d2g workers	2024-04-24 09:07:32 -04:00
Ben Hearsum (he/him)	145a84ace3	Fix parameters to use correct target tasks method (#526 ) As things are now, the `small` parameters will never generate training tasks. (The `large` params already use the correct target tasks method.)	2024-04-15 21:45:31 -04:00
Greg Tatum	e8c6f2e8d3	Remove the Makefile and replace it with a Taskfile (#510 )	2024-04-09 16:11:13 -05:00
Greg Tatum	fa56c7b298	Add parallel stats to the analyze task (#500 )	2024-04-09 15:38:16 -05:00
Greg Tatum	f60c657596	Update the docs for training continuation to use yaml (#516 )	2024-04-09 13:54:18 -05:00
Valentin Rigal	48176f6a90	Prevent duplicating group_logs from experiments entrypoint (#511 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-04-03 12:01:07 -07:00
Valentin Rigal	f49e9c35be	Parse logs with Marian 1.12 (#512 ) * Add test data * Update parser * Add tests	2024-04-03 10:35:37 -07:00
Evgeny Pavlov	a3bb87c069	Update docs for OpusTrainer and alignments (#504 ) * Update OpusTrainer docs with inline noise * Update and refactor documentation for pipeline steps	2024-03-28 18:18:23 -07:00
Evgeny Pavlov	fab87a7a70	Add support of inline noise data augmentation (#502 ) * Add eflomal based aligner * Use new aligner for shortlist * Remove old aligner * Add Taskcluster steps for whitespace tokenized alignments * Move file to a renamed directory * Use Tags modifier in training * Update tests for alignments and shortlist * Add support of inline noise augmentation in data importer * Do not use slow inline noise augmentation in devset on CI * Remove the old alignments task * Add a test for student alignments * Fix alignments in training tests * Return matplotlib module after merge * Rename functions * Add more comments in the code * Remove compression env * Relock poetry	2024-03-28 18:10:02 -07:00
Evgeny Pavlov	3774779cb7	Add Marian server for model testing (#492 ) * Compile marian server * Add Marian server for testing * Reformat * Update utils/marian_client.py Co-authored-by: Greg Tatum <gregtatum@users.noreply.github.com> * Make port configurable * Relock poetry --------- Co-authored-by: Greg Tatum <gregtatum@users.noreply.github.com>	2024-03-28 15:53:16 -07:00
Evgeny Pavlov	7a15b5e97a	Update Marian to v1.12.14 2d067afb 2024-02-16 (#491 )	2024-03-28 15:23:04 -07:00
Valentin Rigal	55ab1fc486	Add argument to override runs (#498 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-03-28 11:51:06 -07:00
Valentin Rigal	418a1e4d55	Consistent dataset names (#494 ) * Consistent evaluation tags parsing * Add test * Support backwards training task label * Support evaluation task with suffix * Support suffixes with form -1/2 --------- Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-03-28 11:46:14 -07:00
Ben Hearsum (he/him)	015a74df64	Bump taskgraph version to 7.4.0. (#497 ) This picks up an optimization that should fix #487.	2024-03-28 10:02:13 -04:00
Greg Tatum	830e5b12ac	Add a dockerignore file (#499 )	2024-03-26 14:19:50 -05:00
Valentin Rigal	8c7bff6f00	Skip checking existing runs for new projects (#496 ) * Skip checking existing runs for new projects Follow up of #484 Otherwise a value Error was raised by the wandb client * Suggestion --------- Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-03-26 11:34:47 -07:00
Greg Tatum	78977402e0	Analysis task that provides the word distribution (#477 )	2024-03-26 13:28:43 -05:00
EvaBardou	5e99eb34f9	Allow to use taskcluster.Secrets service to retrieve W&B secret API Key (#493 ) * Allow to use taskcluster.Secrets service to retrieve W&B secret API Key * Properly parse TC secret	2024-03-26 10:30:42 -07:00
Evgeny Pavlov	36ff36e534	Add tests for training (#489 ) * Install OpenBLAS to run Marian * Fix Marian's version * Support running training with run_task * Add extra args for Marian * Make train.sh compatible with CPU * Remove redundant export * Add tests for training * Fix formatting * Fetch cuda libs * Document regex * Compile marian to use on CPU for tests * Fix formatting * Fix comment * Make the file names consistent	2024-03-21 14:22:39 -07:00
Evgeny Pavlov	36e56b7bdb	Fix pretraining (#485 )	2024-03-20 14:45:46 -07:00
Evgeny Pavlov	a359723e41	Fix compatibility (#480 )	2024-03-20 14:28:02 -07:00
Evgeny Pavlov	1f12387866	Fix downloading evals (#479 )	2024-03-20 14:16:10 -07:00
Evgeny Pavlov	a4d25cb760	Run tests with toolchain binaries under Docker (#478 ) * Add tests for alignments * Use toolchain in CI tests * Compile toolchain locally under Docker * Add commands to build and run docker locally * Trigger CI * Install missing libs * Clarify exporting variables for ARM processors * Add a workaround for poetry not seeing python * Clarify the reason for not installing packages in docker image	2024-03-20 12:57:42 -07:00
Valentin Rigal	fb9531f0b5	Avoid duplicated runs during W&B publication (#484 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-03-19 12:37:42 -07:00
Valentin Rigal	8646fc6b02	Avoid duplicated backward runs publication (#483 )	2024-03-19 12:32:31 -07:00
Valentin Rigal	97f7d00fbd	Support failures retrieving task artifacts (#482 )	2024-03-18 14:27:20 -07:00
Greg Tatum	89c24d892f	Add a taskcluster downloader for models (#475 ) * Drive-by: Fix the logging of parameters in the downloads * Update utility to provide a default output path, and list all download modes * Add a taskcluster downloader for models	2024-03-11 17:06:54 -05:00
Greg Tatum	65ca580a16	Add support for custom corpora through remote URLs (#420 )	2024-03-06 13:03:40 -06:00
Ben Hearsum (he/him)	a17ed9db12	Add non-spot 1tb CPU worker option (#471 )	2024-03-05 13:28:48 -05:00
Ben Hearsum (he/him)	e227bf8292	add 1tb cpu only workers (#470 )	2024-03-04 18:44:06 -05:00
Valentin Rigal	8012cd30cf	Update parser documentation (#462 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-02-29 14:57:22 -08:00
EvaBardou	62a74a9241	Allow group traversal from training tasks dependencies during Taskcluster task group publication (#461 ) * Allow group traversal from training tasks dependencies during Taskcluster task group publication * Fix lint + Apply Valentin's suggestions * Comment to clarify the usage of a shared set variable * Do not publish empty groups	2024-02-29 09:04:17 -08:00
Ben Hearsum (he/him)	ed41ca2cc1	Set TERM for docker worker images (#464 )	2024-02-27 21:30:09 -05:00
Greg Tatum	19e46e5120	Add shufflers as utilities (#467 )	2024-02-27 16:11:27 -06:00
Valentin Rigal	3706913c88	Link metrics from labels in addition to TC dependencies (#465 ) Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>	2024-02-26 14:58:53 -08:00
Ben Hearsum (he/him)	9ae9d9ad5f	Use spot instances for PRs (#457 )	2024-02-22 19:20:01 -05:00
Valentin Rigal	3f135aa115	Taskcluster task group publication (#406 ) * Base taskcluster task group publication * Move tag parser to utils module * Support metrics * Support multiple teacher training * Fix parsing for evaluation folder * Generic group logs parser * Parse extra evaluation tasks and publish group_logs fake run * Publish Marian config on runs * Publish marian config on runs instead of experiment config * Rebase vrigal:publish-experiment-config * Publish experiment config on group_logs	2024-02-16 09:05:01 -08:00
Evgeny Pavlov	58cce071ef	Support typos and noise modifiers (#428 ) * Update opustrainer * Adjust configs * Add evaluation modifiers * Reduce noise * Add tests for typos and noise * Fix typos augmenter * Fix linting issues * Update docs * Update opustrainer * Adjust configs * Add evaluation modifiers * Reduce noise * Add tests for typos and noise * Fix typos augmenter * Fix linting issues * Update docs * Fix test * Update opus trainer * Remove noise parameters from config * Update opustrainer with fixes * Run linter * Fix tests after merge * Disable noise for student * Update lockfile * Fix formatting * Disable typos for student * Rename assert functions * Switch back to faster validation * Document decision on using augmentations * Fix typo	2024-02-15 15:33:24 -08:00
Evgeny Pavlov	092fd98deb	Fix random seed (#445 ) * Use different random seeds for the teachers * Fix substitution * Pass random seed to Marian	2024-02-15 12:49:33 -08:00
Evgeny Pavlov	e8ac81d9b8	Work aroud fast text model downloading failures (#435 ) * Decrease max run time * Add retry * Remove todo * Fix indentation	2024-02-15 10:37:08 -08:00
Evgeny Pavlov	190358a923	Fix linting for tracking (#441 ) * Fix linting pythonpath for tracking * Add pythonpath to the rest of the commands * Remove pythonpath * Update lockfile * Fix wandb directory in tests	2024-02-15 09:21:33 -08:00
Ben Hearsum (he/him)	34c4e01bd6	Enable 'train' action for PRs against a supported repository (#447 ) * Enable 'train' action for PRs against a supported repository * Fix scope repo url for actions	2024-02-15 11:13:22 -05:00
Ben Hearsum (he/him)	e6ec0d5474	Add support for triggering actions from PR decision tasks. (#442 )	2024-02-14 19:09:54 -05:00
Ben Hearsum (he/him)	70fede467f	Add the ability to run starting from a specific task (fixes #227 ) (#377 ) * Add the ability to run starting from a specific task (fixes #227) A couple of example runs with this: * https://firefox-ci-tc.services.mozilla.com/tasks/groups/YHAr0HzwSSe4pe5Yh9dIlg uses https://firefox-ci-tc.services.mozilla.com/tasks/groups/JjNp3KcyTUObUtOA9BgK5g as its `previous-group-id` with `start-stage: train-backwards` and `target-stage: train-teacher` - and ends up running `train-backwards, `translate-mono-trg`, `collect-mono-trg`, and `train-teacher`. * https://firefox-ci-tc.services.mozilla.com/tasks/groups/Sm0YV_8LQP-EOE8Nz6G5Lw uses the above group as its `previous-group-id` with `start-stage: train-teacher` and `target-stage: all`. Note that it ended up depending on tasks from both the above group and the one that it was based on, and ended up scheduling `train-teacher` and everything after it (I didn't bother letting them all run - I think the scheduling is enough to verify this). Big thanks to @gabrielBusta for suggesting this implementation! * Update poetry dependencies to pull in newer taskgraph version	2024-02-14 09:07:07 -05:00
Ben Hearsum (he/him)	4a5fc1f8c7	fix: set --output-file correctly in taskgraph-diff (#413 ) At the moment we end up with files that sit beside the output directory rather than in it. In CI, this means that we don't get the diffs uploaded as artifacts.	2024-02-13 13:19:51 -05:00
Greg Tatum	ffa6d77902	Add a structured logging script (#437 ) * Add a structured logger to bicleaner * Adjust the pythonpath for the download_pack.py script	2024-02-12 16:08:02 -06:00


344 commits, all branches
