* Add tests for alignments
* Use toolchain in CI tests
* Compile toolchain locally under Docker
* Add commands to build and run docker locally
* Trigger CI
* Install missing libs
* Clarify exporting variables for ARM processors
* Add a workaround for poetry not seeing python
* Clarify the reason for not installing packages in docker image
* Drive-by: Fix the logging of parameters in the downloads
* Update utility to provide a default output path, and list all download modes
* Add a taskcluster downloader for models
* Allow group traversal from training task dependencies during Taskcluster task group publication
* Fix lint + Apply Valentin's suggestions
* Comment to clarify the usage of a shared set variable
* Do not publish empty groups
* Base taskcluster task group publication
* Move tag parser to utils module
* Support metrics
* Support multiple teacher training
* Fix parsing for evaluation folder
* Generic group logs parser
* Parse extra evaluation tasks and publish group_logs fake run
* Publish Marian config on runs
* Publish Marian config on runs instead of experiment config
* Rebase vrigal:publish-experiment-config
* Publish experiment config on group_logs
* Fix linting pythonpath for tracking
* Add pythonpath to the rest of the commands
* Remove pythonpath
* Update lockfile
* Fix wandb directory in tests
* Add the ability to run starting from a specific task (fixes #227)
A couple of example runs with this:
* https://firefox-ci-tc.services.mozilla.com/tasks/groups/YHAr0HzwSSe4pe5Yh9dIlg uses https://firefox-ci-tc.services.mozilla.com/tasks/groups/JjNp3KcyTUObUtOA9BgK5g as its `previous-group-id` with `start-stage: train-backwards` and `target-stage: train-teacher` - and ends up running `train-backwards`, `translate-mono-trg`, `collect-mono-trg`, and `train-teacher`.
* https://firefox-ci-tc.services.mozilla.com/tasks/groups/Sm0YV_8LQP-EOE8Nz6G5Lw uses the above group as its `previous-group-id` with `start-stage: train-teacher` and `target-stage: all`. Note that it ended up depending on tasks from both the above group and the one that it was based on, and scheduled `train-teacher` and everything after it (I didn't bother letting them all run - I think the scheduling is enough to verify this).
Big thanks to @gabrielBusta for suggesting this implementation!
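For reference, a minimal sketch of how those three parameters fit together. This is an illustration only: the key names come from the examples above, but their placement and nesting in the training config is an assumption, not the actual schema.

```python
# Hypothetical sketch of the parameters discussed above; the real config
# schema and nesting may differ.
resume_parameters = {
    # Existing task group whose completed tasks are reused rather than re-run.
    "previous-group-id": "JjNp3KcyTUObUtOA9BgK5g",
    # First stage that is actually scheduled in the new group; everything
    # upstream of it is pulled from the previous group(s).
    "start-stage": "train-backwards",
    # Last stage to schedule ("all" schedules every remaining stage).
    "target-stage": "train-teacher",
}
```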
* Update poetry dependencies to pull in newer taskgraph version
At the moment we end up with files that sit beside the output directory rather than in it. In CI, this means that we don't get the diffs uploaded as artifacts.
* Use zst in the data importer test
* Add a test for news crawl importer
* Add a tree printing method
* Drive-by: Output stderr in run_task
* Temporarily disable newscrawl test
* Merge the two tests
Dependencies and fetches are unused, so they should be removed.
Cache digests should _only_ be influenced by the pretrained teacher parameters (file resources and other parameters are unused - so they do not influence the outcome of the task).
Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>
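To illustrate that constraint, a rough sketch of a digest that only sees the pretrained teacher parameters - this is not the actual taskgraph cache code, and `pretrained-models` is an assumed key name:

```python
import hashlib
import json


def pretrained_teacher_digest(parameters: dict) -> str:
    """Sketch: hash only the pretrained teacher settings, so changes to
    unrelated parameters or file resources never invalidate the cache."""
    relevant = parameters.get("pretrained-models", {})  # assumed key name
    return hashlib.sha256(
        json.dumps(relevant, sort_keys=True).encode("utf-8")
    ).hexdigest()
```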
* Convert the evaluation script to python
* Remove the old eval scripts
* Simplify evaluate-teacher-ensemble
* Simplify evaluate-quantized
* Simplify taskcluster/kinds/evaluate/kind.yml
* Add assertions for the json evaluation metric
* Use the sacrebleu python api
* Change ca to ru in the test assertions
* Use a test defined config file
This fixes an issue where modifying tc.prod.yml would break the
taskcluster tests. It also lets tests share the same config between
runs without leaving a dirty artifacts folder.
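A rough sketch of the idea, assuming a hypothetical fixture name and file layout rather than the actual test code:

```python
import shutil
from pathlib import Path

import pytest

# Assumed location of a config owned by the test suite, kept separate from
# the production tc.prod.yml so that editing tc.prod.yml cannot break tests.
TEST_CONFIG = Path(__file__).parent / "fixtures" / "test_config.yml"


@pytest.fixture
def test_config(tmp_path: Path) -> Path:
    """Copy the shared test config into a per-run directory so repeated
    runs never start from a dirty artifacts folder."""
    config = tmp_path / "config.yml"
    shutil.copy(TEST_CONFIG, config)
    return config
```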
* Add a run_task test utility that exercises the full taskgraph
* Parameterize the eval test
* Publish metrics at the end
* Add missing steps
* Publish all metrics to a group table
* Support more metrics formats
* Update tests
* Publish runs for extra metrics
* Prefix metrics with group
* Improve metrics publication
* Remove unused Metric.model_name
* Update tests