Actualize documentation about experiment tracking on Weight & Biases (#861)
* Update existing documentation for the tracking module
* Add doc for Weight & Biases
* Suggestions & Nits

Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>
Parent: 3974ccc1d1
Commit: c0a9585a34
Six binary image files added (not shown). Sizes: 54 KiB, 66 KiB, 46 KiB, 41 KiB, 128 KiB, 52 KiB.
docs/tracking.md: 165 changed lines
# Experiment tracking

The [tracking module](/tracking) handles parsing training logs to extract [Marian](https://marian-nmt.github.io/) training metrics in real time.

The parser supports different sources:

* Online publication from Taskcluster training or evaluation tasks.
* Deferred publication from a Taskcluster task or group of tasks.
* Deferred publication from a local directory containing archived training data.

It currently supports logs from **Marian 1.10** and **Marian 1.12**. Newer versions (even minor releases) will raise a warning and may result in missing data.
### Real-time publication from Taskcluster

Publication is implemented within the training (`pipeline.train.train.get_log_parser_command`) and evaluation (`pipeline.eval.eval.main`) steps. This is the preferred way to track metrics, as machine resource usage is also published to Weight & Biases.

Any new experiment will automatically be published to the [public Weight & Biases dashboard](https://wandb.ai/moz-translations/projects).

Any new pull request will trigger publication to the `ci` project in Weight & Biases. Because of the Taskcluster cache, you may need to edit a value in `taskcluster/configs/config.ci.yml` (e.g. the first `disp-freq` entry) to force a new publication.
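A rough sketch of such a cache-busting edit from the command line; the key layout of `config.ci.yml` is not documented here, so treat the `sed` expression and the replacement value as assumptions (any innocuous change to the file works):

```sh
# Hypothetical example: tweak the first disp-freq entry so the cached Taskcluster
# task is invalidated and the training task runs (and publishes) again.
sed -i '0,/disp-freq/s/disp-freq.*/disp-freq: 11/' taskcluster/configs/config.ci.yml
```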
### Deferred publication from Taskcluster

It is possible to run the parser on Taskcluster tasks that have already finished.
The parser supports reading training tasks directly from the Taskcluster API (no authentication required).

This method is useful to re-upload data from past training and evaluation tasks.

You can run the parser on a Taskcluster group by running:

```sh
parse_tc_group <task_group_id>
```

By default, this command also fetches traversal tasks (related experiments). You can avoid this behavior with the `--no-recursive-lookup` option.
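For example, to publish only the given group without looking up related experiments:

```sh
parse_tc_group <task_group_id> --no-recursive-lookup
```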
You can also run the parser based on the logs of a single task:

```sh
parse_tc_logs --input-file=live_backing.log
```
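For a finished task you first need its `live_backing.log`. A minimal sketch of the full round trip, assuming the log is exposed as the usual `public/logs/live_backing.log` artifact on FirefoxCI (the artifact URL pattern below is an assumption; adjust it to your Taskcluster deployment):

```sh
# Hypothetical example: download the live log of a finished task, then parse it.
TASK_ID="<task_id>"
curl -L "https://firefoxci.taskcluster-artifacts.net/${TASK_ID}/0/public/logs/live_backing.log" \
  -o live_backing.log
parse_tc_logs --input-file=live_backing.log
```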
### Deferred publication from a GCP archive

The parser supports browsing the folder structure of a GCP archive containing multiple training runs.
This method is useful to re-upload data from past training and evaluation tasks that are no longer available from Taskcluster (expired), or when handling a large amount of data.

The structure from experiments that ran on Taskcluster should look like this:
```
.
├── logs
│   └── en-hu
│       └── baseline_enhu_aY25-4fXTcuJNuMcWXUYtQ
│           └── student
│               ├── train.log
│               └── …
└── models
    └── en-hu
        └── baseline_enhu_aY25-4fXTcuJNuMcWXUYtQ
            └── evaluation
                ├── speed
                │   ├── sacrebleu_wmt09.metrics
                │   └── …
                └── student
                    ├── flores_devtest.metrics
                    └── …
```
The structure from older experiments that ran with Snakemake should look like this:

```
.
├── logs
│   └── …
└── models
    └── en-sv
        └── opusmt-multimodel-test
            └─ …
```
The following rules are applied:

* `./models` sub-folders are projects (e.g. `en-sv`), corresponding to projects in W&B.
* Projects contain multiple groups (e.g. `opusmt-multimodel-test`), each containing multiple runs (e.g. `student-finetuned`) and usually an `evaluation` folder.
* For each run, `train.log` is parsed (`valid.log` results are usually contained in `train.log`) and published to W&B.
* `.metrics` files in the `evaluation` folder are parsed (looking for one float value per line, see the example below) and published on the same run (e.g. `[metric] tc_Tatoeba-Challenge-v2021-08-07`).
* Once all runs of a group have been published, a final run named `group_logs` is pushed to W&B. That run contains no metrics, but all experiment files are published to it as artifacts.
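For illustration, a `.metrics` file is expected to contain nothing but float values, one per line (the path is taken from the tree above; the values are hypothetical):

```sh
# Hypothetical example: a .metrics file holds one float value per line.
cat models/en-hu/baseline_enhu_aY25-4fXTcuJNuMcWXUYtQ/evaluation/student/flores_devtest.metrics
# 35.6
# 0.6123
```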
You can run the parser from a local GCP archive folder by running:

```sh
parse_experiment_dir --directory gcp_archive --mode taskcluster
```
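A minimal sketch of fetching such an archive locally before parsing it, assuming the archived experiments live in a GCS bucket you can read (the bucket name and path below are placeholders):

```sh
# Hypothetical example: copy one archived experiment from GCS, then parse and publish it.
gsutil -m cp -r "gs://<bucket>/<experiment_path>" gcp_archive
parse_experiment_dir --directory gcp_archive --mode taskcluster
```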
## Weight & Biases dashboard

Publication is handled via the extensible module `translations_parser.publishers`.

### Structure

Runs on Weight & Biases are grouped by experiment. The group name is suffixed with the complete Taskcluster group ID, and each of its runs is suffixed with the first 5 characters of that ID. This is required to compare runs with similar names across different groups.
Examples of run naming for Taskcluster group `dzijiL-PQ4ScKBB3oIjGQg`:

* Training task: `teacher-1_dziji`
* Evaluation task: `teacher-ensemble_dziji`
* Experiment summary: `group_logs_dziji` (see [Group logs](#group-logs))
### Training data

Metrics parsed in real time during training are published in the **Charts** section of Weight & Biases.

![Training charts](img/tracking/training_charts.png)

Training runs have their Marian and OpusTrainer configuration published to the **Overview** section of Weight & Biases:

* **arguments**: Full list of arguments used to run the `marian` command.
* **marian**: Marian runtime configuration read from the logs.
* **model**: YAML configuration file passed to Marian as `configs/model/${model_type}.yml`.
* **opustrainer**: OpusTrainer YAML configuration read from the fixed path `config.opustrainer.yml`.
* **training**: YAML training configuration file passed to Marian as `configs/training/${model_type}.train.yml`.

These categories (arguments, marian, model, opustrainer, training) describe what each configuration covers and where it comes from.

![Training config](img/tracking/run_config.png)
### Evaluation metrics

Metrics from evaluation tasks are published as table artifacts on Weight & Biases, with a custom chart for easier comparison across runs.

![Evaluation custom charts](img/tracking/metrics.png)

### Group logs

On every group, a last run named `group_logs` is also published. This run does not represent a training or evaluation task; it contains the overall experiment configuration, available from the **Overview** link in the left menu.

![Group logs config](img/tracking/experiment_config.png)

This run also contains a table, published as an artifact, with a summary of all evaluation metrics. It is visible in the **Tables** section.

![Group logs table](img/tracking/group_logs_table.png)
### System charts

When running online from Taskcluster, the machine's resource usage is published in the **System** section of Weight & Biases.

![System charts](img/tracking/system_charts.png)
## Development

The parser can be built as a distinct package with pip, to make development easier.

### Installation

In a virtual environment, you can install the package in editable mode (i.e. from the local folder):

```sh
pip install -e ./tracking
```
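Once installed, the entry points used throughout this page are available in the virtual environment. A quick sanity check (assuming the standard `--help` flag generated for each CLI):

```sh
# The parser entry points should now be on the PATH of the virtual environment.
parse_tc_logs --help
parse_tc_group --help
parse_experiment_dir --help
```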
### Extend supported Marian metrics