Actualize documentation about experiment tracking on Weight & Biases (#861)
* Update existing documentation for the tracking module
* Add doc for Weight & Biases
* Suggestions & Nits

Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>
Parent: 3974ccc1d1
Commit: c0a9585a34
Six binary image files added (not shown). Sizes: 54 KiB, 66 KiB, 46 KiB, 41 KiB, 128 KiB, 52 KiB.
docs/tracking.md: 165 changed lines
# Experiment tracking

The [tracking module](/tracking) handles parsing training logs to extract [Marian](https://marian-nmt.github.io/) training metrics in real time.

The parser supports different sources:

* Online publication from Taskcluster training or evaluation tasks.
* Deferred publication from a Taskcluster task or group of tasks.
* Deferred publication from a local directory containing archived training data.

It currently supports logs from **Marian 1.10** and **Marian 1.12**. Newer versions (even minor releases) will raise a warning and may result in missing data.
### Real-time publication from Taskcluster

Publication is implemented within the training (`pipeline.train.train.get_log_parser_command`) and evaluation (`pipeline.eval.eval.main`) steps. This is the preferred way to track metrics, as machine resource usage is also published to Weight & Biases.

Any new experiment will automatically be published to the [public Weight & Biases dashboard](https://wandb.ai/moz-translations/projects).

Any new pull request will trigger publication to the `ci` project in Weight & Biases. Because of the Taskcluster cache, you may need to edit a value in `taskcluster/configs/config.ci.yml` (e.g. the first `disp-freq` entry) to force a new publication.
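A rough sketch of such a cache-busting edit from the command line; the key layout of `config.ci.yml` is not documented here, so treat the `sed` expression and the replacement value as assumptions (any innocuous change to the file works):

```sh
# Hypothetical example: tweak the first disp-freq entry so the cached Taskcluster
# task is invalidated and the training task runs (and publishes) again.
sed -i '0,/disp-freq/s/disp-freq.*/disp-freq: 11/' taskcluster/configs/config.ci.yml
```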
### Deferred publication from Taskcluster

It is possible to run the parser on Taskcluster tasks that have already finished.
The parser supports reading training tasks directly from the Taskcluster API (no authentication required).

This method is useful to re-upload data from past training and evaluation tasks.

You can run the parser on a Taskcluster group by running:

```sh
parse_tc_group <task_group_id>
```

By default, this command also fetches traversal tasks (related experiments). You can avoid this behavior with the `--no-recursive-lookup` option.
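For example, to publish only the given group without looking up related experiments:

```sh
parse_tc_group <task_group_id> --no-recursive-lookup
```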
You can also run the parser based on the logs of a single task:

```sh
parse_tc_logs --input-file=live_backing.log
```
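For a finished task you first need its `live_backing.log`. A minimal sketch of the full round trip, assuming the log is exposed as the usual `public/logs/live_backing.log` artifact on FirefoxCI (the artifact URL pattern below is an assumption; adjust it to your Taskcluster deployment):

```sh
# Hypothetical example: download the live log of a finished task, then parse it.
TASK_ID="<task_id>"
curl -L "https://firefoxci.taskcluster-artifacts.net/${TASK_ID}/0/public/logs/live_backing.log" \
  -o live_backing.log
parse_tc_logs --input-file=live_backing.log
```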
### Deferred publication from a GCP archive

The parser supports browsing the folder structure of a GCP archive containing multiple training runs.
This method is useful to re-upload data from past training and evaluation tasks that are no longer available from Taskcluster (expired), or when handling a large amount of data.

The structure from experiments that ran on Taskcluster should look like this:
```
.
├── logs
│   └── en-hu
│       └── baseline_enhu_aY25-4fXTcuJNuMcWXUYtQ
│           └── student
│               ├── train.log
│               └── …
└── models
    └── en-hu
        └── baseline_enhu_aY25-4fXTcuJNuMcWXUYtQ
            └── evaluation
                ├── speed
                │   ├── sacrebleu_wmt09.metrics
                │   └── …
                └── student
                    ├── flores_devtest.metrics
                    └── …
```
The structure from older experiments that ran with Snakemake should look like this:

```
.
├── logs
│   └── …
└── models
    └── en-sv
        └── opusmt-multimodel-test
            └─ …
```
The following rules are applied:

* `./models` sub-folders are projects (e.g. `en-sv`), corresponding to projects in W&B.
* Projects contain multiple groups (e.g. `opusmt-multimodel-test`), each containing multiple runs (e.g. `student-finetuned`) and usually an `evaluation` folder.
* For each run, `train.log` is parsed (`valid.log` results are usually contained in `train.log`) and published to W&B.
* `.metrics` files in the `evaluation` folder are parsed (looking for one float value per line, see the example below) and published on the same run (e.g. `[metric] tc_Tatoeba-Challenge-v2021-08-07`).
* Once all runs of a group have been published, a final run named `group_logs` is pushed to W&B. That run contains no metrics, but all experiment files are published to it as artifacts.
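For illustration, a `.metrics` file is expected to contain nothing but float values, one per line (the path is taken from the tree above; the values are hypothetical):

```sh
# Hypothetical example: a .metrics file holds one float value per line.
cat models/en-hu/baseline_enhu_aY25-4fXTcuJNuMcWXUYtQ/evaluation/student/flores_devtest.metrics
# 35.6
# 0.6123
```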
You can run the parser from a local GCP archive folder by running:

```sh
parse_experiment_dir --directory gcp_archive --mode taskcluster
```
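A minimal sketch of fetching such an archive locally before parsing it, assuming the archived experiments live in a GCS bucket you can read (the bucket name and path below are placeholders):

```sh
# Hypothetical example: copy one archived experiment from GCS, then parse and publish it.
gsutil -m cp -r "gs://<bucket>/<experiment_path>" gcp_archive
parse_experiment_dir --directory gcp_archive --mode taskcluster
```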
## Weight & Biases dashboard

Publication is handled via the extensible module `translations_parser.publishers`.

### Structure

Runs on Weight & Biases are grouped by experiment. The group name is suffixed with the complete Taskcluster group ID, and each of its runs is suffixed with the first 5 characters of that ID. This is required to compare runs with similar names across different groups.
Examples of run naming for Taskcluster group `dzijiL-PQ4ScKBB3oIjGQg`:

* Training task: `teacher-1_dziji`
* Evaluation task: `teacher-ensemble_dziji`
* Experiment summary: `group_logs_dziji` (see [Group logs](#group-logs))
### Training data

Metrics parsed in real time during training are published in the **Charts** section of Weight & Biases.

![Training charts](img/tracking/training_charts.png)

Training runs have their Marian and OpusTrainer configuration published to the **Overview** section of Weight & Biases:

* **arguments**: Full list of arguments used to run the `marian` command.
* **marian**: Marian runtime configuration read from the logs.
* **model**: YAML configuration file passed to Marian as `configs/model/${model_type}.yml`.
* **opustrainer**: OpusTrainer YAML configuration read from the fixed path `config.opustrainer.yml`.
* **training**: YAML training configuration file passed to Marian as `configs/training/${model_type}.train.yml`.

These categories (arguments, marian, model, opustrainer, training) describe what each configuration covers and where it comes from.

![Training config](img/tracking/run_config.png)
### Evaluation metrics

Metrics from evaluation tasks are published as table artifacts on Weight & Biases, with a custom chart for easier comparison across runs.

![Evaluation custom charts](img/tracking/metrics.png)

### Group logs

On every group, a last run named `group_logs` is also published. This run does not represent a training or evaluation task; it contains the overall experiment configuration, available from the **Overview** link in the left menu.

![Group logs config](img/tracking/experiment_config.png)

This run also contains a table, published as an artifact, with a summary of all evaluation metrics. It is visible in the **Tables** section.

![Group logs table](img/tracking/group_logs_table.png)
### System charts

When running online from Taskcluster, the machine's resource usage is published in the **System** section of Weight & Biases.

![System charts](img/tracking/system_charts.png)
## Development

The parser can be built as a distinct package with pip, to make development easier.

### Installation

In a virtual environment, you can install the package in editable mode (i.e. from the local folder):

```sh
pip install -e ./tracking
```
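Once installed, the entry points used throughout this page are available in the virtual environment. A quick sanity check (assuming the standard `--help` flag generated for each CLI):

```sh
# The parser entry points should now be on the PATH of the virtual environment.
parse_tc_logs --help
parse_tc_group --help
parse_experiment_dir --help
```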
### Extend supported Marian metrics