25 commits

Author SHA1 Message Date
Evgeny Pavlov 1e9799c9bd
Add a link to W&B dashboard (#759) 2024-07-19 14:13:28 -07:00
Evgeny Pavlov 31311927ef
Move snakemake to a separate folder (#431)
* Move snakemake code to a separate folder

* Small fixes

* Run linter

* Revert formatting

* Fix readme
2024-02-09 09:46:52 -08:00
Valentin Rigal d35f28e542
Add publication package (#309)
* Add documentation

* Move publication parser prototype

From https://github.com/mozilla/translations-experiment-tracking/pull/4
Commit a06886e0

* Update parser package for translations main repo

* Remove pre-commit rules

* Apply black

* Update parser code

* Remove package and pin requirements

* Nits/Fixes

* Fix taskcluster naming

* Move parser to 'tracking' root folder

* Switch to pyproject.toml + pinned dependencies

* Add a sample for experiments structure

* Update metrics parser

* Add speed metrics

* Only publish metrics in a bar chart

* Publish fake run at last

* Linting and small fixes

* Merge .gitignore

* Handle pushing metrics when no logs are available

* Add tests

* Fix tests for CI job

* rename Taskcluster sample file

* Suggestions

* Add type hints + parser refactoring

* Improve typing + run static checker (Mypy)

* Suggestions

* Update tests

* Invert metrics data order (bleu_detok, chrf)

* Update CI tests task

* Fix lint

* Update poetry.lock

* Fix tests in CI

* Fix hardcoded path

* Add missing experiments/logs folder (ignored by git)

* Group experiments to analyze by alphabetic order

---------

Co-authored-by: Bastien Abadie <bastien@nextcairn.com>
Co-authored-by: Evgeny Pavlov <epavlov@mozilla.com>
2024-01-11 13:25:53 -08:00
Evgeny Pavlov 2df0a3a905
Update the training guide (#239)
* Update training guide

* Fix docs

* Add index file

* Remove header

* Fix docs link

* Remove tensorboard section

* Add theme

* Update navigation

* Add logo

* Use absolute links

* Fix code links

* Fix code links

* Fix link

* Clarify what config is

* Fix note for bicleaner

Co-authored-by: Marco Castelluccio <mcastelluccio@mozilla.com>

* Fix typo

Co-authored-by: Greg Tatum <gregtatum@users.noreply.github.com>

* Fix link

* Fix mentioning of Marian

Co-authored-by: Greg Tatum <gregtatum@users.noreply.github.com>

* Remove "my"

* Make note about snakemake more visible

* Fix phrasing

* Add link to bicleaner paper

* Add clarifications

* Add links to default training configs

* Add reference to bicleaner section

* Small fixes

---------

Co-authored-by: Marco Castelluccio <mcastelluccio@mozilla.com>
Co-authored-by: Greg Tatum <gregtatum@users.noreply.github.com>
2023-11-06 10:03:17 -08:00
Evgeny Pavlov 83d43bfcf6
Update docs (#224)
* Update docs

* Fix typos

* Fix TC docs

* Fix relative links
2023-10-16 16:33:29 -07:00
Evgeny Pavlov ac9ceec855
Add link to blog post 2022-07-18 16:51:15 -07:00
Evgeny Pavlov fc2b3b64f3
Add link to training guide 2022-07-18 15:31:46 -07:00
Evgeny Pavlov 7c58f6558b
Move configuration to profiles (#96)
* Move configuration settings to profiles

* Use relative paths

* Fix output formatting

* Update dag

* Update docs
2022-06-17 10:56:07 -07:00
Evgeny Pavlov 03a2ddaa3f
Update README.md 2022-04-26 17:30:01 -07:00
Evgeny Pavlov 355d9b958e
Add train vocab step 2022-04-22 12:50:19 -07:00
Evgeny Pavlov 551aeb5ea0
Add more references to publications 2022-04-22 12:44:27 -07:00
Amit Moryossef b97da19bbb
Update README.md (#86) 2022-04-21 11:11:24 -07:00
Evgeny Pavlov 9fb8e9e0f8
Update README.md 2022-04-20 11:57:21 -07:00
Evgeny Pavlov 22a3751a09
Add support of Mozilla slurm cluster (#72) 2022-02-22 17:48:21 -08:00
Evgeny Pavlov 174cceaa6f
Bugfix and optimization (#41)
- bugfix
- training and decoding optimization
- evaluation refactoring
- small usability improvements
- moved marian configurations overriding back to configs
2022-01-05 13:24:05 -08:00
Evgeny Pavlov 3b3f33bf25
Quality improvements (#29) 2021-12-06 15:03:35 -08:00
Evgeny Pavlov a09b0ac7ac
Update README.md 2021-10-28 11:07:11 -07:00
Evgeny Pavlov ef8928b454
Snakemake integration (#24)
- workflow management using Snakemake
- parallelization to run on a cluster
- Singularity containerization support
- Slurm support
- teacher ensemble support
2021-10-28 10:39:09 -07:00
Evgeny Pavlov 0f6e64cf19
Minor improvements (#20)
- Flores dataset importer
- custom dataset importer
- ability to use a pre-trained backward model
- save experiment config on start
- stubs for dataset caching (decided to sync implementation with workflow manager integration)
- use best bleu models instead of best ce-mean-words
- fix linting warnings
2021-08-17 13:20:34 -07:00
Evgeny Pavlov ec783cfbbb
Bicleaner support + fixes (#13)
SacreBLEU is a regular importer now and evaluation is not limited to sacrebleu datasets.
fixes

Added bicleaner-ai and bicleaner filtering (one or another based on available pretrained language packs).
fixes

Added script to find all datasets based on language pair and importer type, ready to use in config
fixes

Fixed conda environment activation to be reproducible on GCP

Other minor reproducibility fixes
2021-07-26 10:00:49 -07:00
Evgeny Pavlov af2abbf525
Add reference to bergamot project 2021-06-21 14:58:02 -07:00
Evgeny Pavlov 4b12dee551
Fix readme after renaming 2021-06-21 14:38:07 -07:00
Evgeny Pavlov 2bcdef2b36
Rename repo 2021-06-21 14:33:31 -07:00
Evgeny Pavlov 3bea08bf4a
Initial pipeline (#1) 2021-06-17 15:39:15 -07:00
Evgeny Pavlov 8d11fb1e97
Initial commit 2021-04-30 15:36:49 -07:00