CPU-optimized Neural Machine Translation models for Firefox Translations




Firefox Translations models

CPU-optimized NMT models for Firefox Translations.

The model files are hosted using Git LFS.
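Because the files live in Git LFS, a clone made without Git LFS set up contains small text pointer files instead of the actual models. The sketch below shows one way to detect that case. The helper name `is_lfs_pointer` is hypothetical and not part of this repository; it relies only on the fact that a Git LFS pointer file begins with a `version https://git-lfs.github.com/spec/v1` line.

```shell
# Hypothetical helper (not part of this repo): succeeds when the given file
# is still a Git LFS pointer rather than the real model file.
is_lfs_pointer() {
    # A pointer is a small text file whose first line names the LFS spec.
    head -n 1 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Usage (the path is a placeholder):
#   if is_lfs_pointer path/to/model-file; then git lfs pull; fi
```

If the check succeeds, running `git lfs pull` inside the clone should fetch the real file contents.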

prod - production-quality models

dev - test models under development (quality or speed may be low).

When a dev model has satisfactory quality, it is moved to prod.

Automatic quality evaluation

Results for prod models

Results for dev models

Automatic evaluation runs as part of pull request CI. It uses the Microsoft and Google translation APIs and pushes the results back to the branch (not available for forks). It is performed using the firefox-translations-evaluation tool.

Model training

Use the Firefox Translations training pipeline or the browsermt/students recipe to train CPU-optimized models. They should be similar in size and inference speed to the models already submitted.

Training data

Do not use SacreBLEU or Flores datasets as part of the training data; otherwise the evaluation results will not be valid.

To list the SacreBLEU datasets, run sacrebleu --list.
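One way to spot-check a training corpus for leaked test sentences is to count the lines it shares verbatim with a test set. This is a minimal sketch, not a procedure from this repository: the helper name `count_overlap` and the file layout (one sentence per line) are assumptions, and it only catches exact matches, not near-duplicates.

```shell
# Hypothetical helper (not part of this repo): print how many lines of the
# training file also appear verbatim in the test-set file.
#   $1 = training file, $2 = test-set file (one sentence per line in both)
count_overlap() {
    # -F fixed strings, -x whole-line match, -c count, -f patterns from file.
    # grep exits non-zero when the count is 0, so ignore its exit status.
    grep -Fxc -f "$2" "$1" || true
}

# Usage (file names are placeholders; a test set's source side can be
# written out with sacrebleu's --echo option):
#   sacrebleu -t wmt20 -l en-de --echo src > test.src
#   count_overlap train.en test.src
```

A non-zero count suggests the training data should be filtered before the model is submitted.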

Model contribution

All models should be contributed to the dev folder first.

By maintainers

Create a pull request to the main branch from another branch in this repo.

From forks

Create a pull request to the contrib branch. Once it is reviewed and merged, a maintainer will create another pull request to the main branch to kick off automatic evaluation.

Local testing

You can run model evaluation locally with bash scripts/update-results.sh. Set the environment variables GCP_CREDS_PATH and AZURE_TRANSLATOR_KEY to use the Google and Microsoft APIs. To run it with bergamot only, remove the references to those variables from scripts/update-results.sh and remove microsoft,google from scripts/eval.sh.
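A small preflight check can catch a missing credential before the evaluation starts. The following is a sketch only: the function name `check_eval_env` is hypothetical, and it assumes both variables are required, which is not the case for a bergamot-only run.

```shell
# Hypothetical preflight check (not part of this repo): fail early when one
# of the API credentials used by the evaluation is not set.
check_eval_env() {
    for name in GCP_CREDS_PATH AZURE_TRANSLATOR_KEY; do
        # Look up the variable whose name is stored in $name (POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            return 1
        fi
    done
}

# Usage, from the repository root (the values are placeholders):
#   export GCP_CREDS_PATH=/path/to/credentials.json
#   export AZURE_TRANSLATOR_KEY=your-key
#   check_eval_env && bash scripts/update-results.sh
```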

Model deployment

Create a new release with a version tag x.y.z following semantic versioning.

The models will be automatically uploaded to the GCS bucket gs://bergamot-models-sandbox/x.y.z/.
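The mapping from a release tag to its upload location can be sketched as a small shell function. The name `gcs_release_path` is hypothetical and the function is not part of the upload script. It accepts only the plain numeric x.y.z form, although semantic versioning also allows pre-release and build suffixes.

```shell
# Hypothetical helper (not part of this repo): print the upload destination
# that corresponds to a release tag of the form x.y.z.
gcs_release_path() {
    # Accept only three dot-separated numeric components.
    if ! printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo "not an x.y.z version tag: $1" >&2
        return 1
    fi
    printf 'gs://bergamot-models-sandbox/%s/\n' "$1"
}

# Usage (the tag is an example value):
#   gcs_release_path 1.2.3
```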

Currently supported languages

Prod

Spanish <-> English
Estonian <-> English
English <-> German
Czech <-> English
Bulgarian <-> English
Norwegian Bokmål -> English
Portuguese <-> English
Italian <-> English
Polish <-> English

Dev

Russian <-> English
Persian (Farsi) <-> English
Icelandic -> English
Norwegian Nynorsk -> English

Upcoming

French <-> English