Firefox Translations models
CPU-optimized NMT models for Firefox Translations.
The model files are hosted using Git LFS.
- prod - higher quality models
- dev - test models under development (can be of low quality or speed)
When a dev model has satisfactory quality, it is moved to prod.
Automatic quality evaluation
Results for prod models: BLEU, COMET
Results for dev models: BLEU, COMET
Automatic evaluation is part of the pull request CI. It is performed using the firefox-translations-evaluation tool, uses the Microsoft and Google translation APIs, and pushes the results back to the branch (not available for forks).
Model training
Use the Firefox Translations training pipeline or the browsermt/students recipe to train CPU-optimized models. They should be similar in size and inference speed to the models already submitted.
Training data
Do not use SacreBLEU or Flores datasets as part of the training data; otherwise the evaluation results will not be valid.
Run sacrebleu --list to see the SacreBLEU datasets.
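One way to guard against this kind of leakage is to drop any training pair whose source or target sentence also appears in an evaluation set. The sketch below is only an illustration: it assumes both sets are already loaded as in-memory lists, and the function names and the whitespace and case normalization are assumptions of this example, not part of the training pipeline.

```python
from typing import Iterable, List, Set, Tuple


def normalize(sentence: str) -> str:
    """Collapse whitespace and lowercase so trivial variants still match."""
    return " ".join(sentence.split()).lower()


def remove_eval_overlap(
    train_pairs: Iterable[Tuple[str, str]],
    eval_sentences: Iterable[str],
) -> List[Tuple[str, str]]:
    """Keep only the pairs whose source and target are both absent from eval_sentences."""
    held_out: Set[str] = {normalize(s) for s in eval_sentences}
    return [
        (src, trg)
        for src, trg in train_pairs
        if normalize(src) not in held_out and normalize(trg) not in held_out
    ]


# Hypothetical data, for illustration only.
train = [("Hello world", "Hallo Welt"), ("Good  morning", "Guten Morgen")]
evaluation = ["good morning"]
print(remove_eval_overlap(train, evaluation))
```

Exact matching after normalization catches only verbatim copies; near-duplicates would need a fuzzier comparison.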
Model contribution
All models should be contributed to the dev folder first.
By maintainers
Create a pull request to the main branch from another branch in this repo.
From forks
Create a pull request to the contrib branch.
When it is reviewed and merged, a maintainer will create another pull request to the main branch to kick off automatic evaluation.
Local testing
You can run model evaluation locally with bash scripts/update-results.sh once the environment is set up.
Make sure to set the environment variables GCP_CREDS_PATH and AZURE_TRANSLATOR_KEY to use the Google and Microsoft APIs.
If you want to run it with bergamot only, remove mentions of those variables from scripts/update-results.sh and remove microsoft,google from scripts/eval.sh.
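Before starting a local run, it can help to confirm that both variables are set. The following is a minimal pre-flight sketch; the helper name missing_api_vars is hypothetical and the check is not part of the repository's scripts.

```python
import os
from typing import List, Mapping

# Variable names taken from the text above.
REQUIRED_VARS = ("GCP_CREDS_PATH", "AZURE_TRANSLATOR_KEY")


def missing_api_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_vars(os.environ)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Google and Microsoft credentials are set.")
```

Passing the environment as an argument keeps the check easy to exercise without touching the real os.environ.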
Model deployment
Create a new release with a version tag x.y.z following semantic versioning.
The models will be automatically uploaded to the GCS bucket at gs://bergamot-models-sandbox/x.y.z/ after the release is created.
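To illustrate how a release tag maps to the upload location, here is a small sketch that validates a tag and builds the destination prefix. The function name upload_prefix is hypothetical, and the regular expression is a simplified check for plain x.y.z tags; full semantic versioning also allows pre-release and build suffixes, which this sketch rejects.

```python
import re

# Bucket name taken from the text above.
BUCKET = "gs://bergamot-models-sandbox"

# Simplified check: three dot-separated numeric parts without leading zeros.
VERSION_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def upload_prefix(tag: str) -> str:
    """Return the bucket prefix for a release tag, or raise ValueError."""
    if VERSION_RE.fullmatch(tag) is None:
        raise ValueError(f"not an x.y.z version tag: {tag!r}")
    return f"{BUCKET}/{tag}/"


print(upload_prefix("1.2.3"))
```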
Model types
Vocabulary
Prefix of the vocabulary file in the model registry:
- vocab. - vocabulary is reused for the source and target languages
- srcvocab. and trgvocab. - different vocabularies for the source and target languages
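The prefix convention can be checked mechanically. The sketch below classifies a model's file list as using a shared vocabulary or separate ones; the function name vocab_layout and the example file names are hypothetical and only illustrate the prefixes described above.

```python
from typing import Iterable


def vocab_layout(file_names: Iterable[str]) -> str:
    """Return "shared" or "separate" based on vocabulary file prefixes."""
    names = list(file_names)
    has_shared = any(n.startswith("vocab.") for n in names)
    has_src = any(n.startswith("srcvocab.") for n in names)
    has_trg = any(n.startswith("trgvocab.") for n in names)
    if has_shared and not (has_src or has_trg):
        return "shared"
    if has_src and has_trg and not has_shared:
        return "separate"
    raise ValueError("missing or inconsistent vocabulary files")


# Hypothetical file names, for illustration only.
print(vocab_layout(["model.bin", "vocab.spm"]))
print(vocab_layout(["model.bin", "srcvocab.spm", "trgvocab.spm"]))
```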
GEMM precision
Suffix of the model file in the registry:
- intgemm8.bin - supports gemm-precision: int8shiftAll inference setting
- intgemm.alphas.bin - supports gemm-precision: int8shiftAlphaAll inference setting
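As a sketch of how a client might pick the inference setting from a model file name, the mapping above can be written as a lookup. The function name gemm_precision and the example file names are hypothetical; only the two suffixes and their settings come from the text.

```python
# Suffix-to-setting mapping taken from the list above.
GEMM_PRECISION_BY_SUFFIX = {
    "intgemm8.bin": "int8shiftAll",
    "intgemm.alphas.bin": "int8shiftAlphaAll",
}


def gemm_precision(model_file: str) -> str:
    """Return the gemm-precision value supported by a model file."""
    for suffix, precision in GEMM_PRECISION_BY_SUFFIX.items():
        if model_file.endswith(suffix):
            return precision
    raise ValueError(f"unrecognized model file suffix: {model_file!r}")


# Hypothetical file names, for illustration only.
print(gemm_precision("model.intgemm8.bin"))
print(gemm_precision("model.intgemm.alphas.bin"))
```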
Currently supported languages
The prod/dev labels in this repo correspond to the labels in the legacy web extension and are not related to the native integration in Firefox.
Prod
- Spanish <-> English
- Estonian <-> English
- English <-> German
- Czech <-> English
- Bulgarian <-> English
- Norwegian Bokmål -> English
- Portuguese <-> English
- Italian <-> English
- Polish <-> English
- French <-> English
Dev
- Russian <-> English
- Persian (Farsi) <-> English
- Icelandic -> English
- Norwegian Nynorsk -> English
- Ukrainian <-> English
- Dutch <-> English
- Catalan -> English
- Hungarian -> English
- Finnish -> English