# Firefox Translations Evaluation
The code was moved to https://github.com/mozilla/firefox-translations-models/tree/main/evals for easier debugging.
Calculates BLEU and COMET scores for Firefox Translations models using bergamot-translator and compares them to other translation systems.
## Running
We recommend running this on a Linux machine with at least one GPU, inside a Docker container. If you intend to run it on macOS, run the `eval/evaluate.py` script standalone inside a virtualenv and skip the "Start docker" section below. You might need to manually install the corresponding packages from the `Dockerfile` in your system and virtual environment.
### Clone repo

```shell
git clone https://github.com/mozilla/firefox-translations-evaluation.git
cd firefox-translations-evaluation
```
### Download models

Use `install/download-models.sh` to get Firefox Translations models (be sure to have git-lfs enabled), or use your own.
### Install NVIDIA Container Toolkit

See https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
### Start docker

The recommended memory size for Docker is 16 GB.

```shell
export MODELS=<absolute path to a local directory with models>

# Specify an Azure key and location if you want to add the Azure Translator API for comparison
export AZURE_TRANSLATOR_KEY=<Azure translator resource API key>
# Optional, specify if it's different from the default 'global'
export AZURE_LOCATION=<location>

# Specify a GCP credentials JSON path if you want to add the Google Translation API for comparison
export GCP_CREDS_PATH=<absolute path to .json>

# Build and run the docker container
bash start_docker.sh
```

On completion, your terminal should be attached to the launched container.
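Before launching, a quick pre-flight check can fail faster than a broken container start. The snippet below is a hypothetical helper, not part of the repository; only `MODELS` is required, and the Azure/GCP variables are optional:

```shell
# Hypothetical pre-flight check (not part of the repo): fail early if MODELS is unset.
MODELS=/tmp/models-demo   # example value; point this at your real models directory
: "${MODELS:?MODELS must be set to an absolute path containing the models}"
echo "Using models from: $MODELS"

# The cloud-translator variables are optional; report which comparisons are enabled.
if [ -n "${AZURE_TRANSLATOR_KEY:-}" ]; then echo "Azure comparison enabled"; fi
if [ -n "${GCP_CREDS_PATH:-}" ]; then echo "Google comparison enabled"; fi
```

The `:?` parameter expansion aborts the script with a message if the variable is unset or empty, which mirrors the kind of failure you would otherwise only see from inside the container.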
### Run evaluation

From inside the docker container, run:

```shell
python3 eval/evaluate.py --translators=bergamot,microsoft,google --pairs=all --skip-existing --gpus=1 --evaluation-engine=comet,bleu --models-dir=/models/models/prod --results-dir=/models/evaluation/prod
```

If you don't have a GPU, use `0` in the `--gpus` argument.

More options:

```shell
python3 eval/evaluate.py --help
```
## Details

### Installation scripts

- `install/install-bergamot-translator.sh` - clones and compiles bergamot-translator and marian (launched in the docker image).
- `install/download-models.sh` - downloads the current Mozilla production models.
### Docker & CUDA

The COMET evaluation framework supports CUDA. You can enable it by setting the `--gpus` argument of the `eval/evaluate.py` script to the number of GPUs you wish to utilize (`0` disables it).
If you are using it, make sure the NVIDIA Container Toolkit is enabled in your docker setup.
### Translators

- bergamot - uses the compiled bergamot-translator in wasm mode
- marian - uses the compiled marian
- google - uses the Google Translation API
- microsoft - uses the Azure Cognitive Services Translator API
### Reuse already calculated scores

Use the `--skip-existing` option to reuse already calculated scores saved as `results/xx-xx/*.bleu` files.
This is useful to continue an evaluation that was interrupted, or to rebuild a full report while re-evaluating only selected translators.
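To illustrate what `--skip-existing` looks for, here is a mock results tree; the `en-es`/`es-en` pair names and `flores-dev` dataset name are illustrative, not guaranteed to match your run:

```shell
# Illustrative results layout: one directory per language pair, one .bleu file per dataset.
mkdir -p /tmp/results-demo/en-es /tmp/results-demo/es-en
touch /tmp/results-demo/en-es/flores-dev.bleu /tmp/results-demo/es-en/flores-dev.bleu

# With --skip-existing, scores already present on disk are reused instead of recomputed.
find /tmp/results-demo -name '*.bleu' | sort
```

Deleting a pair's `.bleu` files before rerunning forces that pair to be re-evaluated while everything else is reused.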
### Datasets

- SacreBLEU - all available datasets for a language pair are used for evaluation.
- Flores - a parallel evaluation dataset for 101 languages.
### Language pairs

With the option `--pairs=all`, language pairs will be discovered in the specified models folder (option `--models-dir`) and evaluation will run for all of them.
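A sketch of the discovery that `--pairs=all` implies, assuming the models folder contains one `xx-yy` subdirectory per language pair (a hypothetical layout, modeled on the `/models/models/prod` path used above):

```shell
# Hypothetical models layout: one subdirectory per language pair.
mkdir -p /tmp/models-demo/prod/en-de /tmp/models-demo/prod/en-es /tmp/models-demo/prod/es-en

# --pairs=all would enumerate pairs like this from the directory given to --models-dir:
ls /tmp/models-demo/prod | grep -E '^[a-z]{2}-[a-z]{2}$' | sort
```

Any directory that doesn't match the pair naming pattern would simply be skipped by such a scan.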
### Results

Results will be written to the specified directory (option `--results-dir`).

Evaluation results for the models used in Firefox Translations can be found in firefox-translations-models/evaluation.