NMT training scripts for Bergamot

Marian examples

Examples, tutorials and use cases for the Marian toolkit.

More information at https://marian-nmt.github.io

List of examples:

  • translating-amun -- examples for translating with Amun
  • training-basics -- the complete example for training a WMT16-scale model
  • training-basics-sentencepiece -- same as training-basics, but uses the built-in SentencePiece for data processing; requires Marian v1.7+
  • transformer -- scripts for training the transformer model
  • wmt2017-uedin -- scripts for building a WMT2017-grade model for en-de based on Edinburgh's WMT2017 submission
  • wmt2017-transformer -- scripts for building a better-than-WMT2017-grade model for en-de, beating the WMT2017 submission by 1.2 BLEU

Usage

First download common tools:

cd tools
make all
cd ..

Next, change into the chosen example directory and run its run-me.sh script, e.g.:

cd training-basics
./run-me.sh
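
The two steps above can be wrapped in a small helper. This is a minimal sketch, not part of the repository: run_example is a hypothetical function, and the guard assumes that make all leaves its outputs inside tools/.

```shell
# Hypothetical helper (a sketch, not part of this repository): run one
# of the examples, refusing to start before the common tools are built.
run_example() {
    example="$1"; shift
    # 'make all' in tools/ is expected to leave its outputs there; an
    # empty or missing tools/ directory means that step was skipped.
    if [ -z "$(ls -A tools 2>/dev/null)" ]; then
        echo "tools/ is empty -- run 'cd tools && make all' first" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged;
    # any extra arguments are passed straight through to run-me.sh.
    ( cd "$example" && ./run-me.sh "$@" )
}
```

For instance, run_example training-basics reproduces the commands above. In several of the examples run-me.sh also accepts arguments such as GPU ids; check each directory's README before relying on that.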

The README file in each directory provides a more detailed description.

Acknowledgements

The development of Marian received funding from the European Union's Horizon 2020 Research and Innovation Programme under grant agreements 688139 (SUMMA; 2016-2019), 645487 (Modern MT; 2015-2017), 644333 (TraMOOC; 2015-2017), 644402 (HiML; 2015-2017), the Amazon Academic Research Awards program, and the World Intellectual Property Organization.

This software contains source code provided by NVIDIA Corporation.