# Marian

[![Join the chat at https://gitter.im/amunmt/amunmt](https://badges.gitter.im/amunmt/amunmt.svg)](https://gitter.im/amunmt/amunmt?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![CUDA Build Status](http://vali.inf.ed.ac.uk/jenkins/buildStatus/icon?job=amunmt_compilation_cuda)](http://vali.inf.ed.ac.uk/jenkins/job/amunmt_compilation_cuda/)
[![CPU Build Status](http://vali.inf.ed.ac.uk/jenkins/buildStatus/icon?job=amunmt_compilation_cpu)](http://vali.inf.ed.ac.uk/jenkins/job/amunmt_compilation_cpu/)

<p>
<b>Marian</b> (formerly known as AmuNMT) is an efficient Neural Machine Translation framework written
in pure C++ with minimal dependencies. It has mainly been developed at the
Adam Mickiewicz University in Poznań (AMU) and at the University of Edinburgh.
</p>

<p>
It is currently being deployed in
multiple European projects and is the main translation and training engine
behind the neural MT launch at the
<a href="http://www.wipo.int/pressroom/en/articles/2016/article_0014.html">World Intellectual Property Organization</a>.
</p>

<p>
Main features:
<ul>
<li> Fast multi-GPU training and translation </li>
<li> Compatible with Nematus and DL4MT </li>
<li> Efficient pure C++ implementation </li>
<li> Permissive open source license (MIT) </li>
<li> <a href="http://amunmt.github.io/features"> more details... </a> </li>
</ul>
</p>

If you use this, please cite:

Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang (2016). Is Neural Machine
Translation Ready for Deployment? A Case Study on 30 Translation Directions
(https://arxiv.org/abs/1610.01108)

    @InProceedings{junczys2016neural,
      title = {Is Neural Machine Translation Ready for Deployment? A Case Study
               on 30 Translation Directions},
      author = {Junczys-Dowmunt, Marcin and Dwojak, Tomasz and Hoang, Hieu},
      booktitle = {Proceedings of the 9th International Workshop on Spoken Language
                   Translation (IWSLT)},
      year = {2016},
      address = {Seattle, WA},
      url = {http://workshop2016.iwslt.org/downloads/IWSLT_2016_paper_4.pdf}
    }

## Website

More information is available at https://marian-nmt.github.io

## Recommended software

### GPU version

**Ubuntu 16.04 LTS (tested and recommended).** On Ubuntu 16.04 the standard
packages should work. On newer versions of Ubuntu, e.g. 16.10, there may be
problems due to incompatibilities between the default g++ compiler and CUDA.

* CMake 3.5.1 (default)
* GCC/G++ 5.4 (default)
* Boost 1.58 (default)
* CUDA 8.0
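
On a stock Ubuntu 16.04 system, a minimal sketch for pulling in the packaged
dependencies might look like the following (CUDA 8.0 itself comes from NVIDIA's
separate installer or repository, not from the standard Ubuntu archive):

    sudo apt-get update
    # CMake, GCC/G++ and Boost from the standard Ubuntu 16.04 repositories
    sudo apt-get install -y cmake g++ libboost-all-dev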

**Ubuntu 14.04 LTS (tested).** A newer CMake than the distribution default is
required and can be installed from source, as sketched below.

* CMake 3.5.1 (due to CUDA-related bugs in earlier versions)
* GCC/G++ 4.9
* Boost 1.54
* CUDA 7.5
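
A minimal sketch for building CMake 3.5.1 from source (the download URL and
build flags are illustrative; adjust to your environment):

    wget https://cmake.org/files/v3.5/cmake-3.5.1.tar.gz
    tar xzf cmake-3.5.1.tar.gz
    cd cmake-3.5.1
    ./bootstrap && make -j4
    sudo make install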

### CPU version

The CPU-only version will be compiled automatically if CUDA cannot be detected by CMake.
Only the translator will be compiled; the training framework is strictly GPU-based.

Tested on different machines and distributions:

* CMake 3.5.1
* The CPU version should be far more forgiving with regard to GCC/G++ and Boost versions.

#### macOS

To build the CPU version on macOS, first install [Homebrew](https://brew.sh/) and then run:

    brew install cmake boost boost-python

Then proceed to the next section.

## Download and Compilation

Clone a fresh copy from GitHub:

    git clone https://github.com/amunmt/amunmt

The project is a standard CMake out-of-source build:

    cd amunmt
    mkdir build
    cd build
    cmake ..
    make -j
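
If CMake does not pick up the intended compiler or CUDA installation on its
own, a sketch of a more explicit configuration follows (the paths and compiler
version are illustrative assumptions, not requirements):

    # Release build; point CMake at a specific g++ and CUDA toolkit
    cmake .. -DCMAKE_BUILD_TYPE=Release \
             -DCMAKE_CXX_COMPILER=g++-5 \
             -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0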

If run for the first time, this will also download the Marian training
framework.

### Compile Python bindings

To compile the Python library, after running _make_ as in the previous section, do:

    make python

This will generate a _libamunmt.dylib_ or _libamunmt.so_ in your `build/src/`
directory, which can be imported from Python.
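
As a quick smoke test that the bindings built correctly, one can try importing
the module straight from the build tree (a minimal sketch; the exact Python
API is documented with the bindings themselves):

    # assumes the build directory layout described above
    PYTHONPATH=build/src python -c "import libamunmt; print('bindings OK')"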

## Running Marian

### Training

Assuming `corpus.en` and `corpus.ro` are corresponding, preprocessed files of
an English-Romanian parallel corpus, the following command will create a
Nematus-compatible neural machine translation model:

    ./marian/build/marian \
      --train-sets corpus.en corpus.ro \
      --vocabs vocab.en vocab.ro \
      --model model.npz
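
Multi-GPU training is one of the main features listed above; a hedged sketch of
spreading training over several GPUs follows (the `--devices` option and device
IDs are assumptions here -- check `./marian/build/marian --help` for the
options your build actually supports):

    ./marian/build/marian \
      --train-sets corpus.en corpus.ro \
      --vocabs vocab.en vocab.ro \
      --model model.npz \
      --devices 0 1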

See the [documentation](https://marian-nmt.github.io/docs/#marian) for a full list
of command-line options or the
[examples](https://marian-nmt.github.io/examples/training) for a full example of
how to train a WMT-grade model.

### Translating

If a trained model is available, run:

    ./marian/build/amun -m model.npz -s vocab.en -t vocab.ro <<< "This is a test ."
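
Amun can also ensemble several models, as in the WMT example linked below; a
hedged sketch (whether `-m` accepts multiple models this way depends on your
amun version -- see `./marian/build/amun --help`):

    ./marian/build/amun -m model1.npz model2.npz \
      -s vocab.en -t vocab.ro <<< "This is a test ."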

See the [documentation](https://marian-nmt.github.io/docs/#amun) for a full list of
command-line options or the
[examples](https://marian-nmt.github.io/examples/translating) for a full example of
how to use Edinburgh's WMT models for translation.

## Example usage

* **[Translating with Amun](https://marian-nmt.github.io/examples/translating/)**:
  The files and scripts described in this section can be found in
  `amunmt/examples/translate`. They demonstrate how to translate with Amun using
  Edinburgh's German-English WMT2016 single model and ensemble.
* **[Training with Marian](https://marian-nmt.github.io/examples/training/)**: The files
  and scripts described in this section can be found in
  `marian/examples/training`. They have been adapted from the
  Romanian-English sample from <https://github.com/rsennrich/wmt16-scripts>.
  We also add the back-translated data from <http://data.statmt.org/rsennrich/wmt16_backtranslations/>
  as described in [Edinburgh's WMT16 paper](http://www.aclweb.org/anthology/W16-2323).
  The resulting system should be competitive with, or even slightly better than,
  the one reported in that paper.
* **[Winning system of the WMT 2016 APE shared task](https://marian-nmt.github.io/examples/postedit/)**:
  This page provides data and model files for our shared-task-winning APE system
  described in [Log-linear Combinations of Monolingual and Bilingual Neural
  Machine Translation Models for Automatic
  Post-Editing](http://www.aclweb.org/anthology/W16-2378).

## Acknowledgements

The development of Marian received funding from the European Union's Horizon 2020
Research and Innovation Programme under grant agreements 688139
(<a href="http://www.summa-project.eu">SUMMA</a>; 2016-2019) and 645487
(<a href="http://www.modernmt.eu">Modern MT</a>; 2015-2017), the
Amazon Academic Research Awards program, and the World Intellectual
Property Organization.