An efficient implementation of popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/pdf/2106.04718.pdf

FastSeq

Introduction

FastSeq provides efficient implementations of popular sequence models (e.g., BART, ProphetNet) for text generation, summarization, translation, and related tasks. It automatically optimizes inference speed on top of popular NLP toolkits (e.g., FairSeq and Huggingface Transformers) without any loss of accuracy. No changes to code, models, or data are needed: use our command line tool as-is, or add a single line, import fastseq, when running from source code.

Speed Gain

The table below shows the generation speed gain from using FastSeq.

| Model | W/O FastSeq (samples/s) | W/ FastSeq (samples/s) | Speedup |
|---|---|---|---|
| ProphetNet | 2.8 | 10.7 | 3.8x |
| Bart (fs) | 2.4 | 19.7 | 8.2x |
| Bart (hf) | 2.5 | 12.4 | 5.0x |
| DistilBart (hf) | 3.4 | 18.5 | 5.4x |
| T5 (hf) | 8.7 | 31.3 | 3.6x |
| WMT16 En-De (fs) | 96.0 | 417.0 | 4.3x |
  • All benchmarking experiments were run on an NVIDIA V100 (16GB) GPU with Docker. The highest speed was recorded for each model by tuning the batch size. For parameter settings, click the link of the corresponding model.
  • fs stands for Fairseq 0.9.0 and hf stands for Huggingface Transformers 3.0.2.
  • The optimizations are automatically applied to all generation/sequence models in Fairseq and Huggingface Transformers; the table above lists only a subset of them.

How does it work?

We developed a wide range of speedup techniques, including more efficient beam search, reduced memory footprint, faster implementations of key operations, and I/O speedup. To connect seamlessly with the community, these optimizations are applied in the backend to existing models from Fairseq and Huggingface Transformers, while keeping the model interfaces and usage the same as before.
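A minimal sketch of this patch-on-import idea is shown below. The names `patch_method`, `SlowModel`, and `_fast_generate` are hypothetical illustrations of the approach, not FastSeq's actual internals.

```python
# Sketch of applying an optimized implementation to an existing class at import
# time, so callers keep using the original interface unchanged.

def patch_method(cls, method_name, new_method):
    """Replace cls.method_name with an optimized drop-in implementation."""
    original = getattr(cls, method_name)

    def wrapper(self, *args, **kwargs):
        # The optimized path can still fall back to the original if needed.
        try:
            return new_method(self, *args, **kwargs)
        except NotImplementedError:
            return original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)


class SlowModel:
    def generate(self, x):
        return f"slow result for {x}"


def _fast_generate(self, x):
    # Stand-in for an optimized beam search / cached-attention implementation.
    return f"fast result for {x}"


# Applying the patch once (e.g., when the package is imported) leaves the
# caller's code unchanged:
patch_method(SlowModel, "generate", _fast_generate)
print(SlowModel().generate("sample"))  # -> "fast result for sample"
```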

Installation

Requirements

FastSeq requires fairseq and/or transformers as a backend: if you only use one of them, installing that one is enough; install both only if you use both.

Install from the source

# when fairseq and/or transformers has been installed
$ pip install git+https://github.com/microsoft/fastseq.git

# install fastseq + transformers
$ pip install git+https://github.com/microsoft/fastseq.git#egg=project[transformers]

# install fastseq + fairseq
$ pip install git+https://github.com/microsoft/fastseq.git#egg=project[fairseq]

# install fastseq + transformers + fairseq
$ pip install git+https://github.com/microsoft/fastseq.git#egg=project[transformers,fairseq]
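A quick way to confirm the installation is to import FastSeq together with whichever backend toolkit you installed; a minimal sketch:

```python
# Sanity check after installation: importing fastseq is what enables the
# optimizations, and the backend toolkits should be importable if installed.
import fastseq  # noqa: F401

for backend in ("fairseq", "transformers"):
    try:
        __import__(backend)
        print(f"{backend} available")
    except ImportError:
        print(f"{backend} not installed")
```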

Usage

Use source code for speedup

Only one line of code change is needed to use the optimizations provided by FastSeq.

# import fastseq at the beginning of your program
import fastseq
import torch

# Download bart.large.cnn
bart = torch.hub.load('pytorch/fairseq', 'bart.large.cnn')

bart.cuda()  # use GPU
bart.eval()  # disable dropout for evaluation
bart.half()  # use FP16 for faster generation

slines = ['FastSeq provides efficient implementations of the popular sequence models. Please visit https://github.com/microsoft/fastseq for more details.']

hypotheses = bart.sample(
    slines, beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3)

print(hypotheses)
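The same one-line change also applies to Huggingface Transformers models. A hedged sketch is shown below; the generation parameters mirror the fairseq example above and are illustrative choices, not required values.

```python
# Import fastseq before using the Transformers model to enable the optimizations.
import fastseq
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).cuda().eval().half()

text = ("FastSeq provides efficient implementations of the popular sequence models. "
        "Please visit https://github.com/microsoft/fastseq for more details.")
inputs = tokenizer([text], return_tensors="pt", truncation=True)

summary_ids = model.generate(
    inputs["input_ids"].cuda(),
    num_beams=4,
    length_penalty=2.0,
    max_length=140,
    min_length=55,
    no_repeat_ngram_size=3,
)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```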

Use the command line tool to speed up fairseq models

Example usage for the BART model on the CNN/DailyMail task.

$ fastseq-generate-for-fairseq \
    cnn_dnn/bin \
    --path bart.large.cnn/model.pt \
    --fp16 \
    --task translation \
    --batch-size 128 \
    --gen-subset valid \
    --truncate-source  \
    --bpe gpt2 \
    --beam 4 \
    --num-workers 4 \
    --min-len 55 \
    --max-len-b 140 \
    --no-repeat-ngram-size 3 \
    --lenpen 2.0

Both the model file and the task data files are the same as those used with the original Fairseq.

Use the command line tool to speed up transformers models

Example usage for the BART model on the CNN/DailyMail task.

$ fastseq-generate-for-transformers \
    facebook/bart-large-cnn \
    cnn_dm/val.source \
    out.summary \
    --reference_path cnn_dm/val.target \
    --device cuda \
    --bs 128 \
    --fp16 \
    --score_path out.score \
    --task summarization

Both the model file and the task data files are the same as those used with the original Transformers.
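If you want to inspect the generated summaries yourself, a hedged sketch using the separate rouge_score package (not part of FastSeq) to score out.summary against the references is shown below.

```python
# Sketch: score generated summaries against references with the rouge_score
# package (install separately: pip install rouge-score).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

with open("out.summary") as preds, open("cnn_dm/val.target") as refs:
    scores = [scorer.score(ref.strip(), pred.strip())
              for ref, pred in zip(refs, preds)]

# Average ROUGE-L F1 over the validation set.
avg_rouge_l = sum(s["rougeL"].fmeasure for s in scores) / len(scores)
print(f"ROUGE-L F1: {avg_rouge_l:.4f}")
```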

Run tests

# run a single test.
$ python tests/optimizer/fairseq/test_fairseq_optimizer.py

# run all the tests.
$ python -m unittest discover -s tests/ -p '*.py'

# run all the benchmarks.
$ cd benchmarks && bash run_all_benchmarks.sh

Code Style

Python coding style

Changes to Python code should conform to PEP 8. You can use yapf to help format the code and pylint to check your Python changes.

# format the code by yapf
$ yapf --style pep8 -i -r PYTHON_FILE/PACKAGE

# run pylint check
$ pylint --rcfile=.pylintrc  PYTHON_FILE/PACKAGE

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.