Jeff Rasley 2020-02-10 04:51:51 -08:00 committed by GitHub
Parent b068e7018d
Commit 4eb20eb574
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
7 changed files with 94 additions and 77 deletions

CONTRIBUTING.md (new file, 67 lines)
View file

@ -0,0 +1,67 @@
# Contributing
DeepSpeed welcomes your contributions!
## Prerequisites
DeepSpeed uses [pre-commit](https://pre-commit.com/) to keep formatting consistent
across the codebase. First, ensure that `pre-commit` is available, either by
installing DeepSpeed or via `pip install pre-commit`. Next, install the pre-commit
hooks once before making commits:
```bash
pre-commit install
```
Afterwards, our suite of formatting tests runs automatically before each `git commit`. You
can also run these checks manually:
```bash
pre-commit run --all-files
```
If a formatting test fails, it will fix the modified code in place and abort
the `git commit`. After looking over the changes, you can `git add <modified files>`
and then repeat the previous `git commit` command.
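For illustration, a typical round trip after a formatting failure might look like the
following (the file name is purely hypothetical):
```bash
git commit -m "my change"    # a pre-commit hook reformats foo.py and aborts the commit
git add foo.py               # stage the auto-fixed file
git commit -m "my change"    # re-run the commit; the hooks now pass
```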
## Testing
DeepSpeed tracks two types of tests: unit tests and more costly model convergence tests.
The model convergence tests train models from
[DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/) and measure
end-to-end convergence and related metrics. Unit tests are found in `tests/unit/` and
the model convergence tests are found in `tests/model/`.
### Unit Tests
[PyTest](https://docs.pytest.org/en/latest/) is used to execute tests. PyTest can be
installed from PyPI via `pip install pytest`. Simply invoke `pytest --forked` to run the
unit tests:
```bash
pytest --forked tests/unit/
```
You can also provide the `-v` flag to `pytest` to see additional information about the
tests. Note that [pytest-forked](https://github.com/pytest-dev/pytest-forked) and the
`--forked` flag are required to test CUDA functionality in distributed tests.
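For example, a full invocation (assuming neither package is installed yet) could be:
```bash
pip install pytest pytest-forked   # pytest-forked provides the --forked flag
pytest -v --forked tests/unit/     # -v prints per-test details
```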
### Model Tests
To execute model tests, first [install DeepSpeed](#installation). The
[DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/) repository is cloned
as part of this process. Next, execute the model test driver:
```bash
cd tests/model/
pytest run_sanity_check.py
```
Note that the `--forked` flag is not necessary for the model tests.
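Putting the steps together, a minimal sketch (assuming DeepSpeed is installed with the
repository's `install.sh`, which also clones DeepSpeedExamples) might be:
```bash
./install.sh                      # installs DeepSpeed and clones DeepSpeedExamples
cd tests/model/
pytest -v run_sanity_check.py     # --forked is not needed here
```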
## Contributor License Agreement
This project welcomes contributions and suggestions. Most contributions require you to
agree to a Contributor License Agreement (CLA) declaring that you have the right to, and
actually do, grant us the rights to use your contribution. For details, visit
https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need
to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply
follow the instructions provided by the bot. You will only need to do this once across
all repos using our CLA.
## Code of Conduct
This project has adopted the [Microsoft Open Source Code of
Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the
[Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact
[opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or
comments.

View file

@ -101,12 +101,11 @@ USER deepspeed
##############################################################################
# DeepSpeed
# TODO: once repo is public we can install latest deepspeed via this command
##############################################################################
#RUN git clone https://github.com/microsoft/DeepSpeed.git ${STAGE_DIR}/DeepSpeed
#RUN cd ${STAGE_DIR}/DeepSpeed && \
# git checkout . && \
# git checkout master && \
# sudo ./install.sh
#RUN rm -rf ${STAGE_DIR}/DeepSpeed
#RUN python -c "import deepspeed; print(deepspeed.__version__)"
RUN git clone https://github.com/microsoft/DeepSpeed.git ${STAGE_DIR}/DeepSpeed
RUN cd ${STAGE_DIR}/DeepSpeed && \
git checkout . && \
git checkout master && \
./install.sh
RUN rm -rf ${STAGE_DIR}/DeepSpeed
RUN python -c "import deepspeed; print(deepspeed.__version__)"
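As a usage sketch (not part of this Dockerfile), the image could be built and the
install verified roughly as follows; the tag `deepspeed` and the `--gpus all` flag are
assumptions about your local Docker setup:
```bash
docker build -t deepspeed .        # build from the directory containing this Dockerfile
docker run --rm --gpus all deepspeed \
    python -c "import deepspeed; print(deepspeed.__version__)"
```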

View file

@ -100,7 +100,7 @@ combination. ZeRO boosts the scaling capability and efficiency further.
![DeepSpeed-vs-Megatron](./docs/figures/DeepSpeed-vs-Megatron.png)
<p align="center">
<em>The figure depicts system throughput improvements of DeepSpeed (combining ZeRO-powered data parallelism with model parallelism of Nvidia Megatron-LM) over using Megatron-LM alone.</em>
<em>The figure depicts system throughput improvements of DeepSpeed (combining ZeRO-powered data parallelism with model parallelism of NVIDIA Megatron-LM) over using Megatron-LM alone.</em>
</p>
@ -119,7 +119,7 @@ convergence to desired accuracy.
-->
## Good Usability
Only a few lines of code changes are needed to enable a PyTorch model to use DeepSpeed and ZeRO. Compared to current model parallelism libraries, DeepSpeed does not require a code redesign or model refactoring. It also does not put limitations on model dimensions (such as number of attention heads, hidden sizes, and others), batch size, or any other training parameters. For models of up to six billion parameters, you can use ZeRO-powered data parallelism conveniently without requiring model parallelism, while in contrast, standard data parallelism will run out of memory for models with more than 1.3 billion parameters. In addition, DeepSpeed conveniently supports flexible combination of ZeRO-powered data parallelism with custome model parallelisms, such as tensor slicing of Nvidia Megatron-LM.
Only a few lines of code changes are needed to enable a PyTorch model to use DeepSpeed and ZeRO. Compared to current model parallelism libraries, DeepSpeed does not require a code redesign or model refactoring. It also does not put limitations on model dimensions (such as number of attention heads, hidden sizes, and others), batch size, or any other training parameters. For models of up to six billion parameters, you can use ZeRO-powered data parallelism conveniently without requiring model parallelism, while in contrast, standard data parallelism will run out of memory for models with more than 1.3 billion parameters. In addition, DeepSpeed conveniently supports flexible combination of ZeRO-powered data parallelism with custom model parallelisms, such as tensor slicing of NVIDIA's Megatron-LM.
## Features
@ -265,7 +265,7 @@ the `step` value is stored as part of the `client_sd`.
## DeepSpeed Configuration
DeepSpeed featureds can be enabled, disabled, or configured using a config JSON
DeepSpeed features can be enabled, disabled, or configured using a config JSON
file that should be specified as `args.deepspeed_config`. A sample config file
is shown below. For a full set of features see [core API
doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html).
@ -377,56 +377,9 @@ as the hostname.
# Contributing
DeepSpeed welcomes your contributions!
## Prerequisites
DeepSpeed uses [pre-commit](https://pre-commit.com/) to ensure that formatting is
consistent across DeepSpeed. First, ensure that `pre-commit` is installed from either
installing DeepSpeed or `pip install pre-commit`. Next, the pre-commit hooks must be
installed once before commits can be made:
```bash
pre-commit install
```
Afterwards, our suite of formatting tests run automatically before each `git commit`. You
can also run these manually:
```bash
pre-commit run --all-files
```
If a formatting test fails, it will fix the modified code in place and abort
the `git commit`. After looking over the changes, you can `git add <modified files>`
and then repeat the previous `git commit` command.
## Testing
DeepSpeed tracks two types of tests: unit tests and more costly model convergence tests.
The model convergence tests train
[DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/) and measure
end-to-end convergence and related metrics. Unit tests are found in `tests/unit/` and
the model convergence tests are found in `tests/model/`.
### Unit Tests
[PyTest](https://docs.pytest.org/en/latest/) is used to execute tests. PyTest can be
installed from PyPI via `pip install pytest`. Simply invoke `pytest --forked` to run the
unit tests:
```bash
pytest --forked tests/unit/
```
You can also provide the `-v` flag to `pytest` to see additional information about the
tests. Note that [pytest-forked](https://github.com/pytest-dev/pytest-forked) and the
`--forked` flag are required to test CUDA functionality in distributed tests.
### Model Tests
To execute model tests, first [install DeepSpeed](#installation). The
[DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/) repository is cloned
as part of this process. Next, execute the model test driver:
```bash
cd tests/model/
pytest run_sanity_check.py
```
Note that the `--forked` flag is not necessary for the model tests.
DeepSpeed welcomes your contributions! Please see our
[contributing](CONTRIBUTING.md) guide for more details on formatting, testing,
etc.
## Contributor License Agreement
This project welcomes contributions and suggestions. Most contributions require you to
@ -445,3 +398,6 @@ Conduct](https://opensource.microsoft.com/codeofconduct/). For more information
[Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact
[opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or
comments.
## Publications
1. Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He. (2019) ZeRO: Memory Optimization Towards Training A Trillion Parameter Models. [ArXiv:1910.02054](https://arxiv.org/abs/1910.02054)

View file

@ -122,7 +122,7 @@ the first DeepSpeed container:
## Megatron-LM GPT2
DeepSpeed includes an example model using Megatron-LM's GPT2. Please refer to the full
[Megatron tutorial](tutorials/MegatronGPT2Tutorial.md) for more details.
[Megatron tutorial](../docs/tutorials/MegatronGPT2Tutorial.md) for more details.
* In order to fully train GPT2 with DeepSpeed and ZeRO we recommend using 8 instances of
Azure's Standard_ND40rs_v2 SKU for a total of 64 NVIDIA V100 GPUs. With this setup and
a batch size of 1536 you should be able to complete 100k training steps (153.6 million

View file

@ -73,9 +73,8 @@ mpu.get_data_parallel_group()
mpu.get_data_parallel_world_size()
```
### Integration with Megatron-LM
**TODO: port tutorial to its own page**
DeepSpeed is fully compatible with [Megatron](https://github.com/NVIDIA/Megatron-LM).
Please see the [Megatron-LM tutorial](docs/tutorials/MegatronGPT2Tutorial.md) for details.
Please see the [Megatron-LM tutorial](tutorials/MegatronGPT2Tutorial.md) for details.
@ -89,8 +88,8 @@ over 6 billion parameters without any model parallelism, and up to 100 billion
parameter models with model parallelism on current generation hardware.
For more details see the [ZeRO paper](https://arxiv.org/abs/1910.02054), [GPT
tutorial](../../Tutorials/Megatron_GPT2/MegatronGPT2Tutorial.md) on integration with
DeepSpeed. Additional tutorals including *BERT Tutorial*: Coming Soon.
tutorial](tutorials/MegatronGPT2Tutorial.md) on integration with
DeepSpeed. Additional tutorials including *BERT Tutorial*: Coming Soon.
<!---[BERT
tutorial](../../Tutorials/BingBertSquad/BingBertSquadTutorial.md),
-->
@ -157,7 +156,7 @@ high memory bandwidth.
**TODO: port tutorial**
DeepSpeed makes it easy to train with large batch sizes by enabling the LAMB Optimizer.
For more details on LAMB, see the [BERT
tutorial](../../Tutorials/BingBertSquad/BingBertSquadTutorial.md) and the [LAMB
tutorial](tutorials/BingBertSquadTutorial.md) and the [LAMB
paper](https://arxiv.org/pdf/1904.00962.pdf).
### Memory-Efficient Training with ZeRO Optimizer
@ -181,10 +180,10 @@ DeepSpeed supports multiple Learning Rate Schedules to enable faster convergence
large batch scaling.
### Learning Rate Range Test
Please refer to [Learning Rate Range Test](../../Tutorials/lrrt/lrrt.md).
Please refer to the [Learning Rate Range Test](tutorials/lrrt.md) tutorial.
### 1Cycle Learning Rate Schedule
Please refer to [1Cycle Learning Rate Schedule](../../Tutorials/1cycle/1Cycle.md).
Please refer to the [1Cycle Learning Rate Schedule](tutorials/1Cycle.md) tutorial.
## Simplified Data Loader

View file

@ -4,7 +4,7 @@ If you haven't already stepped through [DeepSpeed Model Training](../../Onboard/
In this tutorial we will be adding DeepSpeed to the CIFAR-10 model, which is a small image classification model.
First we will go over how to run original CIRAR-10. Then we will proceed step-by-step in enabling this model to run with DeepSpeed.
First we will go over how to run original CIFAR-10. Then we will proceed step-by-step in enabling this model to run with DeepSpeed.

View file

@ -1,9 +1,8 @@
# Tutorial: Megatron-LM GPT2 with DeepSpeed
**TODO: these two links are broken (not yet implemented).**
We advise you to first read through the guides for [Setup and
Onboarding](../../Onboard/onboard/onboard.md) and [Model
Training](../../Onboard/model_training/deepspeed_model_training.md).
If you haven't already, we advise you to first read through the [Getting
Started](../../README.md#getting-started) guide before stepping through this
tutorial.
In this tutorial we will be adding DeepSpeed to the Megatron-LM GPT2 model, which
is a large, powerful transformer. Megatron-LM supports model-parallel and multi-node
@ -30,9 +29,6 @@ git submodule update --init --recursive
### 1.1 Training Data Setup
* Follow Megatron's [instructions](https://github.com/NVIDIA/Megatron-LM#collecting-gpt2-webtext-data)
to download the webtext data and place a symbolic link under `DeepSpeedExamples/Megatron-LM/data`:
* (*Microsoft*:) Raw and pre-processed data has already been downloaded on
all DLTS clusters: `/data/Megatron-LM/data/`. You can simply execute
`ln -s /data/Megatron-LM/data DeepSpeedExamples/Megatron-LM/`.
### 1.2 Running Unmodified Megatron-LM GPT2 model