Commit graph

116 commits

Author SHA1 Message Date
Jeff Rasley 8def3cb3a2
remove old unit test, should have been removed in rebase (#1097)
* remove old unit tests, should have been removed in rebase

* formatting
2021-06-08 09:34:21 -07:00
Reza Yazdani 26e3841cd4
Change the sparse attention API to be compatible with latest changes of triton (#902)
* Change the sparse attention API to be compatible with latest changes on the triton side

* remove compatibility checks for CUDA 11

* Update requirements-sparse_attn.txt

Co-authored-by: Arash Ashari <arashari@microsoft.com>
2021-06-02 12:42:53 -07:00
Reza Yazdani ed3de0c21b
Quantization + inference release (#1091)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: niumanar <60243342+niumanar@users.noreply.github.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: niumanar <60243342+niumanar@users.noreply.github.com>

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: niumanar <60243342+niumanar@users.noreply.github.com>
2021-05-24 01:10:39 -07:00
Olatunji Ruwase 5b393f1555
Avoid unused parameters assert by default (#1039)
* Unused parameters assert should be disabled by default

* Fix message

* Invert assert logic in unit test

* Change option for ignoring unused parameters

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-05-07 11:15:09 -07:00
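As a hedged illustration of the option discussed in this commit and in #945 further down, here is a minimal sketch of where it would live in a DeepSpeed config passed to deepspeed.initialize(); the key name ignore_unused_parameters is an assumption inferred from the commit titles, not taken from the diff itself.

import torch
import deepspeed

model = torch.nn.Linear(10, 10)
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {
        "stage": 2,
        # Assumed key name for the stage-2 unused-parameter check (assert disabled by default after #1039).
        "ignore_unused_parameters": True,
    },
}
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)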
Sean Naren b3870363e0
[Stage][Fix] Add additional conditions when checking types of output from the model (#1026)
* Add additional conditions when checking types of output from the model

* Add test

* Modify test to use torch.tensor as well

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2021-05-01 08:46:46 -07:00
Stas Bekman de694b917f
[tests] make it easier to run tests (#923)
* make it easier to run tests

* cleanup

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-04-30 10:04:20 -07:00
Sean Naren 41ab660b5d
Refactor param_dict to config (#1008)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-04-28 17:05:03 -07:00
hamlet d0b61f1810
Add find_unused_parameters option to DeepSpeedEngine (#945)
* Add find_unused_parameters option

As unused parameters in modules may sometimes be unexpected, add an explicit error message when this occurs and an option to avoid the error: https://github.com/microsoft/DeepSpeed/issues/707

* Fix syntax error

* Fix yapf error

* Fix yapf error

* Fix yapf error

* Fix yapf error

* Move stage2 find_unused_parameters to config file

* Add stage2 find_unused_parameters

* Add stage2 find_unused_parameters

* Add stage2_find_unused_parameters option

* Change error msg to reflect zero_optimization config change

* Fix yapf error

* Fix yapf errors

* Change find_unused_parameters option name

* Change find_unused_parameters option name

* Change find_unused_parameters option name

* Change find_unused_parameters option name

* Change find_unused_parameters option name

* Add UnusedParametersModel for test option find_unused_parameters

* Add unit test for stage2 find_unused_parameters

* Add cpu-adam compatible check

* Remove dups import

* Trim spaces

* Fix yapf errors

* Trim spaces

* Add False Positive test check

* Fix find_unused_parameters test

* Trim spaces

* Fix yapf error
2021-04-25 04:45:27 -07:00
Olatunji Ruwase e88ebbcfc9
Use amp autocast in ZeRO3 linear (#990)
* Use amp autocast in ZeRO3 linear

* Fix typo

* Handle specific exceptions

* CI breaks on torch.distributed

* Add autocast unit test

* Format fixes

* Fix skip logic

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-04-23 10:53:55 -07:00
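For context on the change above, a minimal (non-DeepSpeed) sketch of torch.cuda.amp autocast applied around a linear op, the same mechanism the commit uses inside ZeRO-3's linear layer; this is illustrative only and not the actual DeepSpeed code.

import torch

x = torch.randn(4, 16, device="cuda")
w = torch.randn(32, 16, device="cuda", requires_grad=True)
with torch.cuda.amp.autocast():
    # Under autocast, matmul-like ops run in reduced precision where safe.
    y = torch.nn.functional.linear(x, w)
print(y.dtype)  # typically torch.float16 inside the autocast region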
Olatunji Ruwase cf5ea8912a
Add nvme unit/perf tests (#993) 2021-04-22 10:02:30 -07:00
Cheng Li 894f21daaa
Use odd shape tensor to represent parameter data in partitioned state (#981)
* use weird-shaped tensor to avoid silent failures when not registering external params

* fix typo

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2021-04-21 14:02:39 -07:00
Conglong Li 67a48aaa89
1-bit LAMB optimizer (#970)
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed.
Authors: @conglongli, @awan-10, @samyam, Hanlin Tang, Yuxiong He
Paper: https://arxiv.org/abs/2104.06069

Co-authored-by: sdtblck <46172032+sdtblck@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-04-20 18:28:22 -07:00
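A hedged sketch of how the optimizer above is typically enabled through the DeepSpeed config; the parameter names inside params are assumptions based on the 1-bit optimizer tutorials and may differ between releases.

ds_config = {
    "train_batch_size": 64,
    "optimizer": {
        "type": "OneBitLamb",
        "params": {
            "lr": 1e-3,
            "freeze_step": 1000,          # assumed: warm-up steps before 1-bit compression starts
            "cuda_aware": False,          # assumed: CUDA-aware MPI toggle
            "comm_backend_name": "nccl",  # assumed: NCCL backend from the 1-bit Adam v2 work
        },
    },
    "fp16": {"enabled": True},
}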
Jeff Rasley 0d4a54a04d
ZeRO-Infinity (#976)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
2021-04-18 23:45:37 -07:00
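A hedged config sketch of the NVMe offload that ZeRO-Infinity introduces; the paths are placeholders and the key names follow the public ZeRO-3 configuration docs rather than this specific commit.

ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},  # placeholder path
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},      # placeholder path
    },
}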
Reza Yazdani e721cb691f
Supporting different hidden dimensions for transformer kernels-v2 (#934)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-04-07 17:06:41 -07:00
Stas Bekman a128f34e7d
[benchmarks] flatten/unflatten benchmarks (#919)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-04-07 13:06:28 -07:00
Jeff Rasley 8db4fdf815
disable pipe test (#915)
This test has been giving us trouble for a bit, with nondeterministic failures; skipping for now so it doesn't break our CI. Need to revisit soon though.
2021-04-02 13:20:21 -07:00
Conglong Li 68c8481bcf
1-bit Adam v2 (#817)
Authors: @awan-10 @conglongli @samyam @jeffra

What's new:

NCCL-based implementation which provides better performance and usability compared to the MPI-based implementation.
Add support to momentum masks for those parameters with constant zero gradients during training.
Bug fixes (e.g., #813).

* NCCL-based 1-bit Adam + Code Refactor for Comm. Backends (#594)

* NCCL based 1-bit Implementation + Refactor to add communication backends (#593)

* add nccl 1-bit optim.

* temporary commit to save stuff.

* Use dist collectives instead of mpi routines.

* remove old code for comm.

* Fix bugs. still does not work.

* modify to test the nccl side code path

* Initial gather impl. Works intra-node.

* Updates to comm. phase 2. nccl comm. passed the tests.

* refactor code to introduce nccl/mpi as backends for onebit adam.

* Refactor updates to test/engine.

* Fix compile/runtime errors.

* simplify support for nccl/mpi backends.

* Add missing file

* Add compression backend in constructor. Revert later.

* modify test with some perf counting.

* Implement a true non-blocking gather for nccl side.

* Revert "Add compression backend in constructor. Revert later."

This reverts commit df8c40d310.

* improve the 1-bit adam test.

* Refactor comm. and compression backend in 1-bit adam.

* Fix the test.

* Fix runtime errors and typos in nccl backend

* fix mpi backend. modify tests.

* modify nccl perf test.

* fix mpi side errors.

* Add an mpi perf test

* Sync DSE.

* Remove old collectives file.

* Undo a typo.

* Graceful failure for torch versions that don't support nccl pt2pt.

* Revert "Merge branch 'master' into staging-1bit-nccl-v2"

This reverts commit 7840085070, reversing
changes made to a6dba72aea.

* Revert "Revert "Merge branch 'master' into staging-1bit-nccl-v2""

This reverts commit 6dbdd9858b.

* comm optimization + 1-bit lamb

* Saving/debugging commit.

* finalizing 1-bit lamb

* finalizing 1-bit lamb

* add momentum mask and chkpt handling for 1-bit adam

* Cleanup and modify nccl test to be runnable with deepspeed launcher.

* Fix format.

* fix formatting again.

* make test runnable without mpi4py

* Add dist.alltoall and dist.allgather instead of custom functions.

* remove debug prints.

* formatting and renaming

* renaming

* renaming

* add unit test, fix existing tests

* skip unit test when torch < 1.8

* revert 1-bit lamb

* flatten momentum when dimension is more than 1

* add warning message for 1-bit adam under fp32

* improve version check

* add fp32 test

* 1-bit adam doc

* fix file name

* doc fix

* torch 1.8 is released

* doc fix

* fix tests

* update news

* add doc for momentum mask

* fix checkpoint handling, add unit test

* checkpoint handling doc

* doc final cleanup

* bump dates

* update tests

* url change

* doc fix

* fix test

* doc update

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-03-16 16:27:20 -07:00
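A hedged sketch of selecting the NCCL-based 1-bit Adam described above via the config; the params beyond the type name are assumptions drawn from the 1-bit Adam tutorial.

ds_config = {
    "train_batch_size": 64,
    "optimizer": {
        "type": "OneBitAdam",
        "params": {
            "lr": 1e-3,
            "freeze_step": 2000,          # assumed: full-precision Adam steps before compression starts
            "comm_backend_name": "nccl",  # assumed: the new NCCL backend instead of MPI
        },
    },
    "fp16": {"enabled": True},
}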
Olatunji Ruwase fa87a73a8a
Fix ZeRO3 save_checkpoint (#857)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-03-16 13:06:39 -07:00
Jeff Rasley 871f3048ad
Allow args to be optional in deepspeed.initialize (#825) 2021-03-16 12:38:08 -07:00
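A sketch of what this commit enables: calling deepspeed.initialize() without an argparse namespace, passing the config directly (the keyword is config in current releases; older versions used config_params). The file path is a placeholder.

import torch
import deepspeed

model = torch.nn.Linear(8, 8)
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # placeholder path to a DeepSpeed config file
)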
Samyam Rajbhandari 4601885972
Samyamr/inference hook fix (#851)
* Fix mis-aligned-grad

When a parameter is not divisible by the world size, the partitioned gradients are misaligned due to incorrect padding handling. This PR should fix that.

* Formatting fix

* Adding static_scale test back for Z3, and also changing hidden size to be not divisible by world_size

* also removing alignment from flat fp16 buffers

* Testing for hidden dim alignment

* inference hook fix

* Update stage3.py

* formatting

* [bug-fix] move params to gpu if offload params is turned off

Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-03-15 13:07:20 -07:00
Jeff Rasley dd03cff29f
set adamw_mode default true (follows FusedAdam and < 0.3.11 logic) (#844) 2021-03-10 18:02:08 -08:00
Samyam Rajbhandari 599258f979
ZeRO 3 Offload (#834)
* Squash stage3 v1 (#146)

Co-authored-by: Samyam <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>

* Fix correctness bug (#147)

* formatting fix (#150)

* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)

* fp16 Z3 API update and bugfix

* revert debug change

* ZeRO-3 detach and race condition bugfixes (#149)

* trying out ZeRO-3 race condition fix

* CUDA sync instead of stream

* reduction stream sync

* remove commented code

* Fix optimizer state_dict KeyError (#148)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)

* Simplifying the logic for getting averaged gradients (#153)

* skip for now

* Z3 Docs redux (#154)

* removing some TODOs and commented code (#155)

* New Z3 defaults (#156)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* formatting

* megatron external params

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
2021-03-08 12:54:54 -08:00
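A hedged sketch of the CPU offload configuration that ZeRO 3 Offload introduces; the key names follow the public ZeRO-3 documentation rather than this specific commit.

ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}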
Olatunji Ruwase ec8b1cb0a0
Activation checkpointing for non-tensor arguments and return values (#741)
* Activation checkpoint support for non tensor input/output

* Format fixes

* Address PR comments; Add ordering edge case tests
2021-02-12 14:01:52 -08:00
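A minimal sketch of what the commit above allows: passing non-tensor arguments (here a float and a string) through deepspeed.checkpointing.checkpoint, assuming a CUDA-enabled setup with the default (unconfigured) checkpointing settings.

import torch
import deepspeed

def block(x, scale, tag=None):
    # x is a tensor; scale and tag are non-tensor arguments
    return torch.relu(x) * scale

x = torch.randn(4, 4, requires_grad=True)
y = deepspeed.checkpointing.checkpoint(block, x, 2.0, "layer0")
y.sum().backward()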
Cheng Li e2dfe0d17b
Add flops profiler tutorial (#682)
* work on flops profiler tutorial

* update flops profiler tutorial

* add flops profiler tutorial and fix names

* work on flops profiler tutorial

* update flops profiler tutorial

* add flops profiler tutorial and fix names

* fix tailing ws

* fix names

* remove multistep profiling and update docs

* fix cases where functionals and submodules coexist in a parent module, update readme

* fix typo

* always invoke post hook function

* fix module flops sum and update tests

* update tutorial
2021-02-10 18:03:55 -08:00
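A hedged sketch of enabling the profiler the tutorial covers via the DeepSpeed config; the field values are placeholders.

ds_config = {
    "train_batch_size": 16,
    "flops_profiler": {
        "enabled": True,
        "profile_step": 1,   # which training step to profile
        "module_depth": -1,  # profile all module levels
        "top_modules": 1,
        "detailed": True,
    },
}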
Jeff Rasley 2e2dd861f3
Dist testing backend fixes, etc. (#708) 2021-01-29 13:08:37 -08:00
Jeff Rasley 91b1b7f33e
[transformer-kernel] turn off unit test printing (#701) 2021-01-27 12:45:29 -08:00
Shaden Smith e59ba12d4d
make test_pipe more stable (#683) 2021-01-20 12:29:41 -08:00
Olatunji Ruwase 865104be85
Support optimizer AdamW type (#670) 2021-01-15 05:25:29 -08:00
Jeff Rasley f032e56f8a
Validate consistent ckpt tags across ranks (#667) 2021-01-14 13:38:46 -08:00
Cheng Li e2fbe4d238
squash latest flops profiling changes (#1) (#664)
Co-authored-by: Cheng Li <pistasable@gmail.com>

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-01-12 17:06:52 -08:00
Shaden Smith adcfd2694d
Handle activation checkpointing args that are None or non-tensors (#660)
Special thanks to @g-karthik for tracking this issue down.
2021-01-12 10:18:34 -08:00
Olatunji Ruwase da5563a9c1
LR scheduler unit tests (#429)
* Add Linear warmup+decay lr schedule
Update lr schedule unit tests

* LR scheduler unit tests for LR Range Test and 1Cycle

* Disable yapf to preserve parameterization

* Disable test_pipe.py for CI debugging

* Disable test_lr_scheduler for CI debugging

* Disable test_lr_scheduler for CI debugging

* Enable all unit tests for CI debugging

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-01-08 15:32:05 -08:00
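As a hedged illustration of the linear warmup+decay schedule mentioned above, a config sketch using DeepSpeed's scheduler block; the schedule name WarmupDecayLR and its parameter names are assumptions based on the public scheduler docs, not taken from this commit.

ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": 0.0,
            "warmup_max_lr": 1e-3,
            "warmup_num_steps": 1000,
            "total_num_steps": 10000,
        },
    },
}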
Jeff Rasley bc046dc40b
add additional validation checks in elastic config (#646) 2021-01-08 11:02:33 -08:00
Jeff Rasley 44bd538b11
Module replacement support (#586)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2021-01-06 11:03:35 -08:00
gcooper-isi a9a83a6fcf
Allow DeepSpeed models to be initialized with optimizer=None (#469)
Allow DeepSpeed models to be initialized with optimizer=None

Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
2021-01-05 10:14:29 -08:00
Olatunji Ruwase e6ac731136
Support initialization with dict configuration (#632) 2021-01-04 15:55:41 -08:00
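A sketch of the feature above: the DeepSpeed configuration can be supplied as a Python dict instead of a JSON file path (the keyword name, config vs config_params, varies by release).

import torch
import deepspeed

model = torch.nn.Linear(8, 8)
ds_config = {"train_batch_size": 8,
             "optimizer": {"type": "Adam", "params": {"lr": 1e-3}}}
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)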
Jeff Rasley 81aeea361d
Elastic training support (#602)
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
2020-12-22 22:26:26 -08:00
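A hedged sketch of the elasticity config block this feature adds; the field names follow the elastic training documentation and the values are placeholders.

ds_config = {
    "elasticity": {
        "enabled": True,
        "max_train_batch_size": 2048,
        "micro_batch_sizes": [2, 4, 8],
        "min_gpus": 1,
        "max_gpus": 64,
        "min_time": 20,  # assumed: minimum scheduling granularity in minutes
        "version": 0.1,
    },
}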
Jeff Rasley 7435b2f10a
Ability to initialize distributed backend outside deepspeed runtime (#608) 2020-12-17 23:17:19 -08:00
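A sketch of the capability above: initializing the distributed backend explicitly before deepspeed.initialize(), for cases where another framework owns process-group setup.

import deepspeed

deepspeed.init_distributed(dist_backend="nccl")  # safe to call ahead of deepspeed.initialize()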
Reza Yazdani fd2f970bdf
Transformer-kernel - supporting any arbitrary sequence-length (#587)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2020-12-17 10:13:54 -08:00
Jeff Rasley 845921b3b6
Add 'latest' checkpoint save/load support (#569) 2020-12-02 13:49:31 -08:00
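A hedged sketch of the 'latest' tag behaviour described above: saving records the tag in a latest file under the checkpoint directory, and loading without an explicit tag resolves it from that file. Directory name and config values are placeholders.

import torch
import deepspeed

model = torch.nn.Linear(8, 8)
ds_config = {"train_batch_size": 8,
             "optimizer": {"type": "Adam", "params": {"lr": 1e-3}}}
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
engine.save_checkpoint("checkpoints")            # also writes checkpoints/latest
engine.load_checkpoint("checkpoints", tag=None)  # tag resolved from the latest file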
Reza Yazdani c78c29f938
supporting different hidden dimensions (#559)
* supporting different hidden dimensions

* add support for larger hidden dimensions (greater than 8K)

* remove empty line

* add loop unrolling factor for dropout kernels

* update different kernels based on the reviews

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2020-12-01 14:01:24 -08:00
Jeff Rasley eec44af1e3
Turn back on PP tests (#558) 2020-11-24 17:29:08 -08:00
Olatunji Ruwase 6021b70288
Support non-tensor state in checkpoint (#548) 2020-11-21 15:41:22 -08:00
Olatunji Ruwase 0178e6cc22
Fix unbalanced gradients bug in ZeRO-2 gradient accumulation (#545)
* Use zero-tensors for missing gradients to avoid size mismatch

* Unit test for unbalanced gradients in ZeRO

* Formatting fixes
2020-11-20 15:39:01 -08:00
Jeff Rasley 08c96a1bc6
ZeRO-1 tune max-elems + bug fix (#532)
* zero-1 memory fix

* auto-tune max elems per comm to reduce padding/comm intervals

* clean-up and added previously missing reduction options

* fix testing backend to work with torch 1.7
2020-11-19 08:16:27 -08:00
Olatunji Ruwase 7752dc5ea1
Fix layout bug in ZeRO Stage 1 checkpoint logic (#531)
* Fix layout bug in ZeRO Stage 1 checkpoint logic
Add elastic checkpoint option for ZeRO stage 1, default to True

* Format fixes
2020-11-17 16:20:02 -08:00
Jeff Rasley 31f46feee2
DeepSpeed JIT op + PyPI support (#496)
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
2020-11-12 11:51:38 -08:00
Olatunji Ruwase be1147c08a
PLD release (#513)
* Progressive layer dropping docs (#499)

* test

* Adding tutorial and news page for pld

* updating the tutorial and posts of PLD

* update the finetune tutorial

* Update PLD tutorial (#512)

* Update installation instructions

* Format fix

* ZeRO tutorial

* Format fixes

* ZeRO-Offload

* ZeRO and ZeRO-Offload tutorials

* Update navigation page

* Format fixes

* Add yuxhe feedback

* Fix blog post link

* Fix OneBit-Adam link
Tweak scheduler example

* Fix date link

* Add DeepSpeed_Adam

* Add PLD tutorial to navigation

Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* updating the pld docs

* DeepSpeed implementation of PLD (#508)

* DeepSpeed implementation of PLD

* Format fixes

* Formatting fixes

* Fix broken url

* Address PR feedback

* Bump DSE

Co-authored-by: Minjia Zhang <33713995+minjiaz@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Minjia Zhang <minjiaz@microsoft.com>
2020-11-10 12:53:50 -08:00
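A hedged sketch of enabling the progressive layer dropping feature released above; the theta and gamma parameter names come from the PLD tutorial and are assumptions as far as this commit is concerned.

ds_config = {
    "train_batch_size": 32,
    "progressive_layer_drop": {
        "enabled": True,
        "theta": 0.5,    # assumed: keep-probability schedule parameter
        "gamma": 0.001,  # assumed: schedule decay rate
    },
}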
Reza Yazdani f5aa2547d8
Add CPUAdam optimizer for zero-offload in deepspeed engine (#484)
* add adamW to CPU-ADAM implementation

* supporting cpu-adam optimizer for zero-offload on deepspeed side

* bump DSE to match cpu-adam updates

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2020-10-30 09:01:04 -07:00
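A hedged sketch of the ZeRO-Offload setup this commit targets, where DeepSpeed substitutes its CPU Adam implementation when optimizer state is offloaded; the adam_w_mode flag and the stage-2 cpu_offload key are assumptions based on configs of that era.

ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-3, "adam_w_mode": True},  # assumed flag; see the adamw_mode default commit above
    },
    "zero_optimization": {
        "stage": 2,
        "cpu_offload": True,  # assumed older key; newer configs use offload_optimizer instead
    },
}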
Jeff Rasley 679fc13512 turning off different tests (temp) 2020-10-06 20:03:59 -07:00