* Change the sparse attention API to be compatible with latest changes on the triton side
* remove compatibility checks for CUDA 11
* Update requirements-sparse_attn.txt
Co-authored-by: Arash Ashari <arashari@microsoft.com>
* Unused parameters assert should be disabled by default
* Fix message
* Invert assert logic in unit test
* Change option for ignoring unused parameters
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Add additional conditions when checking types of output from the model
* Add test
* Modify test to use torch.tensor as well
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
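For context, a minimal sketch (not the actual DeepSpeed code) of the kind of output-type handling the commits above describe, where a model may return either a bare `torch.Tensor` or a tuple/list containing the loss:

```python
import torch

def extract_loss(outputs):
    # Models may return a bare tensor, or a tuple/list whose first
    # element is the loss; accept both shapes.
    if isinstance(outputs, torch.Tensor):
        return outputs
    if isinstance(outputs, (tuple, list)) and len(outputs) > 0:
        return outputs[0]
    raise ValueError(f"Unexpected model output type: {type(outputs)}")
```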
* Add find_unused_parameters option
As unused parameters in modules may sometimes be unexpected,
add an explicit error message when they occur and an option to avoid the error: https://github.com/microsoft/DeepSpeed/issues/707
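A minimal config sketch of the resulting option. The key name changed several times during review (see the renaming commits below); `ignore_unused_parameters` under `zero_optimization` is assumed here, with `False` restoring the explicit error:

```python
# Hedged sketch of a DeepSpeed config dict (e.g., passed to
# deepspeed.initialize); the option name is an assumption.
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {
        "stage": 2,
        # Defaults to True (assert disabled); set False to get an
        # explicit error when a module has parameters that never
        # receive gradients (see issue #707).
        "ignore_unused_parameters": False,
    },
}
```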
* Fix syntax error
* Fix yapf error
* Fix yapf error
* Fix yapf error
* Fix yapf error
* Move stage2 find_unused_parameters to config file
* Add stage2 find_unused_parameters
* Add stage2 find_unused_parameters
* Add stage2_find_unused_parameters option
* Change error msg to reflect zero_optimization config change
* Fix yapf error
* Fix yapf errors
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Add UnusedParametersModel to test the find_unused_parameters option
* Add unit test for stage2 find_unused_parameters
* Add cpu-adam compatible check
* Remove duplicate import
* Trim spaces
* Fix yapf errors
* Trim spaces
* Add false-positive test check
* Fix find_unused_parameters test
* Trim spaces
* Fix yapf error
* Use amp autocast in ZeRO3 linear
* Fix typo
* Handle specific exceptions
* CI breaks on torch.distributed
* Add autocast unit test
* Format fixes
* Fix skip logic
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
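A hedged sketch of the autocast idea, assuming `torch.cuda.amp.autocast` is the autocast entry point (the actual ZeRO-3 linear override lives in DeepSpeed's stage-3 code and differs in detail):

```python
import torch
import torch.nn.functional as F

def zero3_linear(input, weight, bias=None):
    # Guard for older torch builds that predate torch.cuda.amp.autocast;
    # the "Handle specific exceptions" / "Fix skip logic" commits above
    # deal with exactly this kind of version gap.
    if hasattr(torch.cuda.amp, "autocast"):
        with torch.cuda.amp.autocast():
            return F.linear(input, weight, bias)
    return F.linear(input, weight, bias)
```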
Authors: @awan-10 @conglongli @samyam @jeffra
What's new:
An NCCL-based implementation that provides better performance and usability than the MPI-based implementation.
Added support for momentum masks for parameters with constant zero gradients during training.
Bug fixes (e.g., #813).
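A hedged sketch of selecting the NCCL backend in a DeepSpeed config; `OneBitAdam`, `freeze_step`, `comm_backend_name`, and `cuda_aware` follow the docs of this release and should be treated as assumptions if your version differs:

```python
# Hedged sketch: 1-bit Adam with the new NCCL communication backend.
ds_config = {
    "train_batch_size": 16,
    "optimizer": {
        "type": "OneBitAdam",
        "params": {
            "lr": 1e-4,
            "freeze_step": 23000,          # warmup steps before compression starts
            "comm_backend_name": "nccl",   # NCCL path needs torch >= 1.8
            "cuda_aware": False,           # only relevant for the MPI backend
        },
    },
    "fp16": {"enabled": True},  # 1-bit compression is intended for fp16 training
}
```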
* NCCL-based 1-bit Adam + Code Refactor for Comm. Backends (#594)
* NCCL based 1-bit Implementation + Refactor to add communication backends (#593)
* add nccl 1-bit optim.
* temporary commit to save stuff.
* Use dist collectives instead of mpi routines.
* remove old code for comm.
* Fix bugs. still does not work.
* modify to test the nccl side code path
* Initial gather impl. Works intra-node.
* Updates to comm phase 2; NCCL comm passed the tests.
* refactor code to introduce nccl/mpi as backends for onebit adam.
* Refactor updates to test/engine.
* Fix compile/runtime errors.
* simplify support for nccl/mpi backends.
* Add missing file
* Add compression backend in constructor. Revert later.
* modify test with some perf counting.
* Implement a true non-blocking gather for nccl side.
* Revert "Add compression backend in constructor. Revert later."
This reverts commit df8c40d310.
* improve the 1-bit adam test.
* Refactor comm. and compression backend in 1-bit adam.
* Fix the test.
* Fix runtime errors and typos in nccl backend
* fix mpi backend. modify tests.
* modify nccl perf test.
* fix mpi side errors.
* Add an mpi perf test
* Sync DSE.
* Remove old collectives file.
* Undo a typo.
* Graceful failure for torch versions that don't support nccl pt2pt.
* Revert "Merge branch 'master' into staging-1bit-nccl-v2"
This reverts commit 7840085070, reversing
changes made to a6dba72aea.
* Revert "Revert "Merge branch 'master' into staging-1bit-nccl-v2""
This reverts commit 6dbdd9858b.
* comm optimization + 1-bit lamb
* Saving/debugging commit.
* finalizing 1-bit lamb
* finalizing 1-bit lamb
* add momentum mask and chkpt handling for 1-bit adam
* Cleanup and modify nccl test to be runnable with deepspeed launcher.
* Fix format.
* fix formatting again.
* make test runnable without mpi4py
* Add dist.alltoall and dist.allgather instead of custom functions.
* remove debug prints.
* formatting and renaming
* renaming
* renaming
* add unit test, fix existing tests
* skip unit test when torch < 1.8
* revert 1-bit lamb
* flatten momentum when dimension is more than 1
* add warning message for 1-bit adam under fp32
* improve version check
* add fp32 test
* 1-bit adam doc
* fix file name
* doc fix
* torch 1.8 is released
* doc fix
* fix tests
* update news
* add doc for momentum mask
* fix checkpoint handling, add unit test
* checkpoint handling doc
* doc final cleanup
* bump dates
* update tests
* url change
* doc fix
* fix test
* doc update
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
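A hedged sketch of the version gate described by the graceful-failure and test-skip commits above; the exact check in DeepSpeed may differ, and the `packaging` library is assumed to be available:

```python
import torch
from packaging import version

# The NCCL 1-bit path relies on point-to-point ops that landed in
# torch 1.8, hence the "skip unit test when torch < 1.8" commit.
TORCH_VERSION = version.parse(torch.__version__.split("+")[0])

if TORCH_VERSION < version.parse("1.8"):
    raise RuntimeError("1-bit Adam's NCCL backend requires torch >= 1.8 "
                       "for point-to-point communication support.")
```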
* Fix misaligned grad
When a parameter's size is not divisible by the world size, the partitioned gradients are misaligned due to incorrect padding handling. This PR fixes that (a sketch of the intended padding follows this block).
* Formatting fix
* Adding static_scale test back for Z3, and changing hidden size to be not divisible by world_size
* also removing alignment from flat fp16 buffers
* Testing for hidden dim alignment
* inference hook fix
* Update stage3.py
* formatting
* [bug-fix] move params to gpu if offload params is turned off
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
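A hedged illustration of the intended padding, assuming a flat gradient buffer partitioned with `torch.chunk`; this is not the actual stage-3 code:

```python
import torch

def partition_with_padding(flat_grad: torch.Tensor, world_size: int):
    # Pad the flat gradient so it divides evenly across ranks;
    # misaligned shards are what the fix above addresses when
    # numel is not divisible by world size.
    remainder = flat_grad.numel() % world_size
    padding = (world_size - remainder) % world_size
    if padding:
        flat_grad = torch.cat([flat_grad, flat_grad.new_zeros(padding)])
    # Every rank now receives a shard of identical length.
    return list(torch.chunk(flat_grad, world_size))
```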
* Add Linear warmup+decay lr schedule
Update lr schedule unit tests
* LR scheduler unit tests for LR Range Test and 1Cycle
* Disable yapf to preserve parameterization
* Disable test_pipe.py for CI debugging
* Disable test_lr_scheduler for CI debugging
* Disable test_lr_scheduler for CI debugging
* Enable all unit tests for CI debugging
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
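A hedged config sketch of the linear warmup + decay schedule; it is assumed to surface as the `WarmupDecayLR` scheduler with the parameter names below:

```python
# Hedged sketch: linear ramp to warmup_max_lr, then linear decay.
ds_config = {
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": 0.0,
            "warmup_max_lr": 3e-4,
            "warmup_num_steps": 1000,    # linear warmup
            "total_num_steps": 100000,   # then decay over the remainder
        },
    },
}
```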
* supporting different hidden dimensions
* add support for larger hidden dimensions (greater than 8K)
* remove empty line
* add loop unrolling factor for dropout kernels
* update different kernels based on the reviews
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* zero-1 memory fix
* auto-tune max elems per comm to reduce padding/comm intervals
* clean up and add previously missing reduction options
* fix testing backend to work with torch 1.7
* add adamW to CPU-ADAM implementation
* supporting cpu-adam optimizer for zero-offload on deepspeed side
* bump DSE to match cpu-adam updates
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
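A hedged usage sketch, assuming the `adamw_mode` flag on `DeepSpeedCPUAdam` is the switch these commits added for decoupled weight decay:

```python
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

model = torch.nn.Linear(8, 8)  # toy model for illustration
# adamw_mode switches CPU-Adam to decoupled weight decay (AdamW),
# for use with ZeRO-Offload where the optimizer runs on the CPU.
optimizer = DeepSpeedCPUAdam(model.parameters(),
                             lr=1e-4,
                             weight_decay=0.01,
                             adamw_mode=True)
```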