* [doc] xref to hostfile discussion
It wasn't clear where to find what was meant by `hostfile`, so this adds a link to where it's discussed.
* remove whitespace
RTX 30-series GPUs are compute capability 8.6 (`compute_86`); a device's capability can be checked with:
```
python -c "import torch; print(torch.cuda.get_device_capability())"
```
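On an RTX 30-series GPU the snippet above prints `(8, 6)`.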
This PR adds support for this compute capability.
Reference: https://developer.nvidia.com/cuda-gpus
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* 1) Register the layout as a buffer of the module so that we can save/load checkpoints; 2) add a broadcast of the layout at the beginning to ensure different processes have a consistent layout during distributed training (both sketched below).
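A minimal sketch of both changes, assuming a PyTorch module that owns a sparse-attention `layout` tensor; the class and method names here are illustrative, not the actual DeepSpeed code:
```python
import torch
import torch.distributed as dist

class SparseAttentionLayout(torch.nn.Module):
    def __init__(self, layout: torch.Tensor):
        super().__init__()
        # (1) Registering `layout` as a buffer puts it in state_dict(),
        # so it is saved and restored with checkpoints.
        self.register_buffer("layout", layout)

    def sync_layout(self):
        # (2) Broadcast from rank 0 so every process starts distributed
        # training with an identical layout.
        if dist.is_available() and dist.is_initialized():
            dist.broadcast(self.layout, src=0)
```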
* Add docstring for max_seq_length argument in SparseSelfAttention
Co-authored-by: Zhun Liu <zhunliu@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* track the optimizer step in cpu-adam when loading a checkpoint
* add warning/error message for updating the optimizer step count
* resolve build issue
* support state updates from the Python side
* track the step from Python in all cases (see the sketch below)
* remove comma
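A minimal sketch of the step tracking, with illustrative names rather than the actual DeepSpeedCPUAdam code: the step count lives in the Python-side optimizer state, so `load_state_dict()` restores it from a checkpoint and the update resumes with the correct count.
```python
import torch

class CPUAdamLike(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                # Track the step from the Python side in all cases; after
                # load_state_dict() this resumes from the checkpointed value
                # instead of restarting at zero.
                state["step"] = state.get("step", 0) + 1
                # ...state["step"] would be passed to the CPU Adam kernel.
```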
* support different hidden dimensions
* add support for larger hidden dimensions (greater than 8K)
* remove empty line
* add a loop-unrolling factor for the dropout kernels
* update the different kernels based on review feedback
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
This PR:
* fixes a misspelled method name
* also, `( () )` doesn't read well until one reads the code and understands that it's not a formatting bug; I propose simply saying that it's a callable object.
In the absence of a model parallel group, `model_parallel_allreduce` should not do any reduction. This commit fixes a bug where a model-parallel allreduce was performed across the world group when the model parallel group is None.
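A minimal sketch of the fixed behavior, using an illustrative function name on top of `torch.distributed`:
```python
import torch
import torch.distributed as dist

def model_parallel_allreduce(tensor: torch.Tensor, group=None) -> torch.Tensor:
    # With no model parallel group there is nothing to reduce over:
    # return the tensor unchanged rather than letting all_reduce fall
    # back to the default (world) process group.
    if group is None:
        return tensor
    dist.all_reduce(tensor, group=group)
    return tensor
```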
* Add a guard to avoid using `torch.version.cuda` in a no-CUDA environment (see the sketch below).
* Fix several typos in setup.py.
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
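A minimal sketch of the guard from the first bullet above: `torch.version.cuda` is `None` in CPU-only builds, so it is checked before use (illustrative, not the exact setup.py change):
```python
import torch

if torch.cuda.is_available() and torch.version.cuda is not None:
    cuda_major = int(torch.version.cuda.split(".")[0])
else:
    cuda_major = None  # no-CUDA environment: skip CUDA-specific build flags
```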
* zero-1 memory fix
* auto-tune max elements per comm to reduce padding/comm intervals (see the sketch below)
* clean up and add previously missing reduction options
* fix testing backend to work with torch 1.7
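A minimal sketch of the auto-tuning idea from the second bullet: scan downward from a cap and pick the per-communication element count that needs the least padding to split the flattened gradients into equal intervals. This is illustrative only, not the actual ZeRO-1 logic.
```python
def pick_max_elems_per_comm(total_elems: int, cap: int) -> int:
    # Prefer larger intervals (fewer comms); break ties toward less padding.
    best, best_pad = cap, (-total_elems) % cap
    for candidate in range(cap - 1, 0, -1):
        if best_pad == 0:
            break  # perfect split already found
        pad = (-total_elems) % candidate  # padding needed for equal intervals
        if pad < best_pad:
            best, best_pad = candidate, pad
    return best

# e.g. pick_max_elems_per_comm(1_000_000, 500_000) -> 500000 (no padding)
```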