digger-yu
077e42e68a
Update install.sh (#3270)
Code optimization
1. Use #!/usr/bin/env bash instead of #!/bin/bash to make the script more portable.
2. Use rm -rf instead of rm -r so that directories are removed recursively without prompting and without failing on missing paths.
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-04-17 20:41:08 -07:00
Jeff Rasley
c3c8d5dd93
AMD support (#1430)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: rraminen <rraminen@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: okakarpa <okakarpa@amd.com>
Co-authored-by: rraminen <rraminen@amd.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: okakarpa <okakarpa@amd.com>
Co-authored-by: Ramya Ramineni <62723901+rraminen@users.noreply.github.com>
2022-03-03 01:53:35 +00:00
Samyam Rajbhandari
599258f979
ZeRO 3 Offload (#834)
* Squash stage3 v1 (#146)
Co-authored-by: Samyam <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
* Fix correctness bug (#147)
* formatting fix (#150)
* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)
* fp16 Z3 API update and bugfix
* revert debug change
* ZeRO-3 detach and race condition bugfixes (#149)
* trying out ZeRO-3 race condition fix
* CUDA sync instead of stream
* reduction stream sync
* remove commented code
* Fix optimizer state_dict KeyError (#148)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)
* Simplifying the logic for getting averaged gradients (#153)
* skip for now
* Z3 Docs redux (#154)
* removing some TODOs and commented code (#155)
* New Z3 defaults (#156)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* formatting
* megatron external params
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
2021-03-08 12:54:54 -08:00
Jeff Rasley
7bf1b837a4
[install] add -e/--examples flag to checkout submodules (#755)
* add -e/--examples flag to checkout submodules
* bump DSE commit
2021-02-12 10:19:37 -08:00
Stas Bekman
78e776a9ac
[install] fixes/improvements/docs (#752)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-02-12 09:50:03 -08:00
Jeff Rasley
7435b2f10a
Ability to initialize distributed backend outside deepspeed runtime (#608)
2020-12-17 23:17:19 -08:00
Jeff Rasley
31f46feee2
DeepSpeed JIT op + PyPI support (#496)
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
2020-11-12 11:51:38 -08:00
Jeff Rasley
5bc7d4e1e6
Remove pip --use-feature (#419)
2020-09-17 16:57:54 -07:00
Shaden Smith
5812e84544
readthedocs yaml configuration (#410)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2020-09-16 18:57:43 -07:00
Shaden Smith
65c2f974d8
Pipeline parallel training engine. (#392)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2020-09-09 23:14:55 -07:00
Jeff Rasley
41db1c2f03
ZeRO-Offload release (#391)
* ZeRO-Offload (squash) (#381)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
2020-09-09 17:14:12 -07:00
Ammar Ahmad Awan
01726ce2b8
Add 1-bit Adam support to DeepSpeed (#380)
* 1-bit adam (#353)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: tanghl1994 <htang14@ur.rochester.edu>
Co-authored-by: Hank <tanghl1994@gmail.com>
Co-authored-by: root <root@node2x12b.cs.rochester.edu>
Co-authored-by: Ammar Ahmad Awan <awan.ammar@microsoft.com>
2020-09-09 14:37:37 -07:00
Jeff Rasley
e5bbc2e559
Sparse attn + ops/runtime refactor + v0.3.0 (#343)
* Sparse attn + ops/runtime refactor + v0.3.0
Co-authored-by: Arash Ashari <arashari@microsoft.com>
2020-09-01 18:06:15 -07:00
Jeff Rasley
f5025506de
install update: no-sudo + clean build files (#258)
* install update: no-sudo + clean build files
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
2020-06-09 10:46:35 -07:00
Jeff Rasley
7dc209c661
add basic post-install test (#209)
* add basic post-install test
2020-05-05 15:01:39 -07:00
Jeff Rasley
e0f5cc688e
add skip reqs flag (#133)
2020-03-11 13:29:18 -07:00
Jeff Rasley
259f894a8b
Install specific apex hash (#132)
* allow installing a specific apex commit
2020-03-11 12:17:12 -07:00
Incomplete
5f6294bd04
Add two CLI options to help with the installation inside of conda (#113)
* Add --no_sudo to run without sudo
* Add --pip_mirror to set the pip mirror
* Default to running pip without sudo
* Typo
* Add --pip_sudo to Dockerfile and azure-pipelines.yml
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2020-03-09 08:30:18 -07:00
Jeff Rasley
001abe2362
Refactor simple model test, fix pythonpath issue (#96)
Also a fix for #94
2020-02-20 14:16:41 -08:00
Jeff Rasley
bf2689a9dd
Fix bug in install script, bump TF version (#71)
* bump tf version in dockerfile
* Update install.sh
2020-02-12 17:06:22 -08:00
Shaden Smith
50ae149f82
Moving to major/minor/patch versioning. (#51)
2020-02-09 20:03:35 -08:00
Jeff Rasley
9f2e54c09e
DeepSpeed dockerfile, install reqs, update examples reqs (#26)
* update examples submodule
* install requirements.txt with install script
* add dockerfile
2020-02-05 14:58:13 -08:00
Jeff Rasley
00825428bb
update install to use pdcp to distribute wheels (#12)
update install to use pdcp to distribute wheels
2020-02-04 14:28:33 -08:00
Shaden Smith
b18eae24e8
Fixing file permissions (#1)
Fixing file permissions.
2020-02-03 10:55:19 -08:00
Jeff Rasley
16be6de6f1
Install script
2020-01-31 16:03:36 -08:00