Commit graph

73 commits

Author SHA1 Message Date
Olatunji Ruwase a5400974df
DeepNVMe perf tuning (#6560)
Add performance tuning utilities: `ds_nvme_tune` and `ds_io`.  
Update tutorial with tuning section.

---------

Co-authored-by: Ubuntu <jomayeri@microsoft.com>
Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
2024-09-26 13:07:19 +00:00
Olatunji Ruwase 659f6be105
Avoid security issues of subprocess shell (#6498)
Avoid security issues of `shell=True` in subprocess

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-09-11 20:07:06 +00:00
Olatunji Ruwase 662a421b05
Safe usage of popen (#6490)
Avoid shell=True security issues with Popen
2024-09-04 21:06:04 +00:00
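The two commits above move away from `shell=True`. The general pattern can be sketched as follows; this is an illustration, not the actual DeepSpeed change, and the helper name `run_without_shell` is hypothetical. Passing the command as an argument list means no shell ever parses it, so metacharacters such as `;` stay literal.

```python
import subprocess
import sys


def run_without_shell(args):
    """Run a command given as an argument list and return its stdout (hypothetical helper)."""
    # No shell is involved: each list element reaches the child process
    # as one literal argument, so ';' or '&&' are never interpreted.
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# The second argument looks like a shell injection, but it is only echoed back.
output = run_without_shell(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "hello; echo injected"]
)
print(output)
```

With `shell=True` and string concatenation, the same input would have been split at the `;` and the second half executed as a separate command.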
Olatunji Ruwase 5d1a30c033
DS_BUILD_OPS should build only compatible ops (#6489)
Currently DS_BUILD_OPS=1 fails on incompatible ops. This is a deviation
from
[documentation](https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops)
which states that only compatible ops are built.

2024-09-04 20:30:56 +00:00
Rohan Potdar 30428d0318
move pynvml install to setup.py (#5840)
Only install pynvml on NVIDIA GPUs, not on all accelerators
2024-08-15 16:27:10 +00:00
andyG 3c490f9cf4
Use accelerator to replace cuda in setup and runner (#5769)
Use accelerator apis to select device in setup.py and set visible
devices env in runner.py

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-08-01 13:28:55 -07:00
Costin Eseanu 74f3dcab62
Add Windows scripts (deepspeed, ds_report). (#5699)
Co-authored-by: Costin Eseanu <costineseanu@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-07-09 01:05:09 +00:00
Costin Eseanu b3767d01d4
Fixed Windows inference build. (#5609)
Fix #2427

---------

Co-authored-by: Costin Eseanu <costineseanu@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-06-24 13:39:18 -07:00
Costin Eseanu e7dd28a23d
Fixed the Windows build. (#5596)
Fixed the Windows build.

Fixes applied:
- Remove some more ops that don't build on Windows.
- Remove the use of symlinks that didn't work correctly and replace with
`shutil.copytree()`.
- Small fixes to make the C++ code compile.

Tested with Python 3.9 and CUDA 12.1.

---------

Co-authored-by: Costin Eseanu <costineseanu@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-31 22:11:10 +00:00
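The symlink-to-copy change listed in the commit above can be sketched as follows. This is a minimal illustration rather than the code from the PR: the helper name `copy_source_tree` is hypothetical, and `dirs_exist_ok=True` assumes Python 3.8 or newer.

```python
import os
import shutil


def copy_source_tree(source_dir, dest_dir):
    """Copy a directory tree instead of symlinking it (hypothetical helper)."""
    # Creating symlinks on Windows can require elevated privileges or
    # developer mode, so a plain copy is the more portable choice.
    # dirs_exist_ok=True lets a repeated build overwrite an earlier copy.
    shutil.copytree(source_dir, dest_dir, dirs_exist_ok=True)
    return os.path.abspath(dest_dir)
```

The trade-off is that the copy does not follow later edits to the source tree, so it has to be refreshed on each build.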
Antônio Vieira 059bb2085c
fix: swapping order of parameters in create_dir_symlink method. (#5465)
The order of parameters in the create_dir_symlink method looks wrong.
Because of this, we get the error "PermissionError: [WinError 5] Denied access:
'.\\deepspeed\\ops\\csrc'" when installing deepspeed >= 0.4.0 in a Windows
environment.

Please check this out @eltonzheng and @jeffra.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-04-29 17:37:54 +00:00
Ma, Guokai c08e69f212
Make op builder detection adapt to accelerator change (#5206)
This is a WIP PR that makes op builder detection adapt to accelerator
changes. It is a follow-up to
https://github.com/microsoft/DeepSpeed/issues/5173
Currently, DeepSpeed generates `installed_ops` and `compatible_ops` at
setup time. If the system changes to a different accelerator at DeepSpeed
launch time, these two lists would contain incorrect information.

This PR intends to solve this problem with more flexible op detection.

* For `installed_ops`, DeepSpeed should disable all installed ops if the
accelerator detected at setup time is different from the one at launch time.
* For `compatible_ops`, DeepSpeed should refresh the list on each
launch to avoid the impact of an accelerator change.

As a first step, the nv-inference workflow is temporarily changed to emulate
the scenario in which the system is set up with CPU_Accelerator and then
launched with CUDA_Accelerator. CPU_Accelerator is also modified so that Intel
Extension for PyTorch and oneCCL binding for PyTorch are no longer mandatory.

Starting from here, we can reconstruct installed_ops and compatible_ops
to follow the design above.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-03-12 20:48:29 +00:00
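The two rules proposed in the commit above can be sketched as follows. This is a hedged illustration of the stated design, not DeepSpeed's implementation: the function names, the accelerator name strings, and the dictionary shapes are all assumptions.

```python
from typing import Callable, Dict


def effective_installed_ops(
    installed_ops: Dict[str, bool],
    setup_accelerator: str,
    launch_accelerator: str,
) -> Dict[str, bool]:
    """Rule 1: disable every installed op if the accelerator changed since setup."""
    if setup_accelerator != launch_accelerator:
        return {name: False for name in installed_ops}
    return dict(installed_ops)


def refresh_compatible_ops(
    compatibility_checks: Dict[str, Callable[[], bool]],
) -> Dict[str, bool]:
    """Rule 2: recompute compatibility on every launch instead of caching it."""
    return {name: check() for name, check in compatibility_checks.items()}


# Example: ops were built for a CPU accelerator, but the launch detects CUDA.
recorded_at_setup = {"cpu_adam": True, "fused_adam": False}
print(effective_installed_ops(recorded_at_setup, "cpu", "cuda"))
print(refresh_compatible_ops({"cpu_adam": lambda: True, "fused_adam": lambda: True}))
```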
Michael Wyatt 48749411b8
Disable ninja by default (#5194)
#5192 reports an issue with the latest DeepSpeed release (0.13.3)
related to pre-compilation and the recently re-enabled `ninja` support
in #5088. Reverting to disabling `ninja` by default, but users can still
enable it with `DS_ENABLE_NINJA=1` until we can further debug to
understand the problem.
2024-02-26 11:41:09 -08:00
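An opt-in switch such as `DS_ENABLE_NINJA=1` can be read along the following lines. This is a sketch under stated assumptions, not DeepSpeed's parsing code: the helper name `env_flag` is hypothetical, and treating an empty value as unset and accepting only `"1"` as true are choices made for the example.

```python
import os


def env_flag(name, default=False, environ=None):
    """Return True only when the variable is set to '1' (hypothetical helper)."""
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if value == "":
        # Assumption: an empty value is treated the same as an unset variable.
        return default
    return value == "1"


# Ninja stays off unless the user opts in explicitly.
use_ninja = env_flag("DS_ENABLE_NINJA", default=False, environ={"DS_ENABLE_NINJA": "1"})
print(use_ninja)
```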
Jinzhen Lin b00533e479
Use ninja to speed up build (#5088)
DeepSpeed has too many ops now, and it takes too much time to pre-build
all of them.
I noticed DeepSpeed disabled `ninja` 4 years ago
(https://github.com/microsoft/DeepSpeed/pull/298), and I think we should
consider enabling it now.
The issue mentioned in https://github.com/microsoft/DeepSpeed/pull/298
can be solved by resolving `include_dirs` to absolute paths.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2024-02-21 02:20:11 +00:00
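Resolving `include_dirs` to absolute paths, as the commit above describes, can be sketched as follows. The helper name and the `base_dir` parameter are assumptions made for illustration; the actual change in the PR may differ.

```python
import os


def absolute_include_dirs(include_dirs, base_dir=None):
    """Resolve each include directory against base_dir (hypothetical helper)."""
    base = os.getcwd() if base_dir is None else base_dir
    # os.path.join leaves an already absolute path untouched, and
    # os.path.abspath normalises segments such as '..' or '.'.
    return [os.path.abspath(os.path.join(base, path)) for path in include_dirs]


print(absolute_include_dirs(["csrc/includes", "../shared"], base_dir=os.getcwd()))
```

Absolute paths matter here because a build tool may run the compiler from a different working directory than the one `setup.py` was started in, which would break relative include paths.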
Logan Adams 8fb111c08d
Treat empty environment variables as unset in (#4185) 2023-08-21 22:32:31 +00:00
Yejing-Lai 7290aace9b
[CPU] Skip CPU support unimplemented error (#3633)
* skip cpu support unimplemented error and update cpu inference workflow

* add torch.bfloat16 to cuda_accelerator

* remove UtilsBuilder skip

* fused adam can build

* use cpu adam to implement fused adam

* enable zero stage 1 and 2 for synchronized accelerator (a.k.a. CPU)

* remove unused parameters

* remove skip FusedAdamBuilder; add suported_dtypes

* fix format

* Revert "fix format"

Revert "remove skip FusedAdamBuilder; add suported_dtypes"

Revert "remove unused parameters"

Revert "enable zero stage 1 and 2 for synchronized accelerator (a.k.a. CPU)"

Revert "use cpu adam to implement fused adam"

Revert "fused adam can build"

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Ma, Guokai <guokai.ma@intel.com>
2023-07-19 19:58:38 +00:00
stephen youn 69d1b9f978 DeepSpeed-Triton for Inference (#3748)
Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Ethan Doe <yidoe@microsoft.com>
Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-06-23 14:30:49 -07:00
Tian, Feng 6938c449de
Add snip_momentum structured pruning which can support higher sparse ratio with minor accuracy loss (#3300)
Signed-off-by: Tian, Feng <feng.tian@intel.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-05-10 10:33:48 -07:00
Pablo Emídio S.S f3f4c44959
Build: Update license in setup (#3484) 2023-05-08 17:16:05 +00:00
Michael Wyatt bcccee4d85
Fix cupy install version detection (#3276)
* updated cupy install

* do non-isolated pip install

* Update action.yml
2023-04-18 17:13:35 +00:00
Gavin Goodship adc15e1c17
Update curriculum-learning.md (#3031)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-04-07 08:51:03 -07:00
Michael Wyatt b361c72761
Update DeepSpeed copyright license to Apache 2.0 (#3111)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-03-30 17:14:38 -07:00
Jeff Rasley 91d63e0228
update formatter version and style settings (#3098) 2023-03-27 07:55:19 -04:00
Jeff Rasley bbfd0a6a3e
update email info 2023-03-15 14:16:26 -07:00
Logan Adams b4d40e357b
Fix example command when building wheel with dev version specified (#2815) 2023-02-21 18:16:35 +00:00
Jeff Rasley 0b549ad70a
[install] only add deepspeed pkg at install (#2714)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-01-18 10:26:26 -08:00
Jeff Rasley cd271a4aa6
exclude benchmarks during install (#2698) 2023-01-13 14:24:30 -08:00
Ma, Guokai 9548d48f48
Abstract accelerator (step 2) (#2560)
* Abstract accelerator (step 2)

* more flex op_builder path for both installation and runtime

* add SpatialInferenceBuilder into cuda_accelerator.py

* use reflection to make cuda_accelerator adapt to CUDA op builder change automatically

* clean up deepspeed/__init__.py

* add comments in cuda_accelerator for no torch path

* Update deepspeed/env_report.py

Change env_report.py according to suggestion

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

* reduce the range of try...except for better code clarity

* Add porting for deepspeed/ops/random_ltd/dropping_utils.py

* move accelerator to top directory and create symlink under deepspeed

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-01-06 23:40:58 -05:00
Jeff Rasley 35eabb0a33
Fix issues w. python 3.6 + add py-version checks to CI (#2589) 2022-12-09 21:53:58 +00:00
Michael Wyatt 521d329b97
Fix CI issues related to cupy install (#2483)
* remove any cupy install when setting up environments

* revert previous changes to run on cu111 runners

* fix for when no cupy is installed

* remove cupy uninstall for workflows not using latest torch version

* update to cu116 for inference tests

* fix pip uninstall line

* move python environment list to after DS install

* remove cupy uninstall

* re-add --forked

* fix how we get cupy version (should be based on nvcc version)
2022-11-08 10:17:03 -08:00
eltonzheng b85eb3b979
Fix build issues on Windows (#2428)
* Fix build issues on Windows

* small fix to comply with new version of Microsoft C++ Build Tools

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-10-26 00:14:43 +00:00
Jeff Rasley 1b7c6791d5
only add deps if extra is explicitly called (#2432) 2022-10-18 13:57:02 -07:00
Alex Hedges 316c4a43e0
Add flake8 to pre-commit checks (#2051) 2022-07-25 16:48:08 -07:00
Alex Hedges 3540ce74d9
Check for bf16 support only if CUDA is available (#2049)
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-07-06 17:17:31 -06:00
Quentin Anthony 9b70ce56e7
Comms Benchmarks (#2040)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-06-29 10:49:20 -07:00
Jeff Rasley 7c3344e215
DeepSpeed examples refresh (#2021) 2022-06-15 18:46:30 -07:00
Jeff Rasley b666d5cd73
[inference] test suite for ds-kernels (bert, roberta, gpt2, gpt-neo, gpt-j) (#1992)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-06-15 14:21:19 -07:00
Michael Wyatt 7fc3065074
Add torch-latest and torch-nightly CI workflows (#1990)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-06-06 16:19:00 -07:00
Jithun Nair 350d74ca39
Invoke hipify from op builder for JIT extension builds (#1807)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-03-07 18:59:14 +00:00
Andrii Oriekhov d7684f4e81
add GitHub URL for PyPi (#1812)
* add GitHub URL for PyPi

* add GitHub URL for PyPi fix formatting
2022-03-06 04:42:03 +00:00
Jeff Rasley c3c8d5dd93
AMD support (#1430)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: rraminen <rraminen@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: okakarpa <okakarpa@amd.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Co-authored-by: Ramya Ramineni <62723901+rraminen@users.noreply.github.com>
2022-03-03 01:53:35 +00:00
Jeff Rasley 9351266f78
Multi-node save pid support + allow sparse-attn extra (#1728) 2022-01-27 12:35:18 -08:00
Jeff Rasley e46d808a1b
MoE inference + PR-MoE model support (#1705)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Zhewei Yao <zheweiy@berkeley.edu>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
2022-01-18 16:25:01 -08:00
Cheng Li 9caa74e577
Autotuning (#1554)
* [squash] Staging autotuning v4

Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Minjia Zhang <minjiaz@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* add new extra, guard xgboost, cleanup dead files (#268)

* Fix autotuning docs (#1553)

* fix docs

* rewording the goal

* fix typos

* fix typos (#1556)

* fix typos

* fix format

* fix bug (#1557)

* fix bug

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Minjia Zhang <minjiaz@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2021-11-13 08:56:55 +00:00
Jeff Rasley 2665c8b149
Fix 1bit extra issue (#1542) 2021-11-11 08:57:17 -08:00
Alex Hedges be789b1665
Fix many typos (#1423)
* Fix typos in docs/

* Fix typos in code comments and output strings

* Fix typos in the code itself

* Fix typos in tests/

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2021-10-01 19:56:32 -07:00
Anurag Kumar 8e577c923d
Update setup.py (#1361)
updated classifiers
2021-09-13 09:37:32 -07:00
Adam Moody e82060d090
query for libaio package using known package managers (#1250)
* aio: test for libaio with various package managers

* aio: note typical tool used to install libaio package

* setup: abort with error if cannot build requested op

* setup: define op_envvar to return op build environment variable

* setup: call is_compatible once for each op

* setup: only print suggestion to disable op when its envvar not set

* setup: add method to abort from fatal error

* Revert "setup: add method to abort from fatal error"

This reverts commit 0e4cde6b0a.

* setup: add method to abort from fatal error

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2021-07-28 22:42:27 -07:00
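Querying known package managers for a package, as the commit above does for libaio, can be sketched as follows. This is an assumption-laden illustration, not the code from the PR: the table of commands, the package names `libaio-dev` and `libaio-devel`, and the helper name are choices made for the example, and the `which` and `run` parameters exist only so the logic can be exercised without the tools installed.

```python
import shutil
import subprocess

# Assumed query commands; a zero exit code is taken to mean "installed".
PACKAGE_QUERIES = {
    "dpkg": ["dpkg", "-s", "libaio-dev"],
    "rpm": ["rpm", "-q", "libaio-devel"],
}


def find_installed_package(queries, which=shutil.which, run=subprocess.run):
    """Return the first package manager that reports the package, else None."""
    for tool, command in queries.items():
        if which(tool) is None:
            # This package manager is not present on the system; try the next.
            continue
        result = run(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            return tool
    return None
```

Calling `find_installed_package(PACKAGE_QUERIES)` probes the real system; a `None` result only means none of the listed tools confirmed the package, not that the library is necessarily absent.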
Jeff Rasley da1fe2f82c
Remove hard torch dependency at install (#1166) 2021-06-16 14:18:37 -07:00
eltonzheng 10104284a2
Create symlinks on Windows setup (#1099)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-05-24 22:53:44 -07:00
Reza Yazdani ed3de0c21b
Quantization + inference release (#1091)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: niumanar <60243342+niumanar@users.noreply.github.com>
2021-05-24 01:10:39 -07:00