Commit Graph

57 Commits

Author SHA1 Message Date
Michael Wyatt 0f2338f7b8
Fix RTD builds (#4558)
* Update .readthedocs.yml

* Update requirements-readthedocs.txt
2023-10-23 11:18:28 -07:00
Jeff Rasley 12aedac6ce
add available memory check to accelerators (#4508)
* add available memory check to accelerator

* catch case where nvmlInit fails

* add pynvml to reqs

* fix for cpu systems

* Update accelerator/cuda_accelerator.py

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* simplify

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-10-16 12:23:28 +00:00
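
The commit above only names the feature; a minimal sketch of what an available-memory check via pynvml might look like, with the NVML-init failure case caught for CPU-only systems (the function name and fallback value are illustrative assumptions, not code from the commit):

```python
import pynvml


def available_memory_bytes(device_index: int = 0) -> int:
    """Free device memory in bytes; 0 if NVML cannot be initialized (e.g. a CPU-only box)."""
    try:
        pynvml.nvmlInit()  # the commit notes this can fail on CPU systems
    except pynvml.NVMLError:
        return 0
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    free = pynvml.nvmlDeviceGetMemoryInfo(handle).free
    pynvml.nvmlShutdown()
    return free
```
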
Matthew Hoffman 604d701e35
Introduce pydantic_v1 compatibility module for pydantic>=2.0.0 support (#4407)
* Introduce pydantic_v1 compatibility module for pydantic>=2.0.0 support
2023-10-09 11:59:30 -07:00
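
A compatibility module of this kind typically just re-exports the v1 API surface; a hedged sketch (not the exact contents of the DeepSpeed module):

```python
# pydantic_v1.py -- sketch of a pydantic v1/v2 compatibility shim
try:
    # pydantic>=2 ships the old API under the pydantic.v1 namespace
    from pydantic.v1 import *  # noqa: F401,F403
except ImportError:
    # plain pydantic<2: the top-level package already is the v1 API
    from pydantic import *  # noqa: F401,F403
```

Call sites would then import BaseModel, Field, and friends from the shim instead of from pydantic directly.
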
stephen youn e8ed7419ed
update deepspeed to run with the most recent triton 2.1.0 (#4278)
* fix codes to work with triton 2.1
tl.libdevice and triton.testing.allclose are gone with triton 2.1

* formatting

* formatting

---------

Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
2023-09-07 17:20:22 +00:00
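
The commit notes that tl.libdevice and triton.testing.allclose disappeared in Triton 2.1; a hedged sketch of a version guard that keeps kernels working on both 2.0 and 2.1 (the alias names are illustrative, not taken from the repo):

```python
import torch
import triton
import triton.language as tl
from packaging import version

TRITON_21 = version.parse(triton.__version__) >= version.parse("2.1.0")

# Assumption: tl.libdevice was folded into tl.math in Triton 2.1 with the same call names.
libdevice = tl.math if TRITON_21 else tl.libdevice


def allclose(x, y, rtol=1e-2, atol=1e-3):
    # triton.testing.allclose is gone in 2.1; torch.allclose is a drop-in for tests.
    return torch.allclose(x, y, rtol=rtol, atol=atol)
```
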
Lev Kurilenko f96c1c0a78
Pin Triton version to >=2.0.0 and <2.1.0 (#4251)
* Pin Triton version to 2.0.0

* Pin Triton version to < 2.1.0

* Add >=2.0.0

* pin transformers version

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-09-01 16:47:52 -07:00
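
An illustrative way to assert that an installed Triton satisfies the pin described above (a sketch using packaging, not the repo's actual requirements machinery):

```python
from importlib.metadata import version
from packaging.specifiers import SpecifierSet

installed = version("triton")
# The pin from this commit: >=2.0.0 and <2.1.0.
assert installed in SpecifierSet(">=2.0.0,<2.1.0"), f"unsupported triton {installed}"
```
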
Michael Wyatt 9647ea791d
Add MuP optimizers (#2043)
* added paths for mup optimizers

* added tests

* formatting

* Add license, fix missing distributed test, formatting

* Add mpi4py to confirm tests work

* Undo requirements change

* Move to runtime folder

* Rework to match new format

* missing comma

* hidden dim fix

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
2023-08-24 21:00:00 +00:00
Olatunji Ruwase 7f90ef4bdd
Multiple zero stage 3 related fixes (#3886)
* Option to override module apply

* Removing early partitioning in override

* Unit tests

* Cleanup

* Adapt unit test to succeed

* Handle missed params

* Add accelerate

* Code cleanup

* Add doc

* Add doc

* Add doc
2023-07-28 15:58:30 +00:00
Michael Wyatt 8e808392c8
Specify triton 2.0.0 requirement (#4008)
* specify triton 2.0.0 requirement

* fix for setup-venv action

* fix for install error

* fix torch install error
2023-07-21 18:13:08 +00:00
Michael Wyatt aef6c65ce3
Reduce Unit Test Times (Part 3) (#3850)
* add coverage report

* define env vars in shared action

* reduce time for longest running tests

* fix broken shared action

* reduce test time

* reducing Pipeline test times

* further reducing test times

* rework Z3 test

* testing new mp.pool and persistent dist envs

* fix import

* reuse distributed environment for tests with lots of param combos

* fix for dist teardown

* fix pickling issue with pool cache

* actually fix pickling problem

* avoid running pool cache stuff on non-distributed tests

* fix issues with nested mp.pool

* fix for nested pools in Pipeline Engine

* re-add params

* update workflows with pytest opts

* implement feedback

* resolve race condition with port selection

* Update tests/unit/common.py

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-07-12 00:35:49 +00:00
stephen youn 69d1b9f978 DeepSpeed-Triton for Inference (#3748)
Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Ethan Doe <yidoe@microsoft.com>
Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-06-23 14:30:49 -07:00
Ma, Guokai 1f72082fc0
[CPU] Support Intel CPU inference (#3041)
* add fallback path for kernels used in megatron

* temporary numactl WA for SPR 56core

* adapt core allocation according to number of ranks

* add switch to turn on numactl

* detect number of cores on the system

* allow selecting a subset of the cores on the system to bind

* remove unneeded changes

* add ccl backend

* change nccl to ccl

* remove unused code

* add comm/ccl to ops

* initial ccl comm support

* first broadcast case passed

* add CCL_Backend to DeepSpeed

* support comm timer for CPU

* support barrier for comm backend

* support specifying the master address from the deepspeed command line

* support pytorch 2.0

* remove 'block' from api

* Tweak for debug

Signed-off-by: Cao, Zhong Z <zhong.z.cao@intel.com>

* Remove unnecessary directory

Signed-off-by: Cao, Zhong Z <zhong.z.cao@intel.com>

* Add bf16 kernel support for inference

* Add temporary torch implement for cpu inference

* Add softmax ops cpu fallback for inference

* bind cores to numa domain as well

* merge latest change in gma/numactl

* initial bf16 kernel support with fallback path

* initial fallback path for bloom kernel injection

* fix softmax attn mask

* check KMP_AFFINITY to avoid conflict with numactl

* New CCLBackend which utilize TorchBackend for initialization

* rollback last change because there is result error

* fix issue where TP could not work with the bloom injection policy.

injection_policy={BloomBlock: ("self_attention.dense", "mlp.dense_4h_to_h")}

* Use TorchBackend to initialize CCLBackend, make behavior consistent

* remove comm under deepspeed/ops

* add license header

* code clean up

* fix format issue

* remove magic number in main address

* add caching support but not turn on by default

* change name of inference_cuda_module to inference_module

* Check for is_synchronized_device in accelerator before getting Event

* fix typo

* Fix fallback path of softmax kernel on CUDA device for BF16 data type, because CUDA tril does not support BF16 datatype, enforce fp32 data type

* add cpu backend files

* change CPU_Accelerator op_builder_dir

* remove cpu_kernel_path

* using CPU_Accelerator on non-cuda device

* fix deepspeed.op_builder => deepspeed.ops.op_builder

* add alias for num_gpus: num_accelerators

* allow loading cpu_builder in build stage

* Assume CUDA is available if torch is not installed

* add oneccl_binding_pt to requirements

* move oneccl-binding-pt to separate requirements-cpu.txt

* add missing file

* use dependency_links in setuptools.setup() call for additional dependency links

* install oneccl_bind_pt in workflows

* change oneccl_bind_pt's version from 1.13 to 2.0

* use intel_extension_for_pytorch as indicator that CPU_Accelerator should be used

* Add indicator for Accelerator used

* change foo.c to foo.cpp

* exclude 'cpu' directory in CUDA op builder reflection

* add a cpu-inference workflow

* run cpu-inference workflow on self-hosted instance

* change cpu runs-on node to v100 node

* print out python version in workflow

* add verbose in pip command to understand oneccl_bind_pt install issue

* update cpu-inference workflow

* add a stage to detect instance instruction sets

* add back bf16 support for CPU inference

* enable autoTP for bloom

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update workflow to detect cpu instruction sets

* temporary WA for Intel Extension for PyTorch AVX2 instruction set detection

* change cpu-inference workflow machine to ubuntu-20.04

* add sharded checkpoint loading for AutoTP path to reduce the peak memory in initialization stage

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* enable policy for llama

* use a special build ipex to test avx2 detection fix

* fix format

* fix test fail issue

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix gptj sharded checkpoint loading problem

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* return a not implemented build in get_op_builder in cpu_backend

* support cpu device in tests

* use cpuinfo to extract number of CPUs

* use ~/tmp as transformer cache rather than /blob/

* Add support for mpich launcher with prefer_deepspeed_comm

* add missing modification in accelerator

* enable IMPI launcher

* remove unused file and fix formatting

* clean up ccl.cpp

* Less confusing error message when certain op builders are not implemented

* Fix license header

* Add license header

* add license headers

* add license header

* fix cuda specific code in test

* update CPU workflow

* use numactl to bind to core

* allow bind_cores_to_rank in multi-node impi runner

* fix format error

* Remove InferenceBuilder

* fix format error in numa.py

* check whether op is in installed ops in ds_report.py

* allow overriding the accelerator with DS_ACCELERATOR='cuda', 'cpu' or 'xpu'

* lazy init class_dict in CUDA_Accelerator to avoid cyclic initialization of CUDA_Accelerator

* put short path in the beginning in real_accelerator.py

* device_count returns the number of NUMA nodes

* fix typo

* install numactl in cpu workflow

* Follow comments

* Better implementation of device_count() and current_device()

* remove dependency_link for Intel Extension for DeepSpeed

* check is_synchronized_device in timer only once

* remove env mapping WA in cpu_accelerator

* fix duplicate definition

* fix format error

* refine ccl backend selection

* move comments to the right place

* remove prefer_deepspeed_comm, use CCLBackend by default

* refactor fallback path

* Fix execution failure in kernel injection path

* do not refactor the kernel injection fallback path in residual_add because it contains a function call with side effects

* guard residual_add fallback path with environ DS_KI_FALLBACK=True

* fix format error

* add test for allreduce on CPU workflow

* fix format error

* Fall back to TorchBackend if CCLBackend kernels are not implemented

* Update Intel Extension for PyTorch installation link

* Don't specify version number of Intel Extension for PyTorch

* install oneCCL for CCLBackend

* fix link path for CPU comm kernels

* fix source oneCCL environment

* source oneCCL env before run UT

* Give more specific instructions when CCL_ROOT is not defined

---------

Signed-off-by: Cao, Zhong Z <zhong.z.cao@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: sdp <sdp@aia-sdp-spr-108864.jf.intel.com>
Co-authored-by: Cao, Zhong Z <zhong.z.cao@intel.com>
Co-authored-by: Zhenhuan Chen <zhenhuan.chen@intel.com>
Co-authored-by: baodii <di.bao@intel.com>
Co-authored-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-05-16 11:59:22 -04:00
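
Among its many changes, the commit above adds a DS_ACCELERATOR environment override and makes device_count() report NUMA nodes on CPU. A hedged sketch of exercising that override (method names follow the public accelerator interface; the expected outputs are taken from the commit bullets, not verified here):

```python
import os

# Force the CPU accelerator before DeepSpeed inspects the environment.
os.environ["DS_ACCELERATOR"] = "cpu"

from deepspeed.accelerator import get_accelerator

acc = get_accelerator()
print(acc.device_name())   # expected: "cpu" when the CPU accelerator is selected
print(acc.device_count())  # per the commit, the number of NUMA nodes on CPU
```
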
Tian, Feng 6938c449de
Add snip_momentum structured pruning which can support higher sparse ratio with minor accuracy loss (#3300)
Signed-off-by: Tian, Feng <feng.tian@intel.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-05-10 10:33:48 -07:00
Jeff Rasley a094c9763d
remove megatron-lm, no longer pip installable (#3389)
* remove megatron-lm, no longer pip installable

* Add skips to tests that require megatron-lm and can't be run currently.

* formatting

* Formatting

---------

Co-authored-by: Logan Adams <loadams@microsoft.com>
2023-04-28 20:30:26 +00:00
Michael Wyatt 4a3ca4e26d
Fix formatting (#3343)
* formatting

* fixing clang-format version

* update pre-commit URL
2023-04-21 09:57:46 -07:00
Logan Adams 089056236c
Fix pydantic and autodoc_pydantic (#3290) 2023-04-18 13:03:48 -07:00
Logan Adams 1ab42fe829 Revert "Test with triton 2"
This reverts commit 0f8b5da6ae.
2023-04-07 14:06:14 -07:00
Logan Adams 0f8b5da6ae Test with triton 2 2023-04-07 14:05:09 -07:00
Lev Kurilenko fcc0d9c0aa
Update SD triton version in requirements-sd.txt (#3135) 2023-04-04 10:51:31 -07:00
Carlos Mocholí 02e95e6ab4
Pin minimum `packaging` requirement (#2771)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-01-31 13:30:36 -08:00
Michael Wyatt 23e5133c35
update for lm-eval==0.3.0 (#2713)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-01-18 10:27:35 -08:00
Michael Wyatt 43bf035cfc
Update docs to autogenerate pydantic config model docs (#2509)
* update zero config docs
* add autogenerated docs for pydantic models used in ZeRO and Inference configs
2022-11-15 21:27:22 +00:00
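
Autogenerated pydantic model docs of this kind are usually wired up in the Sphinx conf.py; a hedged sketch using the public autodoc_pydantic extension (extension and option names come from that extension's docs, not from this repo's configuration):

```python
# docs/conf.py (sketch)
extensions = [
    "sphinx.ext.autodoc",
    "sphinxcontrib.autodoc_pydantic",
]

# Render each pydantic config model with a field summary, without the raw JSON schema.
autodoc_pydantic_model_show_json = False
autodoc_pydantic_model_show_field_summary = True
```
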
lekurile b2a724e257
Add TestInjectionPolicy inference unittest class for testing custom injection policies (#2426)
This PR adds a TestInjectionPolicy inference unittest class for testing custom injection policies.

This test differs from the existing tests in that the injection_policy dictionary is explicitly specified when calling the DeepSpeed init_inference API.

The google/t5-v1_1-small text2text-generation model and the roberta-large fill-mask model are added as tests with the injection policy explicitly specified.

This is done to expand our unittest coverage to test the path where the replace_wo_policy function is invoked (see GH-2387).

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-10-18 22:17:01 +00:00
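
A hedged sketch of the kind of call the test exercises: an explicit injection_policy dictionary passed to deepspeed.init_inference for the google/t5-v1_1-small model (the layer names in the policy are illustrative assumptions, not copied from the test):

```python
import torch
import deepspeed
from transformers import AutoModelForSeq2SeqLM
from transformers.models.t5.modeling_t5 import T5Block

model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-small")

# Explicit injection policy: module class -> output linear layers of attention/MLP.
model = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float,
    injection_policy={T5Block: ("SelfAttention.o", "EncDecAttention.o", "DenseReluDense.wo")},
)
```
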
Jeff Rasley ec13da6ba7
add SD injection policy (#2381)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-10-13 16:47:12 -07:00
Jeff Rasley b76e0f4fe0
increase min pre-commit versions (#2346) 2022-09-22 14:53:29 -07:00
Michael Wyatt 1a71e77dc2
Fix for distributed tests on pytorch>=1.12 (#2141)
Fix for distributed tests
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-08-01 23:51:09 +00:00
Jeff Rasley 559fb8e515
[docs] fix broken read-the-docs build (#2075) 2022-07-06 14:23:18 -07:00
Quentin Anthony c87f6ee209
DeepSpeed Monitor Module (Master) (#2013)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-06-16 08:55:12 -07:00
Jeff Rasley b666d5cd73
[inference] test suite for ds-kernels (bert, roberta, gpt2, gpt-neo, gpt-j) (#1992)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-06-15 14:21:19 -07:00
Reza Yazdani 8164ea9e6d
Fixing several bugs in the inference-api and the kernels (#1951)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-05-24 13:27:50 -07:00
Alex Hedges 8bbf081ad8
Add torchvision to requirements-dev.txt (#1642)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-12-16 11:31:37 -08:00
Jeff Rasley 7f58853c2e
[testing] 3x faster unit tests (#1636) 2021-12-14 08:15:25 -08:00
Victor 64c2946a23
use py-cpuinfo to detect SIMD_WIDTH in a platform-independent way (#1616)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-12-11 02:31:55 +00:00
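
A hedged sketch of platform-independent SIMD width detection with py-cpuinfo, in the spirit of the commit above (the flag-to-width mapping is an assumption, not the op builder's actual logic):

```python
import cpuinfo  # py-cpuinfo

flags = set(cpuinfo.get_cpu_info().get("flags", []))

if "avx512f" in flags:
    simd_width = 8   # 512-bit registers hold 8 doubles
elif "avx2" in flags:
    simd_width = 4   # 256-bit registers hold 4 doubles
else:
    simd_width = 1   # scalar fallback

print(f"SIMD_WIDTH={simd_width}")
```
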
Pierce Stegman cda7c71895
Sparse Attention: Fix Triton errors (#1608) 2021-12-02 06:10:51 +00:00
Alex Hedges fc2f378ece
Improve pre-commit hooks (#1602)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-12-01 03:12:29 +00:00
Jeff Rasley a8a17f234a
Several fixes for our read-the-docs build (#1579) 2021-11-19 22:45:02 +00:00
Jeff Rasley a90497ecff
Remove hard tensorboardX requirement (#1571) 2021-11-17 17:15:28 -08:00
Cheng Li 9caa74e577
Autotuning (#1554)
* [squash] Staging autotuning v4

Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Minjia Zhang <minjiaz@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* add new extra, guard xgboost, cleanup dead files (#268)

* Fix autotuning docs (#1553)

* fix docs

* rewording the goal

* fix typos

* fix typos (#1556)

* fix typos

* fix format

* fix bug (#1557)

* fix bug

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Minjia Zhang <minjiaz@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2021-11-13 08:56:55 +00:00
Jeff Rasley 2665c8b149
Fix 1bit extra issue (#1542) 2021-11-11 08:57:17 -08:00
Jeff Rasley 24dd285ff4
fix read-the-docs based on this issue: https://github.com/sphinx-doc/sphinx/issues/9727 (#1489) 2021-10-27 12:08:44 -07:00
Jeff Rasley 6996bb0159
Sparse attn triton v1.0 support + torch1.8 test runner (#1374)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
2021-09-21 07:33:09 -07:00
Jeff Rasley 3b68984498
remove torchvision dependency (#1178) 2021-06-21 10:41:27 -07:00
Reza Yazdani cf82168c4e
remove triton version to use the latest stable pypi version (#1128) 2021-06-02 18:43:12 -07:00
Reza Yazdani 26e3841cd4
Change the sparse attention API to be compatible with the latest changes of triton (#902)
* Change the sparse attention API to be compatible with latest changes on the triton side

* remove compatibility checks for CUDA 11

* Update requirements-sparse_attn.txt

Co-authored-by: Arash Ashari <arashari@microsoft.com>
2021-06-02 12:42:53 -07:00
Jeff Rasley 96eb5b12e3
delay imports for replace policies and fix missing req (#1100)
* delay imports for replace policies and fix missing req

* fix issue with _orig_layer_class always being None
2021-05-24 16:43:36 -07:00
Jeff Rasley cfa63f5dad
ZeRO stage 1 refresh (#1042) 2021-05-19 15:42:45 -07:00
Shaden Smith 46f4573b1a
Seeded unit tests (#1072)
* is not -> !=

* Use pytest-randomly to seed unit tests.
2021-05-13 15:20:59 -07:00
Jeff Rasley af548971f3
Fix for RTD 2021-03-08 14:34:54 -08:00
Samyam Rajbhandari 599258f979
ZeRO 3 Offload (#834)
* Squash stage3 v1 (#146)

Co-authored-by: Samyam <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>

* Fix correctness bug (#147)

* formatting fix (#150)

* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)

* fp16 Z3 API update and bugfix

* revert debug change

* ZeRO-3 detach and race condition bugfixes (#149)

* trying out ZeRO-3 race condition fix

* CUDA sync instead of stream

* reduction stream sync

* remove commented code

* Fix optimizer state_dict KeyError (#148)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)

* Simplifying the logic for getting averaged gradients (#153)

* skip for now

* Z3 Docs redux (#154)

* removing some TODOs and commented code (#155)

* New Z3 defaults (#156)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* formatting

* megatron external params

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
2021-03-08 12:54:54 -08:00
Jeff Rasley 81aeea361d
Elastic training support (#602)
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
2020-12-22 22:26:26 -08:00
Jeff Rasley 6380ee3511
Fixes for RTD build errors (#606)
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
2020-12-15 15:29:21 -08:00