Commit graph

204 commits

Author  SHA1  Message  Date
cctry c58146471e
Openfold fix (#4368)
* update

* format

---------

Co-authored-by: Conglong Li <conglong.li@gmail.com>
2023-09-20 01:44:16 -07:00
Reza Yazdani 468882fb68
Add the policy to run llama model from the official repo (#4313)
* Add the llama2 support from the official llama repo

* add back commented function

* add new policy & implementation for llama2

* add some changes to inject/run the 70b llama model

* remove debugging code

* remove more debugging code

* formatting

* use num_kv only when it has positive value

* use the num_kv param only if it is positive

* fix syntax and format errors.

* fix an issue with the float32 transform kernel

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
2023-09-19 16:57:55 +00:00
Conglong Li f876d81d34
DeepSpeed4Science (#4357)
* zero++ tutorial PR (#3783)

* [Fix] _conv_flops_compute when padding is a str and stride=1 (#3169)

* fix conv_flops_compute when padding is a str when stride=1

* fix error

* change type of paddings to tuple

* fix padding calculation

* apply formatting check

---------

Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* fix interpolate flops compute (#3782)

* use `Flops Profiler` to test `model.generate()` (#2515)

* Update profiler.py

* pre-commit run --all-files

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

---------

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>

* revert PR #3611 (#3786)

* bump to 0.9.6

* ZeRO++ chinese blog (#3793)

* zeropp chinese blog

* try better quality images

* make title larger

* even larger...

* various fix

* center captions

* more fixes

* fix format

* remove staging trigger (#3792)

* DeepSpeed-Triton for Inference (#3748)

Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Ethan Doe <yidoe@microsoft.com>
Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* ZeRO++ (#3784)

Co-authored-by: HeyangQin <heyangqin@microsoft.com>
Co-authored-by: GuanhuaWang <alexwgh333@gmail.com>
Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

* adding zero++ to navigation panel of deepspeed.ai (#3796)

* Add ZeRO++ Japanese blog (#3797)

* zeropp chinese blog

* try better quality images

* make title larger

* even larger...

* various fix

* center captions

* more fixes

* fix format

* add ZeRO++ Japanese blog

* add links

---------

Co-authored-by: HeyangQin <heyangqin@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>

* Bug Fixes for autotuner and flops profiler (#1880)

* fix autotuner when backward is not called

* fix format

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Missing strided copy for gated MLP (#3788)

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

* Requires grad checking. (#3789)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* bump to 0.10.0

* Fix Bug in transform.cu (#3534)

* Bug fix

* Fixed formatting error

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

* bug fix: triton importing error (#3799)

Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* DeepSpeed4Science (#569)

* Integrating evoformer attention

* add cutlass version check

* Update error message

* add benchmark

* Update

* Update evoformer_attn.py

* Update run_evoformer_test.py

* Update evoformer_attn.py

* Update run_evoformer_test.py

* support more GPU archs

* add copyright

* add tests

* Fix bugs

* Update benchmark

* update

* Fix nvcc macro

* clean code

* fix formatting

* fix yaml import

* skip unit test when not compatible

* fix yaml requirement

* revert changes

* update tutorial

* update

* fix formatting

* fix format

* skip evoformer attn in pre-compile-ops

* revert changes

* update tutorial

* fix cutlass check

* update tutorial

* refactor tutorial

* revise

* Updated the Megatron-DS section (#565)

* Updated the Megatron-DS section

* minor fix

* minor fix

* minor fix

* separate evoformer tutorial

* Revised the ds4science landing page (#566)

* Updated the Megatron-DS section

* minor fix

* minor fix

* minor fix

* Revised the landing page

* Revised the landing page

* Removing unused file

* fix links image position

* modify main page

* fix doc

---------

Co-authored-by: Shiyang Chen <csycfl@gmail.com>
Co-authored-by: Minjia Zhang <33713995+minjiaz@users.noreply.github.com>

---------

Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Bill Luo <50068224+zhiruiluo@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Guorun <84232793+CaffreyR@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: stephen youn <13525892+stephen-youn@users.noreply.github.com>
Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Ethan Doe <yidoe@microsoft.com>
Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com>
Co-authored-by: GuanhuaWang <alexwgh333@gmail.com>
Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
Co-authored-by: Ramya Ramineni <62723901+rraminen@users.noreply.github.com>
Co-authored-by: Shiyang Chen <csycfl@gmail.com>
Co-authored-by: Minjia Zhang <33713995+minjiaz@users.noreply.github.com>
2023-09-18 22:16:08 +00:00
Alex Kogan 9bf77782b2
Fix a bug in the implementation of dequantization for inference (#3433)
* bugfix in launch_dequantize()

Get rid of `hid_cnt` and simply set #blocks to output size / #groups

* add a unit test for dequantization

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2023-09-14 21:09:41 +00:00
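The dequantization bugfix above describes its launch geometry in one line: drop the `hid_cnt` heuristic and set the block count to output size divided by the number of groups. A minimal sketch of that arithmetic, with assumed names (`blocks_for_dequantize`, `output_size`, `groups` are illustrative, not the actual kernel code):

```cpp
#include <cassert>

// Hypothetical sketch of the launch geometry from the commit message:
// number of thread blocks = output size / number of quantization groups,
// i.e. one block per group-sized chunk of the output.
unsigned int blocks_for_dequantize(unsigned int output_size, unsigned int groups) {
    assert(groups > 0 && output_size % groups == 0);
    return output_size / groups;
}
```

For example, a 1024-element output with 8 groups would launch 128 blocks under this scheme.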
Olatunji Ruwase aa4a7401f8
ZeRO-Inference refresh (#4197)
* INT4 weight only quantization (#479)

* INT4 weight only quantization

* pre commit

* fix UT

* fix UT

* fix UT

* fix UT

* fix UT

* fix UT

* fix UT

* add zero3 test

* quantize small weight first to prevent oom

* fold quantization config into ds_config

* Fix license & refactor ds_config & rebase master

* fix UT

* Moving quantization into post_init_method and add int4 dequantization kernel (#522)

* Add experimental int4 dequantize kernel

* move quantization into post_init_method

* fix

* Refactor: move int4 code to deepspeed/inference (#528)

* Move int 4 code to deepspeed/inference

* fix

* fix

* fix

* zero++ tutorial PR (#3783)

* [Fix] _conv_flops_compute when padding is a str and stride=1 (#3169)

* fix conv_flops_compute when padding is a str when stride=1

* fix error

* change type of paddings to tuple

* fix padding calculation

* apply formatting check

---------

Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* fix interpolate flops compute (#3782)

* use `Flops Profiler` to test `model.generate()` (#2515)

* Update profiler.py

* pre-commit run --all-files

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

---------

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>

* revert PR #3611 (#3786)

* bump to 0.9.6

* ZeRO++ chinese blog (#3793)

* zeropp chinese blog

* try better quality images

* make title larger

* even larger...

* various fix

* center captions

* more fixes

* fix format

* remove staging trigger (#3792)

* DeepSpeed-Triton for Inference (#3748)

Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Ethan Doe <yidoe@microsoft.com>
Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* ZeRO++ (#3784)

Co-authored-by: HeyangQin <heyangqin@microsoft.com>
Co-authored-by: GuanhuaWang <alexwgh333@gmail.com>
Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

* adding zero++ to navigation panel of deepspeed.ai (#3796)

* Add ZeRO++ Japanese blog (#3797)

* zeropp chinese blog

* try better quality images

* make title larger

* even larger...

* various fix

* center captions

* more fixes

* fix format

* add ZeRO++ Japanese blog

* add links

---------

Co-authored-by: HeyangQin <heyangqin@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>

* Bug Fixes for autotuner and flops profiler (#1880)

* fix autotuner when backward is not called

* fix format

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Missing strided copy for gated MLP (#3788)

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

* Requires grad checking. (#3789)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* bump to 0.10.0

* Fix Bug in transform.cu (#3534)

* Bug fix

* Fixed formatting error

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

* bug fix: triton importing error (#3799)

Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Fix dequant bug

* Address PR feedback

* Use super() __exit__

* Fix unit tests

---------

Co-authored-by: Donglin Zhuang <donglinzhuang@outlook.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Bill Luo <50068224+zhiruiluo@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Guorun <84232793+CaffreyR@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: stephen youn <13525892+stephen-youn@users.noreply.github.com>
Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Ethan Doe <yidoe@microsoft.com>
Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com>
Co-authored-by: GuanhuaWang <alexwgh333@gmail.com>
Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
Co-authored-by: Ramya Ramineni <62723901+rraminen@users.noreply.github.com>
2023-09-11 16:19:21 +00:00
Connor Holmes 542dc0d5cb
AMD Kernel Compatibility Fixes (#3180)
* Guard against APIs not available on AMD in reduction_utils, code cleanup

* More API alignment simplification

* Int conversion fix

* Syntax

---------

Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
2023-09-08 16:54:57 +00:00
Ma, Guokai 19e9a7c028
[CPU][Bugfix] Make uid and addr_port part of SHM name in CCL backend (#4115)
* distinguish shm name with uid and addr_port

* fix formatting
2023-08-17 10:58:52 +00:00
Connor Holmes f0463b4d1f
Pass correct node size for ZeRO++ (#4085)
* Pass correct node size

* formatting

---------

Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-08-09 01:55:44 +00:00
Ma, Guokai 82c498d947
Fix deadlock when SHM based allreduce spin too fast (#4048)
* Fix deadlock when allreduce spin too fast

* Change state to enum to increase readability

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-07-28 15:58:45 +00:00
Ma, Guokai 7f26bb6ae4
faster allreduce with omp parallel for reduce kernel (#4049) 2023-07-27 21:59:26 +00:00
Ma, Guokai 0f5406323c
[CPU] FusedAdam and CPU training support (#3991)
* fused adam can build

* use cpu adam to implement fused adam

* enable zero stage 1 and 2 for synchronized accelerator (a.k.a. CPU)

* remove unused parameters

* fix format error

* Remove adam class

* fix format

* support stage3

* reuse simd.h

* fix format

* make memory_stat return meaningful dict

* fix format

* add cpu_adam

* reuse cpu_adam

* header cleanup

* fix cpu_adam

* fix format, add missing file

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-07-25 13:32:02 +00:00
Ma, Guokai 1bc3b78423
[CPU] Use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node) (#3919)
* use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node)

* add fp32 support for SHM allreduce

* avoid assertion for FP16 data type

* fix format

* change 'allreduce_low_latency' to 'inference_allreduce'

* Fix according to comments

* change inference_allreduce to inference_all_reduce to keep naming consistency

* check whether LOCAL_SIZE is defined in ccl.cpp, also define LOCAL_SIZE in test_distributed

* fix format

* Fix format error

* Update tests/unit/comm/test_dist.py

Fix world_size to 4 in UT

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
2023-07-19 20:57:54 +00:00
Alexander Grund 9aeba94a8e
Avoid deprecation warnings in `CHECK_CUDA` (#3854)
The `type()` function is deprecated and `is_cuda()` has been available for a long time.
This avoids many warnings when compiling extensions.

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-07-06 20:30:28 +00:00
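The `CHECK_CUDA` deprecation fix above can be sketched with a minimal mock; the `Tensor` struct and `passes_check` helper here are stand-ins for illustration, not DeepSpeed's or PyTorch's actual code (real extensions would use `TORCH_CHECK` on an ATen tensor):

```cpp
#include <stdexcept>

// Mock tensor illustrating the fix: call `is_cuda()` directly instead of the
// deprecated `tensor.type().is_cuda()`. This struct is a stand-in for
// illustration only, not an actual ATen type.
struct Tensor {
    bool on_cuda = false;
    bool is_cuda() const { return on_cuda; }  // non-deprecated accessor
};

// Sketch of a CHECK_CUDA-style guard (real extensions use TORCH_CHECK).
#define CHECK_CUDA(x) \
    do { \
        if (!(x).is_cuda()) \
            throw std::invalid_argument(#x " must be a CUDA tensor"); \
    } while (0)

// Returns true when the guard passes, false when it throws.
bool passes_check(const Tensor& t) {
    try {
        CHECK_CUDA(t);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

Calling the deprecated accessor was the source of the compile-time warnings the commit removes; the guard's behavior is unchanged.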
Ramya Ramineni aebdfb3b92 Fix Bug in transform.cu (#3534)
* Bug fix

* Fixed formatting error

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-06-23 14:30:49 -07:00
Heyang Qin d18aa2c79c ZeRO++ (#3784)
Co-authored-by: Sam Abe Jacobs <samjacobs@microsoft.com>
Co-authored-by: GuanhuaWang <alexwgh333@gmail.com>
Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-06-23 14:30:49 -07:00
Logan Adams cd911f9ab2
Fix output transpose dimension bugs (#3747) 2023-06-13 16:51:30 -07:00
john li 46bb08c2df
Include cublas error details when getting cublas handle fails (#3695)
* include cublas error details when getting cublas handle fails

* run clang-format

* just use raw enum value to avoid depending on minimum cuda version

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-06-13 20:12:26 +00:00
mzl d1c3c0df53
fix_typo (#3559) 2023-05-18 12:13:37 -07:00
Ma, Guokai 1f72082fc0
[CPU] Support Intel CPU inference (#3041)
* add fallback path for kernels used in megatron

* temporary numactl WA for SPR 56core

* adapt core allocation according to number of ranks

* add switch to turn on numactl

* detect number of cores on the system

* allow select a subset of the cores on the system to bind

* remove unneeded changes

* add ccl backend

* change nccl to ccl

* remove unused code

* add comm/ccl to ops

* initial ccl comm support

* first broadcast case passed

* add CCL_Backend to DeepSpeed

* support comm timer for CPU

* support barrier for comm backend

* support specify master address from deepspeed command line

* support pytorch 2.0

* remove 'block' from api

* Tweak for debug

Signed-off-by: Cao, Zhong Z <zhong.z.cao@intel.com>

Remove unnecessary directory

Signed-off-by: Cao, Zhong Z <zhong.z.cao@intel.com>

* Add bf16 kernel support for inference

* Add temporary torch implement for cpu inference

* Add softmax ops cpu fallback for inference

* bind cores to numa domain as well

* merge latest change in gma/numactl

* initial bf16 kernel support with fallback path

* initial fallback path for bloom kernel injection

* fix softmax attn mask

* check KMP_AFFINITY to avoid conflict with numactl

* New CCLBackend which utilize TorchBackend for initialization

* rollback last change because there is result error

* fix issue where bloom injection policy TP could not work.

injection_policy={BloomBlock: ("self_attention.dense", "mlp.dense_4h_to_h")}

* Use TorchBackend to initialize CCLBackend, make behavior consistent

* remove comm under deepspeed/ops

* add license header

* code clean up

* fix format issue

* remove magic number in main address

* add caching support but not turn on by default

* change name of inference_cuda_module to inference_module

* Check for is_synchronized_device in accelerator before get Event

* fix typo

* Fix fallback path of softmax kernel on CUDA device for BF16 data type, because CUDA tril does not support BF16 datatype, enforce fp32 data type

* add cpu backend files

* change CPU_Accelerator op_builder_dir

* remove cpu_kernel_path

* using CPU_Accelerator on non-cuda device

* fix deepspeed.op_builder => deepspeed.ops.op_builder

* add alias for num_gpus: num_accelerators

* allow loading cpu_builder in build stage

* Assume cuda available if torch not installed

* add oneccl_binding_pt to requirements

* move oneccl-binding-pt to separate requirements-cpu.txt

* add missing file

* use dependency_links in setuptools.setup() call for additional dependency links

* install oneccl_bind_pt in workflows

* change oneccl_bind_pt's version from 1.13 to 2.0

* use intel_extension_for_pytorch as indicator that CPU_Accelerator should be used

* Add indicator for Accelerator used

* change foo.c to foo.cpp

* exclude 'cpu' directory in CUDA op builder reflection

* add a cpu-inference workflow

* run cpu-inference workflow on self-hosted instance

* change cpu runs-on node to v100 node

* print out python version in workflow

* add verbose in pip command to understand oneccl_bind_pt install issue

* update cpu-inference workflow

* add a stage to detect instance instruction sets

* add back bf16 support for CPU inference

* enable autoTP for bloom

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update workflow to detect cpu instruction sets

* temporary WA for Intel Extension for PyTorch AVX2 instruction set detection

* change cpu-inference workflow machine to ubuntu-20.04

* add sharded checkpoint loading for AutoTP path to reduce the peak memory in initialization stage

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* enable policy for llama

* use a special build ipex to test avx2 detection fix

* fix format

* fix test fail issue

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix gptj sharded checkpoint loading problem

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* return a not implemented build in get_op_builder in cpu_backend

* support cpu device in tests

* use cpuinfo to extract number of CPUs

* use ~/tmp as transformer cache rather than /blob/

* Add support for mpich launcher with prefer_deepspeed_comm

* add missing modification in accelerator

* enable IMPI launcher

* remove unused file and fix formatting

* clean up ccl.cpp

* Less confusing error message when certain op builders are not implemented

* Fix license header

* Add license header

* add license headers

* add license header

* fix cuda specific code in test

* update CPU workflow

* use numactl to bind to core

* allow bind_cores_to_rank in multi-node impi runner

* fix format error

* Remove InferenceBuilder

* fix format error in numa.py

* check whether op is in installed ops in ds_report.py

* allow override accelerator with DS_ACCELERATOR='cuda','cpu' or 'xpu'

* lazy init class_dict in CUDA_Accelerator to avoid cyclic initialization of CUDA_Accelerator

* put short path in the beginning in real_accelerator.py

* device_count return number of NUMA nodes

* fix typo

* install numactl in cpu workflow

* Follow comments

* Better implementation of device_count() and current_device()

* remove dependency_link for Intel Extension for DeepSpeed

* use check is_synchronized_device in timer only once

* remove env mapping WA in cpu_accelerator

* fix duplicate definition

* fix format error

* refine ccl backend selection

* move comments to the right place

* remove prefer_deepspeed_comm, use CCLBackend by default

* refactor fallback path

* Fix execution failure in kernel injection path

* do not refactor kernel injection fallback path in residual_add because it contains a function call with side effects

* guard residual_add fallback path with environ DS_KI_FALLBACK=True

* fix format error

* add test for allreduce on CPU workflow

* fix format error

* Fallback to TorchBackend if CCLBackend kernel are not implemented

* Update Intel Extension for Pytorch installation link

* Don't specify version number of Intel Extension for PyTorch

* install oneCCL for CCLBackend

* fix link path for CPU comm kernels

* fix source oneCCL environment

* source oneCCL env before run UT

* Give more specific instruction when CCL_ROOT not defined

---------

Signed-off-by: Cao, Zhong Z <zhong.z.cao@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: sdp <sdp@aia-sdp-spr-108864.jf.intel.com>
Co-authored-by: Cao, Zhong Z <zhong.z.cao@intel.com>
Co-authored-by: Zhenhuan Chen <zhenhuan.chen@intel.com>
Co-authored-by: baodii <di.bao@intel.com>
Co-authored-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-05-16 11:59:22 -04:00
Ramya Ramineni 5147b90aa4
[ROCm] Hip headers fix (#3532)
* Add cg headers hipification

* Exclude including cuda_bf16.h on ROCm

* Merge

* Restricting inclusion of cuda_bf16.h with BF16_AVAILABLE var

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-05-16 09:07:27 -04:00
LiYu Lu f1fab902c8
fix spelling error (#3482)
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-05-08 22:52:56 +00:00
Connor Holmes 0a61d5d664
Hybrid Engine Refactor and Llama Inference Support (#3425)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-05-03 17:20:07 -07:00
Dino Chen 297cd9ed8a
add bf16 cuda kernel support (#3092)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-04-22 15:23:28 +00:00
Ramya Ramineni 6e1cbebe52
Hipify cooperative_groups headers (#3323)
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-04-21 18:41:23 +00:00
Michael Wyatt 4a3ca4e26d
Fix formatting (#3343)
* formatting

* fixing clang-format version

* update pre-commit URL
2023-04-21 09:57:46 -07:00
Connor Holmes 145c3a7591
Fix missing scale attributes for GPTJ (#3256)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-04-20 17:38:02 -07:00
Bhavya Medishetty f7bfe5e7ef
[ROCm] temporary workaround till __double2half support enabled in HIP (#3236)
* temporary WAR workaround till __double2half support enabled in HIP

* workaround only for hipcc

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-04-18 18:37:25 +00:00
Heyang Qin 48297c4841
improving int4 asymmetric quantization accuracy (#3190)
* Fixes for asymmetric quantization

* addtional offset to further improve accuracy

* put the 0.5 into offset rather than applying it later

* update unit test for quantization

* fix format

* attempt to fix format

---------

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-04-17 16:54:28 -07:00
Olatunji Ruwase 47f9f13bd3
DeepSpeed Chat (#3186)
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Lok Chand Koppaka <lokoppak@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-04-11 11:53:38 -07:00
Bing Xie 0cd64bd4c9
fixing a bug in CPU Adam and Adagrad (#3109)
Co-authored-by: Bing Xie <bingxie@BINGHYPC014.redmond.corp.microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-04-07 08:52:31 -07:00
Olatunji Ruwase 514b020bce
Use generic O_DIRECT (#3115) 2023-04-05 10:05:39 -04:00
Molly Smith e73de8cee8
Optimize Softmax Kernel (#3112)
* Simplify kernel

* Coalesce memory attempt 1. Logits divergence.

* Logits fix?

* sync after every global mem access

* template on iterations. Down to 8.3% cuda time for 8k tokens

* Up to 64 iterations

* Add alibi/mask check

* fp32

* Revert builder.py

* naming. precommit

* Revert "naming. precommit"

This reverts commit 150eb7d96b.

* naming. spacing

* Spacing. simplify checks

* remove bsyncs

* missed bsyncs

* precommit
2023-04-05 03:17:29 +00:00
Michael Wyatt b361c72761
Update DeepSpeed copyright license to Apache 2.0 (#3111)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-03-30 17:14:38 -07:00
Jeff Rasley 91d63e0228
update formatter version and style settings (#3098) 2023-03-27 07:55:19 -04:00
Connor Holmes 1286e374ab
Softmax Scheduling Cleanup (#3046)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-03-22 08:45:06 -07:00
Mor Zusman 871c8a3f5d
fix return prev key and value , added strides to from_blob (#2828)
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-03-22 08:43:35 -07:00
Quentin Anthony b38b3036dd
[docs] add MCR-DL paper to readme/docs (#3066) 2023-03-21 10:19:16 -07:00
Jeff Rasley da84e60d98
add missing license info to top of all source code (#2889)
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-02-27 11:20:41 -08:00
Reza Yazdani 5b7413a4fc
Fix gpt-Neox rotary embedding implementation (#2782)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-02-16 10:47:45 -08:00
Lev Kurilenko fd1449c766
Port Reza's INT8-quantization fix to container architecture (#2725)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-02-16 10:12:18 -08:00
Olatunji Ruwase c9b08888d0
Enable page-locked tensors without CUDA (#2775)
* Enable page-locked memory in cpu only env

* Enable page-locked memory in cpu only env

* Formatting

* Add TODOs; Release page-locked memory

* Update perf microbenchmark; Reduce unit test memory

* Reduce CI mem usage
2023-02-07 17:14:19 -05:00
Reza Yazdani 9f41ffe4a6
Reset KV-cache at the beginning of text-generation (#2669)
Co-authored-by: Martin Cai <martincai@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-02-03 12:07:44 -08:00
Reza Yazdani 0b06e0cbb0
Fix softmax backward (#2709)
* Reset KV-cache at the beginning of text-generation

* Add new backward kernel to handle large softmax-length

* remove unrelated changes

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
2023-01-26 18:22:35 -05:00
Ma, Guokai 98cc35b6a8
Abstract accelerator (step 3) (#2677)
* Integrate accelerator abstraction interface into deepspeed/

* Fix error message in fp16/fused_optimizer

* fix error message in fp16/unfused_optimizer.py

* assign get_accelerator().pin_memory() result to input Tensor name

* no need to check cuda and whether nvtx supported

* move try-except into innermost block

* call Event() and Stream() in get_accelerator() for data type

* Make Stream and Event as properties of abstract interface so they can be used as data type in deepspeed

* Apply op_builder backend api change from #2705 from @jeffra

* fix tests where Builder NAME is used

* keep original ...Builder.NAME interface instead of ...Builder().NAME interface

* fix builder closure for installation

* fix randomltd builder

* add comments to clarify create_op_builder and get_op_builder

* fix compatibility with pip install -e

Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-01-26 06:03:12 -08:00
Olatunji Ruwase 3f210c9715
CUDA optional deepspeed ops (#2507)
* CPU-Adam: add compile-flag to enable param-copy from CPU to GPU

* guard the CUDA-related include files and variables

* remove CUDA dependency from op_builder when building against CPU

* fixing the builder issues

* fix formatting

* return true when there is no mismatch on the cuda version

* guard for when cuda is not available & test with cpu-only environment

* Update cpu_adam and cpu_adagrad

* Format fixes

* Add configurable half precision type; Build/run in CUDA environment

* Run cpu_adam and cpu_adagrad in cpu only environment

* Mark CUDA only unit tests

* CPU environment CI

* Format fixes

* Remove --forked

* Add --forked

* CPU only CI should pass

* Format fixes

* Format fixes

* Remove scattered pytest.skip

* Fix cpu_adam unit test

* Update .github/workflows/nv-torch-latest-cpu.yml

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Update .github/workflows/nv-torch-latest-cpu.yml

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Address PR feedback

* OpenMP linking

* Fix unit tests

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-01-17 18:02:45 -05:00
LOK CHAND KOPPAKA aef8a8560c
Extend quantization utils features (#2683)
* Extend quantization utils features

* remove unwanted files

* fix cache setting

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
2023-01-12 19:25:55 +00:00
Ma, Guokai 9548d48f48
Abstract accelerator (step 2) (#2560)
* Abstract accelerator (step 2)

* more flex op_builder path for both installation and runtime

* add SpatialInferenceBuilder into cuda_accelerator.py

* use reflection to make cuda_accelerator adapt to CUDA op builder change automatically

* clean up deepspeed/__init__.py

* add comments in cuda_accelerator for no torch path

* Update deepspeed/env_report.py

Change env_report.py according to suggestion

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

* reduce the range of try...except for better code clarity

* Add porting for deepspeed/ops/random_ltd/dropping_utils.py

* move accelerator to top directory and create symlink under deepspeed

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-01-06 23:40:58 -05:00
Connor Holmes a25c31b67d
Update AVX512 Detection (#2621)
* Update cpuinfo AVX512 detection

* Missing conversion from `__m256` to `__m256i`

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-12-17 05:57:28 -08:00
lokoppakmsft 3a3dfe66bb
Move layer norm to new schedule (#2590)
* Move layer norm to new schedule

* Pre-commit fixes

* fix comments

* format fixes

* Merge unrolls

* format fixes

* camelCase

* format fixes

* revert unwanted file

* move pow2 function

* format fixes

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
2022-12-13 18:37:24 +00:00
Conglong Li ef869377e9
DeepSpeed Data Efficiency Library (#2585)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-12-12 16:55:18 -08:00
lokoppakmsft 591744eba3
Support N-dimension input in quantization kernel (#2575)
* Add support for inputs > 2D

* use vec

* Add N-Dim support to Dequant kernel

* merge master and fix format

* Bug Fix

* fix num_bits

* Fix dequant

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
2022-12-07 18:33:20 -08:00
Reza Yazdani 35b350b28c
Fix quantized-inference & Add generic support of checkpoint loading (#2547)
* fix checkpoint loading when it is a dictionary

* fix some issues with saving ckpt & int8 inference

* fix quantized-inference & add generic support of checkpoint loading

* remove int8 hard-coded flag

* fix mlp return tensors

* fix several issues to load checkpoints of GPT-J, GPT-NeoX, and OPT with different TP sizes

* add more comments & description for checkpoint-loading module

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-12-06 13:49:29 -08:00
Connor Holmes b841628207
Drop Maxwell Support (#2574)
* Officially drop Maxwell support

* Formatting

* Comparison mismatch fix
2022-12-06 10:42:32 -08:00
Connor Holmes 30c8d8a881
Initial dequant library implementation (#2521) 2022-11-18 16:02:30 -08:00
lokoppakmsft 78d4ca1f4b
Deepspeed quantization library v0.1 (#2450)
* Initial commit Deepspeed quantization library

* Match function signatures

* Add Quantization Kernel

* add offset comparison and pre-commit changes

* format fixes

* File name changes

* pt_binding_changes

* test name change

* Integer quantization, minor refactors

* Add directed test_case

* format fixes

* Move param calculation to constructor of params class

* Use local function and add elemsPerBlock

* change function to be specialized

* sub block reduce

* add new schedule

* Add new schedule test case

* fix illegal writes in sch1

* Style fixes in comments

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
2022-11-17 11:09:45 -08:00
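The quantization-library commits above center on symmetric integer quantization kernels. As a rough illustration of what such a kernel computes (a minimal NumPy sketch under assumed per-tensor symmetric scaling; the function names here are illustrative and are not DeepSpeed's CUDA API):

```python
import numpy as np

def quantize_int8(x, num_bits=8):
    # Symmetric per-tensor quantization: scale so max|x| maps to the int range.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Inverse mapping back to float32, as the dequant kernel does.
    return q.astype(np.float32) * scale

x = np.linspace(-1.0, 1.0, 16, dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# Rounding bounds the per-element reconstruction error by scale / 2.
print(float(np.abs(x - x_hat).max()))
```

The N-dimension-input commit generalizes the same mapping so any input shape beyond 2D is treated as a flat run of elements per quantization group.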
Connor Holmes e7e7595502
Stable Diffusion Enhancements (#2491)
Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
2022-11-09 17:40:59 -08:00
Reza Yazdani 9cfcf7431a
Add correct memory-allocation at DeepSpeed-Attention (#2474)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
2022-11-07 16:23:25 -08:00
Joe Mayer 4a06ecf631
Updating autotune json default in docs. (#2476)
* Updating autotune default in docs.

* Running pre-commit.
2022-11-04 16:00:13 -07:00
Connor Holmes 10e9d04c23
Cache Allocation and Softmax Fixes (#2433)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-11-02 10:48:18 -07:00
Connor Holmes be4ffb82ad
Reduction Kernel Utility (#2436)
* Initial reduction_utils.h implementation

* Add initialization helper, ensures correct min/max behavior

* Remove unnecessary warp sync
2022-10-27 20:34:26 +00:00
eltonzheng b85eb3b979
Fix build issues on Windows (#2428)
* Fix build issues on Windows

* small fix to compile with the new version of Microsoft C++ Build Tools

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-10-26 00:14:43 +00:00
Jeff Rasley ec13da6ba7
add SD injection policy (#2381)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-10-13 16:47:12 -07:00
Andrey Chernykh d5d10b0ce8
Fix issue with corrupted output on long generation for GPT (#2359)
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-10-13 09:53:48 -07:00
Connor Holmes c3001324b4
Add predicated global load (#2373)
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-10-11 19:41:23 +00:00
Arash Bakhtiari 0a2ae2ef45
Fix the MLP output tensor's shape (#2380) 2022-10-04 22:27:37 -07:00
Arash Bakhtiari e14d40e5f3
Refactor fused_bias_residual kernels for better readability (#2356)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-09-27 10:27:46 -07:00
Guanhua Wang 3486afb1a3
fix cuda invalid config error in dequant kernel (#2362)
* format

* remove round fn
2022-09-27 01:31:28 +00:00
Arash Bakhtiari 9df604bf51
Refactor gptj_residual_add kernels for better readability (#2358)
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-09-26 17:02:45 -04:00
Connor Holmes 9aa7b638b7
Kernel Data Conversion Utility (#2327)
* Unify macro definitions and constants in a single file

* Conversion utility implementation.

* Fix reversion from formatting

* Bugfixes after testing with correct DeepSpeed

* Inline markers are available on both HIP + CUDA
2022-09-26 07:25:08 -04:00
Connor Holmes 3d097bb865
Extend scratch buffer for long prompts (#2212)
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-09-22 18:04:36 -07:00
Guanhua Wang 954e0c61f1
mem access for quantize kernel (#2331)
* mem access for quantize kernel

* format

* format fp32

* modify quant kernel

* modify quant kernel2

* modify format

* format

* fix comments in pytest

* fix comments in pytest

* format

* rerun

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
2022-09-22 13:28:30 -07:00
Arash Bakhtiari 48c5220b52
Refactor residual add kernels (#2333)
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
2022-09-21 13:15:45 -07:00
Sam Ade Jacobs 12e1cb8262
MOE matmult with memaccess (#2336)
* Fix formatting

* Remove redundant variable
2022-09-21 09:17:15 -07:00
Sam Ade Jacobs 80b10d0c69
MOE residual matmult unit test (#2323)
MOE residual matmul unit tests

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
2022-09-19 15:00:32 -07:00
Michael Wyatt c199edac82
refactor to use mem_access (#2317) 2022-09-14 01:11:08 +00:00
Arash Bakhtiari efa8aded4a
Fix the residual add mp scaling for GPTNeoX (#2310) 2022-09-12 11:45:32 -07:00
Molly Smith d0dfe38d53
Update relu.cu with mem_access_utils (#2306) 2022-09-09 18:53:48 +00:00
Michael Wyatt b2d550ab85
Unit test for bias add kernel (#2298)
* added unit test

* Update pt_binding.cpp

* formatting

* Update test_bias_add.py
2022-09-09 09:31:39 -07:00
Reza Yazdani 47e030f54d
Fp32 accuracy bug fix (#2285)
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Arash Bakhtiari <arashb@users.noreply.github.com>
2022-09-02 17:13:41 -07:00
Connor Holmes c84bca37b1
Memory Access Utility (#2276)
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
2022-09-01 12:49:51 -07:00
Reza Yazdani afdc72879f
Ds-inference Int8 support through ZeroQuant technology (#2217)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-08-30 16:39:34 -07:00
Reza Yazdani d154cc0f55
Ds inference/fix mp2 (#2270) 2022-08-29 16:33:03 -07:00
Connor Holmes 2a64448830
Update half precision header guards (#2261) 2022-08-25 11:40:29 -04:00
Reza Yazdani fda63432ba
Remove the random-generator from context during inference (#2228)
* Fix the tensor-slicing copy for qkv parameters

* remove the random-generator from context during inference

* formatting

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-08-18 08:37:35 -07:00
Arash Bakhtiari 8b2a63717a
Add support of OPT models (#2205)
* add opt replace policy

* simplify inf. api

* fix opt replace policy

* fix use-cache & add relu

* Add support of custom MLP act. function

* Revert "simplify inf. api"

This reverts commit 9e910fcbd5471dec9b3c92008426f5ba590bf0b6.

* fix the inference API (temp. solution)

* fix code formatting

* add unit tests for OPT models.

* refactor pre-attention layer norm configuration

* add support of opt-350m model

* refactor the HF model config initialization

* fix hf model config issue

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-08-15 07:31:51 -07:00
Ramya Ramineni 2e3769a1f4
Enable fused_lamb_cuda_kernel on ROCm (#2148)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-08-04 09:25:49 -07:00
Alex Hedges 316c4a43e0
Add flake8 to pre-commit checks (#2051) 2022-07-25 16:48:08 -07:00
Reza Yazdani aa88137b8d
Add Inference support for running the BigScience-BLOOM Architecture (#2083)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-07-18 16:27:12 -07:00
Reza Yazdani a04480e192
Fix the half-precision version of CPU-Adam (#2032)
* Fix the half-precision version of CPU-Adam

* remove unexpected return

* fix the increase width (fp32/fp16)

* support fp16 tests for cpu-adam

* fix the fp16 data-loading

* change unit-test for fp16 check & slight change to parameter size

* fix for numpy error

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-06-23 08:56:26 -07:00
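The CPU-Adam fixes above concern the fused optimizer kernel; the update rule it vectorizes is ordinary Adam with bias correction. A plain NumPy sketch of one step (illustrative only, not DeepSpeed's fused C++/AVX implementation, and in fp32 rather than the half-precision path the commit fixes):

```python
import numpy as np

def adam_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update per parameter: exponential moving averages of the
    # gradient and its square, bias-corrected, then a scaled step.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** step)  # bias correction for step t
    v_hat = v / (1 - beta2 ** step)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# On the first step the bias-corrected update is ~lr * sign(grad).
p, m, v = adam_step(np.float32(1.0), np.float32(0.5), 0.0, 0.0, step=1)
print(p)
```

The half-precision variant stores or copies the parameters in fp16 while doing this arithmetic in fp32, which is why the width of the load/store path mattered in the fix.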
Jeff Rasley b666d5cd73
[inference] test suite for ds-kernels (bert, roberta, gpt2, gpt-neo, gpt-j) (#1992)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-06-15 14:21:19 -07:00
Reza Yazdani 0ebd81dfa9
small fix for the HF Bert models (#1984) 2022-05-31 10:23:27 -07:00
Reza Yazdani 8164ea9e6d
Fixing several bugs in the inference-api and the kernels (#1951)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-05-24 13:27:50 -07:00
Ramya Ramineni 96c8bf32aa
Enable DeepSpeed inference on ROCm (#1922)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-04-28 10:08:05 -07:00
Jeff Rasley b4fcd98ff0
Inference PP changes for neox (#1899)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-04-26 11:50:38 -07:00
Ramya Ramineni b4e8f18c27
THCGeneral.h header file is deprecated (#1842)
Co-authored-by: hubertlu-tw <hubertlu@amd.com>
2022-03-16 16:36:08 -07:00
Ramya Ramineni 7bcb4fabeb
Enable CG headers on ROCm (#1821)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-03-11 12:06:41 -08:00
Jithun Nair 350d74ca39
Invoke hipify from op builder for JIT extension builds (#1807)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-03-07 18:59:14 +00:00
Jeff Rasley c3c8d5dd93
AMD support (#1430)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: rraminen <rraminen@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: okakarpa <okakarpa@amd.com>
Co-authored-by: rraminen <rraminen@amd.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: okakarpa <okakarpa@amd.com>
Co-authored-by: Ramya Ramineni <62723901+rraminen@users.noreply.github.com>
2022-03-03 01:53:35 +00:00
Alex Hedges 4cf970e6bb
Add codespell to pre-commit checks (#1717) 2022-01-22 14:45:58 -08:00
Jeff Rasley e46d808a1b
MoE inference + PR-MoE model support (#1705)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Zhewei Yao <zheweiy@berkeley.edu>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
2022-01-18 16:25:01 -08:00