Commit graph

12 commits

Author SHA1 Message Date
Yizhou Wang 11bbf45af5
[XPU] host timer check version from Torch 2.5 to Torch 2.6 (#6633)
Elapsed time will be supported in Torch 2.6.
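A minimal sketch of the version gate this commit adjusts, assuming a simple major/minor comparison (the function name and parsing are illustrative, not DeepSpeed's actual code):

```python
def use_host_timers(torch_version: str) -> bool:
    """Return True when XPU event elapsed_time is not yet available.

    Per the commit above, elapsed_time is supported from Torch 2.6,
    so older versions fall back to host-side timers.
    """
    major, minor = (int(x) for x in torch_version.split(".")[:2])
    return (major, minor) < (2, 6)

print(use_host_timers("2.5.1"))  # True: fall back to host timers
print(use_host_timers("2.6.0"))  # False: device event timing available
```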

Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
2024-10-22 06:53:15 +00:00
YiSheng5 45b363504e
[XPU] Use host time to replace XPU time when IPEX version is lower than 2.5. (#5796)
Use host time to replace XPU event elapsed_time as a workaround; on XPU
devices, using XPU events to measure time will be consolidated in IPEX
2.5.
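The workaround can be sketched as a host-side stand-in for a device event: record wall-clock time and expose the same millisecond `elapsed_time` shape as torch's event API (the class name is hypothetical, not DeepSpeed's code):

```python
import time

class HostTimerEvent:
    """Workaround sketch: record host wall-clock time instead of an
    XPU event, so elapsed_time works where device events do not."""

    def __init__(self):
        self.t = None

    def record(self):
        self.t = time.time()

    def elapsed_time(self, end_event) -> float:
        # Mirror torch's Event API convention: return milliseconds
        # between this event's record() and end_event's record().
        return (end_event.t - self.t) * 1000.0

start, end = HostTimerEvent(), HostTimerEvent()
start.record()
end.record()
print(start.elapsed_time(end) >= 0.0)  # True
```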

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-07-25 15:03:07 -07:00
Yizhou Wang d254d75ef0
[XPU] support op builder from intel_extension_for_pytorch kernel path (#5425)
# Motivation
From our next release, XPU DeepSpeed-related kernels will be put into
intel_extension_for_pytorch. This PR adds new op builders and uses the
kernel path from intel_extension_for_pytorch. More ops such as MoE and
WOQ will be added.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-06-20 16:33:25 -07:00
Liangliang-Ma 8831b57bb2
fix IDEX dependency in xpu accelerator (#5666)
DeepSpeed's XPU accelerator does not use IDEX.
Fix this hardcoded dependency.
2024-06-17 13:39:10 -07:00
Yizhou Wang 73316307b1
[XPU] adapt lazy_call func to different versions (#5670)
Previously, the lazy_call function was wrapped by
torch.xpu.lazy_init._lazy_call, which has now changed to
torch.xpu._lazy_call.

Thus we change this function to adapt to different versions.
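One way to adapt to both layouts is attribute probing; this sketch demonstrates the idea with stand-in modules instead of a real `torch.xpu` (the probing approach is an assumption, the real code may check versions explicitly):

```python
from types import SimpleNamespace

def resolve_lazy_call(xpu_module):
    """Return _lazy_call from wherever this torch version puts it."""
    if hasattr(xpu_module, "_lazy_call"):
        return xpu_module._lazy_call           # newer: torch.xpu._lazy_call
    return xpu_module.lazy_init._lazy_call     # older: torch.xpu.lazy_init._lazy_call

# Stand-in modules emulating the two layouts:
new_style = SimpleNamespace(_lazy_call=lambda fn: fn())
old_style = SimpleNamespace(lazy_init=SimpleNamespace(_lazy_call=lambda fn: fn()))
print(resolve_lazy_call(new_style)(lambda: "new"))  # new
print(resolve_lazy_call(old_style)(lambda: "old"))  # old
```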
2024-06-17 13:24:22 -07:00
Polisetty V R K Jyothendra Varma ac935c7fde
the assumption in the DeepSpeedAccelerator abstract class that torch.initial_seed accepts a seed arg is incorrect (#5569)
PyTorch API reference:
https://pytorch.org/docs/stable/generated/torch.initial_seed.html
Also fix the return value of the manual_seed API for HPU.
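Per the linked reference, `torch.initial_seed()` takes no arguments and returns the default generator's seed. A toy accelerator illustrating the corrected signatures (the class is a hypothetical stand-in, not DeepSpeed's DeepSpeedAccelerator):

```python
import random

class ToyAccelerator:
    """Illustrative stand-in, not DeepSpeed's DeepSpeedAccelerator."""

    def __init__(self):
        self._seed = 0

    def manual_seed(self, seed: int) -> None:
        self._seed = seed
        random.seed(seed)

    def initial_seed(self) -> int:
        # Matches torch.initial_seed(): takes NO seed argument,
        # just returns the seed of the default generator.
        return self._seed

acc = ToyAccelerator()
acc.manual_seed(42)
print(acc.initial_seed())  # 42
```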

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-06-12 09:32:17 -07:00
Liangliang-Ma 11a62a0635
Add CompressedBackend for Onebit optimizers (#5473)
In the process of adding Onebit optimizer support for XPU devices, we
noticed that across accelerators, the main difference in the
implementation of `compressed_allreduce` lies in `packbits` and
`unpackbits`: CUDA uses cupy and NPU uses torch_npu. Instead of
replacing these with XPU-only functions, we provide a CompressedBackend
to do the `compressed_allreduce` work, where users can add their own
packbits/unpackbits kernels; this is a general path for all kinds of
accelerators.

In this PR, we:
1. Add CompressedBackend for OnebitAdam, OnebitLamb and ZeroOneAdam
2. Add an XPU implementation of packbits/unpackbits with SYCL, built by
PackbitsBuilder
3. Add tests for onebit with CompressedBackend
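The packbits/unpackbits step that the backends differ on can be sketched in pure Python; this mimics the bit-packing contract of numpy/cupy `packbits` that 1-bit sign compression relies on (a simplified sketch, not the SYCL kernel or DeepSpeed's actual code):

```python
def packbits(bits):
    """Pack a list of 0/1 ints into bytes, 8 bits per byte
    (big-endian), mimicking numpy/cupy packbits."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for j, b in enumerate(bits[i:i + 8]):
            byte |= b << (7 - j)
        out.append(byte)
    return bytes(out)

def unpackbits(data, numel):
    """Inverse of packbits, truncated to the original element count."""
    bits = []
    for byte in data:
        for j in range(8):
            bits.append((byte >> (7 - j)) & 1)
    return bits[:numel]

# 1-bit sign compression as used in compressed_allreduce:
grads = [0.5, -1.2, 3.0, -0.1]
signs = [1 if g > 0 else 0 for g in grads]
packed = packbits(signs)                 # 8 sign bits per byte
restored = [2.0 * b - 1.0 for b in unpackbits(packed, len(grads))]
print(restored)  # [1.0, -1.0, 1.0, -1.0]
```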

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-06-05 20:28:46 +00:00
shiyang-weng 1d8196736f
Fix the TypeError for XPU Accelerator (#5531)
Fixes the following error:
/datadisk2/wengshiy/llm.devkit/DeepSpeed/deepspeed/runtime/utils.py
    return get_accelerator().FloatTensor(float(v)).detach()
TypeError: new(): data must be a sequence (got float)

The CUDA accelerator modified the interface to fix a warning:
177dc14331
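The failure pattern and its fix can be demonstrated with a stand-in that mimics `torch.FloatTensor`'s constructor contract (the helper is hypothetical; the real fix is to pass a sequence rather than a bare float):

```python
def float_tensor(data):
    """Stand-in mimicking torch.FloatTensor's constructor contract:
    it accepts a sequence, not a bare Python float."""
    if isinstance(data, float):
        raise TypeError("new(): data must be a sequence (got float)")
    return [float(x) for x in data]

v = 3
# Failing pattern from the traceback above:
try:
    float_tensor(float(v))
except TypeError as e:
    print(e)                      # new(): data must be a sequence (got float)

# The fix: wrap the scalar in a sequence.
print(float_tensor([float(v)]))  # [3.0]
```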

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-05-20 14:52:44 +00:00
vikram singh shekhawat fa8458b1a8
Add getter and setter methods for compile_backend across accelerators. (#5299)
Add getter and setter methods for `compile_backend` across accelerators,
which provide a mechanism to retrieve the compile backend. These APIs
handle user-defined backend selection and raise a `ValueError` with
informative error messages for unsupported backends.
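A minimal sketch of the getter/setter pattern described above (method names follow the commit; the class and the supported-backend list are illustrative assumptions):

```python
class ToyAccelerator:
    """Illustrative accelerator with a configurable compile backend."""

    _supported = ["inductor", "eager"]  # assumed example backends

    def __init__(self):
        self._compile_backend = "inductor"

    def get_compile_backend(self) -> str:
        return self._compile_backend

    def set_compile_backend(self, backend: str) -> None:
        # Raise an informative ValueError for unsupported backends.
        if backend not in self._supported:
            raise ValueError(
                f"{backend} not supported on this accelerator. "
                f"Supported backends: {self._supported}")
        self._compile_backend = backend

acc = ToyAccelerator()
acc.set_compile_backend("eager")
print(acc.get_compile_backend())  # eager
```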

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-04-24 15:25:18 +00:00
shiyuan680 3f875d9519
add device config env for the accelerator (#5396)
Thanks to this [pr](https://github.com/microsoft/DeepSpeed/pull/5369)
and to @delock for contributing the ideas.
As mentioned in that
[pr](https://github.com/microsoft/DeepSpeed/pull/5369), each device has
its own environment variables.
We create visible_devices_envs() and set_visible_devices_envs() methods
on the accelerator class so that each accelerator can implement env
settings within the interface, which is more generic for other
accelerators.

This commit has been tested on NPU; each node has 8 Ascend NPUs.
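The two methods can be sketched as follows; the method names come from the commit, `ASCEND_RT_VISIBLE_DEVICES` is Ascend's real visible-devices variable (CUDA's counterpart is `CUDA_VISIBLE_DEVICES`), and the class itself is a hypothetical stand-in:

```python
class ToyNpuAccelerator:
    """Illustrative NPU accelerator exposing its own device-env vars."""

    def visible_devices_envs(self):
        # Each accelerator returns its own env var name(s).
        return ["ASCEND_RT_VISIBLE_DEVICES"]

    def set_visible_devices_envs(self, current_env, local_accelerator_ids):
        for env in self.visible_devices_envs():
            current_env[env] = ",".join(map(str, local_accelerator_ids))

env = {}
ToyNpuAccelerator().set_visible_devices_envs(env, [0, 1, 2, 3])
print(env)  # {'ASCEND_RT_VISIBLE_DEVICES': '0,1,2,3'}
```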

---------

Co-authored-by: yangcheng <yangcheng104@huawei.com>
Co-authored-by: eigen2017 <wobushiliu2@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-04-20 23:35:50 +00:00
BacharL 697f945a05
Split is_synchronized_device api to multiple apis (#5026)
DeepSpeed currently calls is_synchronized_device() to decide how to use
the device.
HPU does not fit this definition since it behaves as if all streams
are blocking streams,
meaning they preserve order between each other but are asynchronous to
the CPU (see cudaStreamCreateWithFlags).

**has_data_dependency_resolving()**
The HPU device is considered synchronized with respect to the CPU.
Operations execute in script order
regardless of the stream they were enqueued on. Tensor data is
guaranteed to be valid.
No need for stream dependencies or CPU synchronizations.

**use_host_timers()**
HPU device execution is asynchronous. To measure device execution time
we must use device timers.

**has_memory_backpressure()**
limiting number of inflight fetched params and number of inflight grad
reduce_scatter calls
is not necessary since HPU will stop enqueuing calls if memory is full,
creating internal
backpressure for the CPU until memory is available.
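The split described above can be sketched as three finer-grained queries replacing the single `is_synchronized_device()`; the method names come from the commit, and the return values shown follow the HPU behavior it describes (the class is illustrative, not DeepSpeed's code):

```python
class ToyHpuAccelerator:
    """Illustrative HPU-like accelerator after the API split."""

    def is_synchronized_device(self) -> bool:
        return False  # execution is asynchronous wrt the CPU

    def has_data_dependency_resolving(self) -> bool:
        # Ops run in script order; tensor data is always valid.
        return True

    def use_host_timers(self) -> bool:
        # Async execution: must use device timers, not host timers.
        return False

    def has_memory_backpressure(self) -> bool:
        # Device stalls enqueues when memory is full.
        return True

hpu = ToyHpuAccelerator()
print(hpu.has_data_dependency_resolving())  # True
print(hpu.use_host_timers())                # False
```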

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-02-08 16:03:56 +00:00
Ma, Guokai f4f31317ed
[XPU] XPU accelerator support for Intel GPU device (#4547)
This PR adds XPU support for Intel GPU. With this PR, DeepSpeed can
support XPU devices without installing Intel Extension for DeepSpeed.

---------

Co-authored-by: Liangliang-Ma <1906710196@qq.com>
Co-authored-by: baodi <di.bao@intel.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Yizhou Wang <yizhou.wang@intel.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2024-01-05 12:29:07 -08:00