Use host time to replace XPU event elapsed_time as a workaround; on XPU devices, using XPU events to measure elapsed time will be consolidated in IPEX 2.5.
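For reference, a minimal sketch of the workaround, assuming a PyTorch build with XPU support: wrap the workload in host timers around synchronize() calls instead of calling Event.elapsed_time().

```python
import time

import torch

# Host-timer measurement around explicit synchronization points; the
# matmul below is just an example workload.
x = torch.randn(1024, 1024, device="xpu")
torch.xpu.synchronize()
start = time.time()
y = x @ x
torch.xpu.synchronize()
print(f"elapsed: {(time.time() - start) * 1000.0:.3f} ms")
```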
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
# Motivation
From our next release, XPU DeepSpeed-related kernels will be moved into
intel_extension_for_pytorch. This PR adds the new op builders and uses the
kernel path from intel_extension_for_pytorch. More ops such as MoE and WOQ
will be added.
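As an illustration of the direction (the class and return value below are hypothetical, not the final API), an op builder can defer kernel loading to intel_extension_for_pytorch instead of JIT-compiling sources:

```python
# Illustrative sketch only: a builder whose load() hands back kernels
# shipped with intel_extension_for_pytorch.
class IpexOpBuilderSketch:
    NAME = "ipex_op_sketch"  # hypothetical op name

    def is_compatible(self, verbose=True):
        try:
            import intel_extension_for_pytorch  # noqa: F401
            return True
        except ImportError:
            return False

    def load(self):
        # A real builder would return the specific kernel module that
        # IPEX exports; the top-level module is a placeholder here.
        import intel_extension_for_pytorch as ipex
        return ipex
```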
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Previously, the lazy_call function was wrapped by
torch.xpu.lazy_init._lazy_call, which has now changed to
torch.xpu._lazy_call.
We therefore change this function to adapt to both versions.
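A minimal sketch of the version-adaptive dispatch, using the two attribute locations named above:

```python
import torch

def lazy_call(callback):
    # Newer PyTorch XPU builds expose _lazy_call directly; older ones
    # nest it under torch.xpu.lazy_init.
    if hasattr(torch.xpu, "_lazy_call"):
        return torch.xpu._lazy_call(callback)
    return torch.xpu.lazy_init._lazy_call(callback)
```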
While adding onebit optimizer support for XPU devices, we noticed that
across accelerators the main difference in the implementation of
`compressed_allreduce` lies in `packbits` and `unpackbits`: CUDA uses cupy
and NPU uses torch_npu. Instead of replacing these with XPU-only functions,
we provide a CompressedBackend to do the `compressed_allreduce` work, where
users can add their own packbits/unpackbits kernels; this is a general path
for all kinds of accelerators.
In this PR, we:
1. Add CompressedBackend for onebitAdam, onebitLamb and zerooneAdam
2. Add an XPU implementation of packbits/unpackbits in SYCL, built by
PackbitsBuilder (a CPU reference sketch of the expected semantics follows
this list)
3. Add tests for onebit with CompressedBackend
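For reference, a CPU sketch of the semantics the accelerator kernels must implement (the helper names are ours, not DeepSpeed's API; the SYCL, cupy and torch_npu kernels compute the same thing on their devices):

```python
import numpy as np
import torch

def packbits_reference(bits: torch.Tensor) -> torch.Tensor:
    # bits: flat tensor of 0/1 values; packs 8 bits into each output byte.
    return torch.from_numpy(np.packbits(bits.to(torch.uint8).cpu().numpy()))

def unpackbits_reference(packed: torch.Tensor, count: int) -> torch.Tensor:
    # Inverse of packbits_reference; count trims any padding bits.
    return torch.from_numpy(np.unpackbits(packed.cpu().numpy(), count=count))

bits = torch.tensor([1, 0, 1, 1, 0, 0, 1, 0], dtype=torch.uint8)
assert torch.equal(unpackbits_reference(packbits_reference(bits), 8), bits)
```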
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Fix the following error:
/datadisk2/wengshiy/llm.devkit/DeepSpeed/deepspeed/runtime/utils.py
return get_accelerator().FloatTensor(float(v)).detach()
TypeError: new(): data must be a sequence (got float)
The CUDA accelerator modified this interface to fix a warning:
177dc14331
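A minimal reproduction, with one way to build the scalar tensor that avoids the error (shown for illustration; the actual change follows the CUDA accelerator commit above):

```python
import torch

v = 0.5
# torch.FloatTensor(float(v))  # TypeError: new(): data must be a sequence (got float)
t = torch.tensor(float(v), dtype=torch.float32).detach()  # scalar tensor works
```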
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Add getter and setter methods for `compile_backend` across accelerators,
which provide a mechanism to retrieve and set the compile backend. These
APIs handle user-defined backend selection and raise a `ValueError` with
informative error messages for unsupported backends.
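A minimal sketch of the accessor pattern (the class is illustrative and the supported-backend list is a placeholder; the method names follow the description above):

```python
class AcceleratorSketch:
    def __init__(self):
        self._compile_backend = "inductor"
        self._supported_backends = ["inductor", "eager"]  # placeholder list

    def get_compile_backend(self):
        return self._compile_backend

    def set_compile_backend(self, backend):
        # Reject user-defined backends this accelerator cannot handle.
        if backend not in self._supported_backends:
            raise ValueError(
                f"{backend} not supported by this accelerator. "
                f"Supported backends: {self._supported_backends}")
        self._compile_backend = backend
```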
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Thank you for the [pr](https://github.com/microsoft/DeepSpeed/pull/5369)
and to @delock for contributing ideas.
As mentioned in this
[pr](https://github.com/microsoft/DeepSpeed/pull/5369), each device has
its own environment variables.
We create visible_devices_envs() and set_visible_devices_envs() methods
on the accelerator class so that each accelerator can implement its env
settings within the interface, which is more generic for other
accelerators.
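A minimal sketch of the two methods for the NPU case (ASCEND_RT_VISIBLE_DEVICES is the Ascend analogue of CUDA_VISIBLE_DEVICES; the class name is illustrative):

```python
class NpuAcceleratorSketch:
    def visible_devices_envs(self):
        return ["ASCEND_RT_VISIBLE_DEVICES"]

    def set_visible_devices_envs(self, current_env, local_accelerator_ids):
        # Write the launcher-selected device ids into every env var the
        # accelerator declares, e.g. "0,1,2,3".
        for env in self.visible_devices_envs():
            current_env[env] = ",".join(map(str, local_accelerator_ids))
```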
This commit has been tested on NPU nodes, each with 8 Ascend NPUs.
---------
Co-authored-by: yangcheng <yangcheng104@huawei.com>
Co-authored-by: eigen2017 <wobushiliu2@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
DeepSpeed currently calls is_synchronized_device() to decide how to use
the device.
HPU does not fit this definition, since it behaves as if all streams were
blocking streams: they preserve order among each other but are asynchronous
with respect to the CPU (see cudaStreamCreateWithFlags).
**has_data_dependency_resolving()**
The HPU device is considered synchronized with respect to the CPU:
operations execute in script order regardless of the stream they were
enqueued on, and tensor data is guaranteed to be valid.
There is no need for stream dependencies or CPU synchronizations.
**use_host_timers()**
HPU device execution is asynchronous, so to measure device execution time
we must use device timers.
**has_memory_backpressure()**
Limiting the number of in-flight fetched params and in-flight grad
reduce_scatter calls is unnecessary, since HPU stops enqueuing calls when
memory is full, creating internal backpressure on the CPU until memory
becomes available.
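Putting the three queries together, a sketch of how HPU might answer them (method names as above; return values reflect the behavior described, not necessarily DeepSpeed's exact code):

```python
class HpuAcceleratorSketch:
    def is_synchronized_device(self):
        return False

    def has_data_dependency_resolving(self):
        # Ops execute in script order across streams; tensor data is valid.
        return True

    def use_host_timers(self):
        # Execution is async w.r.t. the CPU, so device timers are needed.
        return False

    def has_memory_backpressure(self):
        # HPU stalls enqueue when memory is full; no manual limits needed.
        return True
```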
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
This PR adds XPU support for Intel GPUs. With this PR, DeepSpeed can
support XPU devices without installing Intel Extension for DeepSpeed.
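A quick way to check the result on an Intel GPU machine (assuming a PyTorch build with XPU support):

```python
from deepspeed.accelerator import get_accelerator

acc = get_accelerator()
print(acc.device_name())   # expected: 'xpu'
print(acc.is_available())
```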
---------
Co-authored-by: Liangliang-Ma <1906710196@qq.com>
Co-authored-by: baodi <di.bao@intel.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Yizhou Wang <yizhou.wang@intel.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>