DeepSpeed/csrc/random_ltd
Ramya Ramineni 76c9c69fb1
Rocm warp size fix (#5402)
This PR enables building the below extensions for AMD GPUs with warp
size 32.
- transformer_inference
- quantizer
- random_ltd


This PR works stand-alone for torch version <=2.0. For the latest
versions, https://github.com/microsoft/DeepSpeed/pull/5401 is required
to be merged in addition to this PR.

Unit test results (rocm/pytorch:rocm6.1_ubuntu20.04_py3.9_pytorch_2.1.2)
on NAVI3x:

**transformer_inference:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n
4 unit/ops/transformer/inference

Before this PR:
===== 674 failed, 622 skipped, 8 warnings, 1728 errors in 69.37s
(0:01:09) =====

After this PR:
========== 476 failed, 1062 passed, 1486 skipped, 8 warnings in 9.31s
==========

**quantizer:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n
4 unit/ops/quantizer

Before this PR:
     ==== 244 failed, 8 warnings in 30.53s ====

After this PR:
    ====== 186 failed, 58 passed, 8 warnings in 8.89s ======

I could not find random_ltd related unit tests to run.

Fixes: 
https://github.com/microsoft/DeepSpeed/issues/4753
https://github.com/microsoft/DeepSpeed/issues/5474
https://github.com/ROCm/DeepSpeed/issues/68

cc: @jithunnair-amd

---------

Co-authored-by: rraminen@amd.com <rraminen>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-17 20:35:58 +00:00
..
gather_scatter.cu Update DeepSpeed copyright license to Apache 2.0 (#3111) 2023-03-30 17:14:38 -07:00
pt_binding.cpp Update DeepSpeed copyright license to Apache 2.0 (#3111) 2023-03-30 17:14:38 -07:00
slice_attn_masks.cu Update DeepSpeed copyright license to Apache 2.0 (#3111) 2023-03-30 17:14:38 -07:00
token_sort.cu Rocm warp size fix (#5402) 2024-05-17 20:35:58 +00:00