Mirror of https://github.com/microsoft/DeepSpeed.git
76c9c69fb1
This PR enables building the below extensions for AMD GPUs with warp size 32:

- transformer_inference
- quantizer
- random_ltd

This PR works stand-alone for torch versions <= 2.0. For the latest versions, https://github.com/microsoft/DeepSpeed/pull/5401 must be merged in addition to this PR.

Unit test results (rocm/pytorch:rocm6.1_ubuntu20.04_py3.9_pytorch_2.1.2) on NAVI3x:

**transformer_inference:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n 4 unit/ops/transformer/inference

Before this PR:
===== 674 failed, 622 skipped, 8 warnings, 1728 errors in 69.37s (0:01:09) =====

After this PR:
========== 476 failed, 1062 passed, 1486 skipped, 8 warnings in 9.31s ==========

**quantizer:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n 4 unit/ops/quantizer

Before this PR:
==== 244 failed, 8 warnings in 30.53s ====

After this PR:
====== 186 failed, 58 passed, 8 warnings in 8.89s ======

I could not find any random_ltd-related unit tests to run.

Fixes:
https://github.com/microsoft/DeepSpeed/issues/4753
https://github.com/microsoft/DeepSpeed/issues/5474
https://github.com/ROCm/DeepSpeed/issues/68

cc: @jithunnair-amd

---------

Co-authored-by: rraminen@amd.com <rraminen>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
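For context on why the warp size matters when building these kernels, the sketch below shows a warp-size-agnostic reduction. It is a minimal illustration only, not code from this PR: `WARP_SIZE`, `warp_reduce_sum`, and `block_sum` are hypothetical names, and the constant is assumed to be supplied by the build system (32 for the NAVI GPUs targeted here, 64 for CDNA-style wavefronts).

```cuda
// Hypothetical sketch of a warp-size-agnostic sum reduction.
// WARP_SIZE is an illustrative compile-time constant assumed to be set by the
// build system; none of these names are taken from the DeepSpeed sources.
#ifndef WARP_SIZE
#define WARP_SIZE 32
#endif

__inline__ __device__ float warp_reduce_sum(float val) {
    // Shuffle-based tree reduction across one warp; the loop bound follows the
    // configured warp size instead of hard-coding 32. (On ROCm the equivalent
    // HIP shuffle intrinsic, which takes no lane mask, would be used.)
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffffu, val, offset);
    return val;
}

__global__ void block_sum(const float* in, float* out, int n) {
    // Each thread accumulates a strided slice of the input.
    float v = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        v += in[i];

    // Reduce within each warp; lane 0 of every warp adds its partial sum.
    v = warp_reduce_sum(v);
    if ((threadIdx.x % WARP_SIZE) == 0)
        atomicAdd(out, v);
}
```

Kernels parameterized this way compile for either 32-wide or 64-wide wavefronts, which is the general concern when enabling these extensions across different AMD GPU families.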
gather_scatter.cu
pt_binding.cpp
slice_attn_masks.cu
token_sort.cu