Mirror of https://github.com/microsoft/DeepSpeed.git
76c9c69fb1
This PR enables building the below extensions for AMD GPUs with warp size 32:

- transformer_inference
- quantizer
- random_ltd

This PR works stand-alone for torch versions <= 2.0. For the latest versions, https://github.com/microsoft/DeepSpeed/pull/5401 must be merged in addition to this PR.

Unit test results (rocm/pytorch:rocm6.1_ubuntu20.04_py3.9_pytorch_2.1.2) on NAVI3x:

**transformer_inference:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n 4 unit/ops/transformer/inference

Before this PR:
===== 674 failed, 622 skipped, 8 warnings, 1728 errors in 69.37s (0:01:09) =====

After this PR:
========== 476 failed, 1062 passed, 1486 skipped, 8 warnings in 9.31s ==========

**quantizer:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n 4 unit/ops/quantizer

Before this PR:
==== 244 failed, 8 warnings in 30.53s ====

After this PR:
====== 186 failed, 58 passed, 8 warnings in 8.89s ======

I could not find any random_ltd-related unit tests to run.

Fixes:
https://github.com/microsoft/DeepSpeed/issues/4753
https://github.com/microsoft/DeepSpeed/issues/5474
https://github.com/ROCm/DeepSpeed/issues/68

cc: @jithunnair-amd

---------

Co-authored-by: rraminen@amd.com <rraminen>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
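For context on why the warp size matters when building these kernels, the sketch below shows a warp-size-agnostic reduction. It is a minimal illustration only, not code from this PR: `WARP_SIZE`, `warp_reduce_sum`, and `block_sum` are hypothetical names, and the constant is assumed to be supplied by the build system (32 for the NAVI GPUs targeted here, 64 for CDNA-style wavefronts).

```cuda
// Hypothetical sketch of a warp-size-agnostic sum reduction.
// WARP_SIZE is an illustrative compile-time constant assumed to be set by the
// build system; none of these names are taken from the DeepSpeed sources.
#ifndef WARP_SIZE
#define WARP_SIZE 32
#endif

__inline__ __device__ float warp_reduce_sum(float val) {
    // Shuffle-based tree reduction across one warp; the loop bound follows the
    // configured warp size instead of hard-coding 32. (On ROCm the equivalent
    // HIP shuffle intrinsic, which takes no lane mask, would be used.)
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffffu, val, offset);
    return val;
}

__global__ void block_sum(const float* in, float* out, int n) {
    // Each thread accumulates a strided slice of the input.
    float v = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        v += in[i];

    // Reduce within each warp; lane 0 of every warp adds its partial sum.
    v = warp_reduce_sum(v);
    if ((threadIdx.x % WARP_SIZE) == 0)
        atomicAdd(out, v);
}
```

Kernels parameterized this way compile for either 32-wide or 64-wide wavefronts, which is the general concern when enabling these extensions across different AMD GPU families.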
gather_scatter.cu
pt_binding.cpp
slice_attn_masks.cu
token_sort.cu