DeepSpeed/csrc/transformer
Jagadish Krishnamoorthy 2b41d6212c
[Bug Fix] Support threads_per_head < 64 for wavefront size of 64 (#6622)
When launching the apply_rotary_pos_half kernel with a wavefront size of 64,
only a threads_per_head value of 64 was supported.
This change adds support for threads_per_head values below 64, such as 4, 8, and 16.
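
The constraint described in this commit message can be illustrated with a minimal host-side C++ sketch. It is not taken from the DeepSpeed source: the function names are hypothetical, and it assumes that both the wavefront size and threads_per_head are powers of two, so that a wavefront splits evenly into per-head groups.

```cpp
#include <cassert>

// Hypothetical helper (not from DeepSpeed): true when a wavefront of
// `wavefront_size` lanes can be split into equal groups of
// `threads_per_head` lanes. Assumes `wavefront_size` is a power of two
// and only accepts power-of-two group sizes.
inline bool is_valid_threads_per_head(int threads_per_head, int wavefront_size)
{
    const bool is_pow2 =
        threads_per_head > 0 && (threads_per_head & (threads_per_head - 1)) == 0;
    return is_pow2 && threads_per_head <= wavefront_size;
}

// Hypothetical helper: number of heads a single wavefront would cover
// when each head is handled by `threads_per_head` lanes.
inline int heads_per_wavefront(int threads_per_head, int wavefront_size)
{
    assert(is_valid_threads_per_head(threads_per_head, wavefront_size));
    return wavefront_size / threads_per_head;
}
```

Under these assumptions, a wavefront size of 64 with threads_per_head of 4, 8, or 16 would cover 16, 8, or 4 heads per wavefront respectively.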

Fixes the issue introduced in
https://github.com/microsoft/DeepSpeed/pull/5402

---------

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
2024-11-04 21:51:27 +00:00
Name                     Last commit                                                               Date
inference                [Bug Fix] Support threads_per_head < 64 for wavefront size of 64 (#6622)  2024-11-04 21:51:27 +00:00
cublas_wrappers.cu       rocblas -> hipblas changes for ROCm (#5401)                               2024-05-17 01:57:00 +00:00
dropout_kernels.cu       DeepSpeed Chat (#3186)                                                    2023-04-11 11:53:38 -07:00
ds_transformer_cuda.cpp  Switch from HIP_PLATFORM_HCC to HIP_PLATFORM_AMD (#4539)                  2023-10-19 21:01:48 +00:00
gelu_kernels.cu          Update DeepSpeed copyright license to Apache 2.0 (#3111)                  2023-03-30 17:14:38 -07:00
general_kernels.cu       Fix formatting (#3343)                                                    2023-04-21 09:57:46 -07:00
normalize_kernels.cu     Update DeepSpeed copyright license to Apache 2.0 (#3111)                  2023-03-30 17:14:38 -07:00
softmax_kernels.cu       Update DeepSpeed copyright license to Apache 2.0 (#3111)                  2023-03-30 17:14:38 -07:00
transform_kernels.cu     Update DeepSpeed copyright license to Apache 2.0 (#3111)                  2023-03-30 17:14:38 -07:00