DeepSpeed/csrc/transformer/inference
Jagadish Krishnamoorthy 2b41d6212c
[Bug Fix] Support threads_per_head < 64 for wavefront size of 64 (#6622)
When launching the apply_rotary_pos_half kernel with a wavefront size of 64,
only a threads_per_head of 64 is supported.
This change adds support for threads_per_head values below 64, such as 4, 8, and 16.
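
For illustration only, here is a minimal host-side sketch of how a launcher might map a runtime threads_per_head value onto a compile-time template parameter. It is not the code from this change: `launch_rotary_sketch` and `dispatch_threads_per_head` are hypothetical names, the "kernel" is a plain function that returns its template argument, and the accepted sizes (4, 8, 16, 32, 64) are an assumption based on the examples in the message above.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a kernel templated on the number of threads
// that cooperate on one attention head. A real launcher would start a GPU
// kernel here; this sketch just returns the selected template argument.
template <int ThreadsPerHead>
int launch_rotary_sketch()
{
    return ThreadsPerHead;
}

// Hypothetical dispatcher: selects a template instantiation from a runtime
// value. The list of accepted sizes is an assumption, not taken from the
// DeepSpeed sources.
int dispatch_threads_per_head(int threads_per_head)
{
    switch (threads_per_head) {
        case 4: return launch_rotary_sketch<4>();
        case 8: return launch_rotary_sketch<8>();
        case 16: return launch_rotary_sketch<16>();
        case 32: return launch_rotary_sketch<32>();
        case 64: return launch_rotary_sketch<64>();
        default: throw std::invalid_argument("unsupported threads_per_head");
    }
}
```

If the dispatcher handled only the 64 case, every smaller value would reach the default branch, which is roughly the kind of gap the message describes for a wavefront size of 64.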

Fixes the issue introduced in
https://github.com/microsoft/DeepSpeed/pull/5402

---------

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
2024-11-04 21:51:27 +00:00
csrc [Bug Fix] Support threads_per_head < 64 for wavefront size of 64 (#6622) 2024-11-04 21:51:27 +00:00
includes