DeepSpeed/csrc
Joe Mayer e2654bfd1a
Fix Type Mismatch (#6410)
`num_bytes_per_thread` was a smaller type than `file_num_bytes`, which
caused issues when dividing by `num_threads`.

Co-authored-by: jomayeri <deepspeed@H100-VM2.shlnn55tgwve1eacvp21ie45dg.jx.internal.cloudapp.net>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-08-23 23:17:38 +00:00
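
The commit message above describes a quotient being stored in a narrower type than its dividend. The following is a minimal sketch of that class of bug, not the actual `aio` code: the function names, the 32-bit and 64-bit widths, and the 16 GiB file size are all assumptions chosen for illustration, since the listing does not show the real types or how the failure appeared.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from DeepSpeed): the per-thread byte count is
// returned in a 32-bit type. The division itself happens in 64 bits, but the
// result wraps modulo 2^32 when converted to uint32_t.
constexpr uint32_t narrow_bytes_per_thread(int64_t file_num_bytes, int num_threads)
{
    return static_cast<uint32_t>(file_num_bytes / num_threads);
}

// Hypothetical fixed version: the result type matches the dividend's type,
// so the quotient is preserved for large files.
constexpr int64_t wide_bytes_per_thread(int64_t file_num_bytes, int num_threads)
{
    return file_num_bytes / num_threads;
}

// Illustrative check with an assumed 16 GiB file split across 2 threads.
inline void check_split()
{
    constexpr int64_t file_num_bytes = int64_t{16} << 30;  // 2^34 bytes
    // Matching types: each thread gets 8 GiB (2^33 bytes).
    assert(wide_bytes_per_thread(file_num_bytes, 2) == (int64_t{8} << 30));
    // Narrow type: 2^33 wraps to 0 in 32 bits, so each thread would get no work.
    assert(narrow_bytes_per_thread(file_num_bytes, 2) == 0u);
}
```

Under these assumptions, keeping the per-thread count in the same width as the file size is the simplest way to avoid the truncation; whether the original failure looked like this is not stated in the commit message.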
| Name | Last commit | Date |
| --- | --- | --- |
| adagrad | CPUAdam fp16 and bf16 support (#5409) | 2024-05-20 12:50:20 +00:00 |
| adam | CPUAdam fp16 and bf16 support (#5409) | 2024-05-20 12:50:20 +00:00 |
| aio | Fix Type Mismatch (#6410) | 2024-08-23 23:17:38 +00:00 |
| cpu | [CPU] Allow deepspeed.comm.inference_all_reduce in torch.compile graph (#5604) | 2024-07-15 22:24:11 +00:00 |
| deepspeed4science/evoformer_attn | Update clang-format version from 16 to 18. (#5839) | 2024-08-06 09:14:21 -07:00 |
| fp_quantizer | Add fp8-fused gemm kernel (#5764) | 2024-07-29 11:07:00 -07:00 |
| gds | DeepNVMe GDS (#5852) | 2024-08-19 04:28:50 +00:00 |
| includes | Update clang-format version from 16 to 18. (#5839) | 2024-08-06 09:14:21 -07:00 |
| lamb | Switch from HIP_PLATFORM_HCC to HIP_PLATFORM_AMD (#4539) | 2023-10-19 21:01:48 +00:00 |
| lion | CPUAdam fp16 and bf16 support (#5409) | 2024-05-20 12:50:20 +00:00 |
| quantization | Fixed the Windows build. (#5596) | 2024-05-31 22:11:10 +00:00 |
| random_ltd | Rocm warp size fix (#5402) | 2024-05-17 20:35:58 +00:00 |
| sparse_attention | | |
| spatial | Switch from HIP_PLATFORM_HCC to HIP_PLATFORM_AMD (#4539) | 2023-10-19 21:01:48 +00:00 |
| transformer | Fixed Windows inference build. (#5609) | 2024-06-24 13:39:18 -07:00 |
| utils | | |
| xpu | Update clang-format version from 16 to 18. (#5839) | 2024-08-06 09:14:21 -07:00 |