DeepSpeed/deepspeed/comm
Heyang Qin c37fe9cbfb
Fix exception handling in get_all_ranks_from_group() function (#4862)
In the latest PyTorch nightly, the exception raised by
`torch.distributed.distributed_c10d.get_global_rank()` changed from
`RuntimeError` to `ValueError`, so the try-catch in `deepspeed.comm`
needs to be updated accordingly.

Tested with torch version 2.3.0.dev20231221+cu121

Fixes: https://github.com/microsoft/DeepSpeed/issues/4853
2023-12-22 11:48:48 -08:00
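The fix can be sketched as follows. This is a minimal stand-alone illustration, not the actual DeepSpeed code: `_get_global_rank` is a hypothetical stand-in for `torch.distributed.distributed_c10d.get_global_rank()` so the example runs without an initialized process group, and the group size of 2 is arbitrary.

```python
# Sketch of the fix: widen the except clause so the enumerate-until-exception
# pattern works on both older PyTorch (RuntimeError) and newer nightlies
# (ValueError). _get_global_rank is a hypothetical stand-in for
# torch.distributed.distributed_c10d.get_global_rank().

def _get_global_rank(group, rank):
    # Pretend the group contains 2 ranks; newer PyTorch raises ValueError
    # for a rank outside the group (older releases raised RuntimeError).
    if rank < 2:
        return rank
    raise ValueError(f"Group rank {rank} is not part of the group")

def get_all_ranks_from_group(group=None):
    """Collect global ranks by probing group-local ranks until the
    backend raises, mirroring the pattern in deepspeed.comm."""
    group_ranks = []
    rank = 0
    try:
        while True:
            group_ranks.append(_get_global_rank(group, rank))
            rank += 1
    except (RuntimeError, ValueError):
        # Catch both exception types to stay compatible across
        # PyTorch versions (the subject of this commit).
        pass
    return group_ranks
```

Catching the tuple `(RuntimeError, ValueError)` keeps the function working on existing PyTorch releases while also supporting the nightly behavior tested above.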
__init__.py Fix for dist not being initialized when constructing main config (#3324) 2023-04-20 16:55:12 -07:00
backend.py Correct world_size/backend for mpi (#3694) 2023-06-06 23:36:25 +00:00
ccl.py fix error type in ccl.py (#4521) 2023-10-18 17:04:57 +00:00
comm.py Fix exception handling in get_all_ranks_from_group() function (#4862) 2023-12-22 11:48:48 -08:00
config.py Introduce pydantic_v1 compatibility module for pydantic>=2.0.0 support (#4407) 2023-10-09 11:59:30 -07:00
constants.py Make Ascend NPU available (#3831) 2023-07-22 12:52:27 +00:00
reduce_op.py [CPU] Support Intel CPU inference (#3041) 2023-05-16 11:59:22 -04:00
torch.py CI fix for torch 2.1 release (#4452) 2023-10-05 15:31:24 -07:00
utils.py [profiling][mics]Fix some issues for log_summary(). (#3899) 2023-07-07 19:25:15 +00:00