Commit Graph

2317 commits

Author SHA1 Message Date
Costin Eseanu e7dd28a23d
Fixed the Windows build. (#5596)

Fixes applied:
- Remove some more ops that don't build on Windows.
- Remove the use of symlinks that didn't work correctly and replace with
`shutil.copytree()`.
- Small fixes to make the C++ code compile.

Tested with Python 3.9 and CUDA 12.1.
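A minimal sketch of the symlink-to-copy change described above; the helper name and paths are illustrative, not the exact setup code:
```python
import shutil
from pathlib import Path

def mirror_tree(src: Path, dst: Path) -> None:
    # Remove any stale link or copy left by a previous build.
    if dst.is_symlink():
        dst.unlink()
    elif dst.exists():
        shutil.rmtree(dst)
    # Copy instead of symlinking: creating symlinks needs elevated privileges on Windows.
    shutil.copytree(src, dst)

# e.g. mirror_tree(Path("csrc"), Path("deepspeed/ops/csrc"))
```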

---------

Co-authored-by: Costin Eseanu <costineseanu@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-31 22:11:10 +00:00
Masahiro Tanaka 77c949421e
Add slide deck for meetup in Japan (#5598) 2024-05-31 14:05:19 -07:00
Abhishek Kulkarni 1baf68840f
Update minor CUDA version compatibility (#5591)
Add CUDA versions 12.4 and 12.5 to the list
2024-05-31 16:47:35 +00:00
Nadav Elyahu 2fc702ed9f
DeepSpeedCheckpoint: support custom final ln idx (#5506)
Until now, only the last layer (idx=-1) was considered, using
FINAL_LAYER_NORM_INDEX, which is set to -1.
This PR allows the user to pass a custom value for models where this
default value does not apply.
See an example of usage in the HabanaAI/Megatron-DeepSpeed fork repository:

c9feb8caca/tools/verify_checkpoint_non_tp_consistency.py (L296)
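A hedged usage sketch: the keyword argument name `final_layer_norm_idx` and the checkpoint path are assumptions based on this description, not a confirmed signature:
```python
# Sketch only: `final_layer_norm_idx` is an assumed keyword name for the
# custom index this PR allows; the default corresponds to FINAL_LAYER_NORM_INDEX (-1).
from deepspeed.checkpoint import DeepSpeedCheckpoint

ds_checkpoint = DeepSpeedCheckpoint(
    "path/to/checkpoint",      # illustrative path
    tp_degree=1,
    pp_degree=1,
    final_layer_norm_idx=-2,   # override the -1 default for models where it does not apply
)
```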

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
2024-05-28 23:42:59 +00:00
Logan Adams 4deb40de67
Update to fix sidebar over text (#5567)
- [x] Needs to be tested.

Fixes #5494.

Sample screenshot:
<img width="1141" alt="image"
src="https://github.com/microsoft/DeepSpeed/assets/114770087/f89f642b-bca1-4d45-b3f1-ec7943ab2ad4">
2024-05-28 15:42:05 -07:00
Logan Adams bf66acdbae
Rename files in fp_quantize op from quantize.* to fp_quantize.* (#5577)
Fixes #5535.

Todo: need to test.
2024-05-28 20:01:48 +00:00
YiSheng5 fd8051a69c
[MiCS] Remove the handle print on DeepSpeed side (#5574)
When running MiCS, we found many handle prints from DeepSpeed in the
output log; this PR removes them to suppress the noise.
2024-05-28 17:08:42 +00:00
Logan Adams 988372b7bf
Update HPU docker version (#5566) 2024-05-28 16:09:38 +00:00
Lev Kurilenko 3a3a6db333
Add hybrid_engine.py as path to trigger the DS-Chat GH workflow (#5562)
This PR updates the `nv-ds-chat` GitHub workflow to include
`hybrid_engine.py` file in the path. This is done to ensure testing on
the DS-Chat flow is done whenever any changes are made to the Hybrid
Engine.
2024-05-23 15:21:02 +00:00
Kun Chen f86824be81
Add Ulysses DistributedAttention compatibility (#5525)
The `DistributedAttention` in DeepSpeed-Ulysses is compatible with the
training code in
[Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/model/transformer.py#L811)
because that code passes only positional arguments. However, it is not
compatible with the frequently used scenario of passing keyword
arguments, such as the following when using Flash Attention:
```python
ulysses_attn = DistributedAttention(local_attention=flash_attn_func, sequence_process_group=None, scatter_idx=2, gather_idx=1)

attn_output = ulysses_attn(
    query_states,
    key_states,
    value_states,
    dropout,
    softmax_scale,
    causal=causal,
)

```
Therefore, the `**kwargs` parameter has been added to increase
compatibility with more local attention implementations, while keeping
code modifications minimal.
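A minimal sketch of the signature change, assuming the extra positional and keyword arguments are simply forwarded to the wrapped local attention; class and attribute names here are illustrative, not the actual implementation:
```python
from torch import Tensor, nn

class DistributedAttentionSketch(nn.Module):
    """Illustrative wrapper: forward any extra args/kwargs to local attention."""

    def __init__(self, local_attention, sequence_process_group, scatter_idx=2, gather_idx=1):
        super().__init__()
        self.local_attn = local_attention
        self.spg = sequence_process_group
        self.scatter_idx = scatter_idx
        self.gather_idx = gather_idx

    def forward(self, query: Tensor, key: Tensor, value: Tensor, *args, **kwargs) -> Tensor:
        # ... all-to-all over the sequence dimension would happen here ...
        # Extra arguments (e.g. dropout, softmax_scale, causal=...) now pass through.
        return self.local_attn(query, key, value, *args, **kwargs)
```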

Co-authored-by: Kwen-Chen <2133949025@qq.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-22 21:52:39 +00:00
Max Kovalenko 995ba11928
Add throughput timer configuration (#5363)
The new "timers" section describes configuration for different timers.

Specifically, in the "throughput" section, it is possible to disable the
throughput timer (enabled by default). This allows to avoid the
performance degradation whenever the throughput measurement is not
needed, for example in production environment.

No device synchronize() is invoked when "synchronized" is set to False
(default is True). This allows to produce approximate throughput
measurements with minimal performance penalty.
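A sketch of the config shape implied by the description above, written as a Python config dict; the "timers"/"throughput" keys and defaults are taken from the text, the surrounding keys are illustrative:
```python
# Keep the throughput timer enabled but skip device synchronize() for
# approximate, low-overhead measurements; set "enabled": False to turn it off.
ds_config = {
    "train_batch_size": 8,
    "timers": {
        "throughput": {
            "enabled": True,        # default: True
            "synchronized": False,  # default: True; False skips device synchronize()
        }
    },
}
```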

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-05-22 20:28:02 +00:00
Omar Elayan f4efef21b8
[INF] DSAttention allow input_mask to have false as value (#5546)
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-22 20:22:53 +00:00
Logan Adams 263bfe2892
Update to HF_HOME from TRANSFORMERS_CACHE (#4816)
Addresses the following warning:

```
/tmp/actions-runner/_work/DeepSpeed/DeepSpeed/unit-test-venv/lib/python3.8/site-packages/transformers/utils/hub.py:123: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
```

and the code on the transformers side is
[here](1a585c1222/src/transformers/utils/hub.py (L86C1-L96C81)).
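For reference, a minimal illustration of the substitution in test setup code (paths are illustrative):
```python
import os

# Before: transformers emits the FutureWarning above for TRANSFORMERS_CACHE.
# os.environ["TRANSFORMERS_CACHE"] = "/blob/transformers_cache"

# After: HF_HOME is the supported variable; the hub cache lives under $HF_HOME/hub.
os.environ["HF_HOME"] = "/blob/hf_home"
```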
2024-05-22 16:08:51 +00:00
Yichen Yan 29903925cf
Adapt doc for #4405 (#5552)
ditto

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-21 21:58:47 +00:00
Zihan Zhao 975290ae65
Small typos in functions set_none_gradients_to_zero (#5557)
change from "zero_like" to "zeros_like"
2024-05-21 21:37:27 +00:00
Max Kovalenko 5b314f4e6b
Avoid overwrite of compiled module wrapper attributes (#5549)
**Fix overwriting of the compiled wrapper class attributes by those of
the wrapped class itself: Copy only those attributes which are not
already present in the wrapper.**

In the current implementation of the `CompiledModuleWrapper` the wrapper
attributes (e.g. the `forward` method) are overwritten by `self.__dict__ =
module.__dict__.copy()`:

```python
def CompiledModuleWrapper(mod, compile_config: Union[CompileConfig, None] = None):
    class wrapper(mod.__class__):
        def __init__(self, module, compile_config: Union[CompileConfig, None] = None):
            self.__dict__ = module.__dict__.copy()
```
This causes the `wrapper`'s `forward` method not to be called and,
consequently, the wrapped module not to be compiled. Instead, the wrapped
module's `forward` method is called, as illustrated in the diagram
below (a real scenario from DeepSpeed-Chat):


![compiled_module_wrapper_bug](https://github.com/microsoft/DeepSpeed/assets/75629718/00eeb3d1-927c-49c7-84ab-f882821cc452)

The proposed fix copies only those attributes which are not present in
the wrapper class, thus implementing the desired inheritance quality of
the wrapper.
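A minimal sketch of that copy strategy (not the verbatim patch); the helper name is illustrative:
```python
def copy_missing_attributes(wrapper, module) -> None:
    """Copy only attributes the wrapper does not already have, so wrapper-defined
    methods such as `forward` keep taking precedence over the wrapped module's."""
    for name, value in module.__dict__.items():
        if not hasattr(wrapper, name):
            setattr(wrapper, name, value)
```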

Attached is a simple reproducer of the problem.

[compiled_module_wrapper_bug.zip](https://github.com/microsoft/DeepSpeed/files/15378282/compiled_module_wrapper_bug.zip)

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-21 17:17:06 +00:00
Liran Bachar 0a1740386f
Remove synchronize calls from allgather params (#5516)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-05-21 15:01:20 +00:00
shiyang-weng 695d79ea06
Fix RuntimeError for moe on XPU: tensors found at least two devices (#5519)
The following error occurs on XPU when running the unit tests in
"DeepSpeed/tests/unit/moe/test_moe.py":
DeepSpeed/deepspeed/moe/sharded_moe.py line 223, in top1gating
RuntimeError: Expected all tensors to be on the same device, but found
at least two devices, xpu:0 and cpu!

Fixed by converting the tensor to the expected device.
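A hedged sketch of the kind of device conversion described (the exact tensor and line in `sharded_moe.py` differ):
```python
import torch

def to_same_device(t: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    # Move a CPU-created helper tensor onto the device of the tensor it is
    # combined with (e.g. xpu:0), avoiding the "two devices" RuntimeError.
    return t.to(reference.device) if t.device != reference.device else t
```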

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-21 15:01:05 +00:00
shiyang-weng 1d8196736f
Fix the TypeError for XPU Accelerator (#5531)
Fixes the following error:
/datadisk2/wengshiy/llm.devkit/DeepSpeed/deepspeed/runtime/utils.py
    return get_accelerator().FloatTensor(float(v)).detach()
TypeError: new(): data must be a sequence (got float)

The CUDA accelerator modified this interface to fix a warning:
177dc14331
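A sketch of a device-neutral way to build the scalar tensor involved; this illustrates the direction of the fix, not the exact patch:
```python
import torch
from deepspeed.accelerator import get_accelerator

def to_float_tensor(v: float) -> torch.Tensor:
    # FloatTensor(float(v)) fails because FloatTensor expects a sequence or sizes;
    # building the scalar with torch.tensor and an explicit device avoids the TypeError.
    return torch.tensor(float(v), dtype=torch.float32,
                        device=get_accelerator().current_device_name()).detach()
```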

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-05-20 14:52:44 +00:00
Liran Bachar 69af361167
CPUAdam fp16 and bf16 support (#5409)
Hi, please review the following changes.
I added BF16 support to CPU Adam. BF16, FP16, and float are supported
at compilation time; the correct template is called at runtime according
to the dtype of the input parameters.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-05-20 12:50:20 +00:00
Max Kovalenko 49df8d8da0
Optimize zero3 fetch params using all_reduce (#5420)
* Use all_reduce instead of all_gather to fetch module parameters. This
improves performance by reducing the overhead of concatenation and
slicing, which are no longer required.
* Instead, all tensor views are created prior to the collective
(all_reduce), so upon its completion only the parameter status is
updated.
* The behavior is enabled via a new boolean flag under the
"zero_optimization" section: { "stage3_use_all_reduce_for_fetch_params": true }
(see the config sketch below).
* By default, the optimization is not enabled.
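The flag from the bullet above, shown in a minimal config sketch (the other keys are illustrative):
```python
# Minimal config sketch enabling the all_reduce-based parameter fetch;
# the flag defaults to False.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "stage3_use_all_reduce_for_fetch_params": True,
    },
}
```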

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-05-20 12:48:56 +00:00
Ramya Ramineni 76c9c69fb1
Rocm warp size fix (#5402)
This PR enables building the below extensions for AMD GPUs with warp
size 32.
- transformer_inference
- quantizer
- random_ltd


This PR works stand-alone for torch versions <= 2.0. For later
versions, https://github.com/microsoft/DeepSpeed/pull/5401 must be
merged in addition to this PR.

Unit test results (rocm/pytorch:rocm6.1_ubuntu20.04_py3.9_pytorch_2.1.2)
on NAVI3x:

**transformer_inference:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n 4 unit/ops/transformer/inference

Before this PR:
===== 674 failed, 622 skipped, 8 warnings, 1728 errors in 69.37s
(0:01:09) =====

After this PR:
========== 476 failed, 1062 passed, 1486 skipped, 8 warnings in 9.31s
==========

**quantizer:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n 4 unit/ops/quantizer

Before this PR:
     ==== 244 failed, 8 warnings in 30.53s ====

After this PR:
    ====== 186 failed, 58 passed, 8 warnings in 8.89s ======

I could not find random_ltd related unit tests to run.

Fixes: 
https://github.com/microsoft/DeepSpeed/issues/4753
https://github.com/microsoft/DeepSpeed/issues/5474
https://github.com/ROCm/DeepSpeed/issues/68

cc: @jithunnair-amd

---------

Co-authored-by: rraminen@amd.com <rraminen>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-17 20:35:58 +00:00
Ramya Ramineni d3dd8e7454
rocblas -> hipblas changes for ROCm (#5401)
Fixes https://github.com/microsoft/DeepSpeed/issues/4989

In addition to this PR, the changes below are required to successfully
build the extensions listed below. Please note that not all unit tests
for these extensions will pass with this PR; more details on the unit
test results are below. These unit tests are skipped in CI anyway, so
they will not break the CI.
- transformer_inference
- quantizer
- random_ltd

- https://github.com/pytorch/pytorch/pull/121030
- https://github.com/microsoft/DeepSpeed/pull/5402


Unit test results (rocm/pytorch:rocm6.1_ubuntu20.04_py3.9_pytorch_2.1.2)
on MI200:

**transformer_inference:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n 4 unit/ops/transformer/inference

Before this PR: 
==== 674 failed, 622 skipped, 8 warnings, 1728 errors in 123.66s
(0:02:03) =====

After this PR:
========== 555 failed, 983 passed, 1486 skipped, 8 warnings in 14.35s
==========

**quantizer:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n 4 unit/ops/quantizer

Before this PR: 
==== 244 failed, 8 warnings in 48.02s ====

After this PR:
===== 187 failed, 57 passed, 8 warnings in 14.74s ====

I could not find random_ltd related unit tests to run.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
2024-05-17 01:57:00 +00:00
Zixu Wang 8e4f6e48db
Skip the UT cases that use unimplemented op builders. (#5372)
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
2024-05-16 17:46:52 +00:00
vikram singh shekhawat 7f55b20f3e
Enhance testing: Skip fused_optimizer tests if not supported. (#5159)

Added condition check to skip fused_optimizer tests if FusedAdam and
FusedLamb are not supported by the accelerator. This enhancement ensures
that the tests are appropriately skipped when the hardware configuration
does not support these optimizers, preventing potential issues.

Details:
- Introduced a condition check to determine support for FusedAdam and
FusedLamb.
- If not supported, fused_optimizer tests are skipped to improve test
reliability.
- Improved compatibility and stability across different hardware
configurations.
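A hedged sketch of such a guard; the builder-based support check and test name below are assumptions for illustration, not the exact code added:
```python
import pytest
from deepspeed.ops.op_builder import FusedAdamBuilder

# Assumed check: treat FusedAdam as unsupported when its op builder reports
# incompatibility with the current accelerator/toolchain.
fused_adam_supported = FusedAdamBuilder().is_compatible()

@pytest.mark.skipif(not fused_adam_supported,
                    reason="FusedAdam is not supported on this accelerator")
def test_fused_optimizer_step():
    ...
```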

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-16 00:34:25 +00:00
Nadav Elyahu 23173faa4b
Improve _configure_optimizer() final optimizer log (#5528)
The log was reporting the optimizer name that was configured, not the
optimizer actually in effect after this function's processing.
The two are not always the same.

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-15 18:06:36 +00:00
Aliaksandr Kuzmik 488a823f64
New integration - CometMonitor (#5466)
This PR introduces a new monitoring option, `CometMonitor`, which is an
official integration with
[CometML](https://www.comet.com/site/).

The new monitor is covered with unit tests.

Notes:
* We've updated `docs/code-docs/source/monitor.rst`, but it doesn't look
like it is used anymore.
* We've updated the "Monitoring Module" section name in `config-json.md`
to be generic so the next integration won't require updating it.

---------

Co-authored-by: Boris Feld <lothiraldan@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-15 16:04:44 +00:00
YiSheng5 ebf82e8f3a
[manifest] update manifest to add hpp file in deepspeed. (#5533)
Hi @loadams, could you please help review this PR?
After adding hpp files in csrc, we found that the hpp headers would
sometimes still be excluded from the op source packaging, so we add the
hpp files in deepspeed to make sure the hpp headers are included in the
deepspeed package, ensuring that JIT load can compile the xpu/fused_adam
ops in 0.14.2.
2024-05-14 16:28:18 +00:00
Logan Adams 62ca317829
Switch from double quotes to match single quotes (#5530) 2024-05-13 20:20:21 -07:00
Logan Adams 82ce4ae815
Switch pynvml to nvidia-ml-py (#5529)
Fixes: #5517 

Link to PyPI for nvidia-ml-py
[here](https://pypi.org/project/nvidia-ml-py/), showing that usage remains
the same as with the previous pynvml package.
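For reference, the import path and calls stay the same after switching the dependency, since nvidia-ml-py installs the same `pynvml` module; a minimal check:
```python
import pynvml

pynvml.nvmlInit()
count = pynvml.nvmlDeviceGetCount()
pynvml.nvmlShutdown()
print(f"visible NVIDIA devices: {count}")
```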
2024-05-13 23:45:50 +00:00
Yejing-Lai 3a7f3aa849
enable phi2 autotp (#5436)
This PR aims to enable phi2 model autotp.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-05-13 20:10:53 +00:00
YiSheng5 4696afd27b
[manifest] update manifest to add hpp file in csrc. (#5522)
Update the manifest to cover hpp files in csrc.
2024-05-13 18:26:11 +00:00
BacharL df4ef0ab69
Fused adam for HPU (#5500)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-05-10 20:53:55 +00:00
Yejing-Lai 3dd7ccff81
enable phi3_mini autotp (#5501)
This PR aims to enable phi3 mini autotp.

Phi3 mini uses a chunked MLP. We adjust the linear layer weight order to
support this model.

Please kindly review~ Thanks!

---------

Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
2024-05-08 22:04:02 +00:00
BacharL 0b224edcf7
Fix compile wrapper (#5455)
The compile wrapper will inherit from the user module class and copy its
`__dict__`.

This should resolve most issues in #5383 except potential extra user
forward hooks.

@tohtana @loadams

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
2024-05-08 09:53:25 +00:00
harygo2 0fc19b6a32
Fix crash when creating Torch tensor on NPU with device=get_accelerator().current_device() (#5464)
Creating a Torch tensor with the parameter
`device=get_accelerator().current_device()` can result in a crash when
using an NPU.

This issue arises because the `current_device` API across all
accelerators is expected to return a device id as an integer, according
to the [interface
docs.](fa8458b1a8/docs/_tutorials/accelerator-abstraction-interface.md?plain=1#L52C1-L56C103)

However, specifying `device` as an integer when creating tensors by
default directs Torch to use the CUDA backend, which leads to a crash on
NPUs (and potentially other accelerators as well).

To resolve this, we should use `get_accelerator().current_device_name()`
instead, which returns the correct device identifier strings such as
`"npu:0"`, `"cuda:0"`, or `"xpu:0"`. This API provides the appropriate
context needed for creating tensors on specific hardware accelerators.
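A short illustration of the substitution described above:
```python
import torch
from deepspeed.accelerator import get_accelerator

# Crashes on NPU: current_device() returns an integer, which Torch maps to the CUDA backend.
# bad = torch.zeros(4, device=get_accelerator().current_device())

# Works across accelerators: current_device_name() returns "npu:0", "cuda:0", "xpu:0", ...
ok = torch.zeros(4, device=get_accelerator().current_device_name())
```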

I also notice that `device=get_accelerator().current_device()` is used
across several files under deepspeed/inference, and may also lead to
crashes on other accelerators.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-07 00:05:54 +00:00
Nadav Elyahu 90793aab54
re-introduce: stage3: efficient compute of scaled_global_grad_norm (#5493)
Reverts the previous revert of this feature:

bc48371c5e

In addition, includes a bug fix for offload mode.
2024-05-03 20:22:29 +00:00
Logan Adams f32ad3e1c5
Un-pin torch version in nv-torch-latest back to latest and skip test_compile_zero tests on v100 (#5459)
Torch 2.3.0 broke some test_compile_zero tests, so we pinned the torch
version. @tohtana pushed fixes in #5463; this should un-pin and move us
back to the latest.

The failing test indicating the generated code cannot run bf16 on V100 is
[here](https://github.com/microsoft/DeepSpeed/actions/runs/8838672379/job/24270349996?pr=5459#step:8:5157).
2024-04-29 23:39:12 +00:00
Antônio Vieira 059bb2085c
fix: swapping order of parameters in create_dir_symlink method. (#5465)
The order of parameters in the create_dir_symlink method looks wrong.
Because of this, we get the error "PermissionError: [WinError 5] Denied
access: '.\\deepspeed\\ops\\csrc'" when installing deepspeed >= 0.4.0 in a
Windows environment.

Please check this out @eltonzheng and @jeffra.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-04-29 17:37:54 +00:00
Logan Adams 4c15ad9f8d
Update with ops not supported on Windows (#5468) 2024-04-25 21:44:39 +00:00
Lev Kurilenko e37296b23c
Update ds-chat CI workflow paths to include zero stage 1-3 files (#5462)
This PR updates the ds-chat CI workflow to run when ZeRO stage 1-3 files
are updated.
2024-04-25 20:36:46 +00:00
Lev Kurilenko bc48371c5e
Revert "stage3: efficient compute of scaled_global_grad_norm (#5256)" (#5461)
This reverts commit 54c0687264 due to
#5256 causing bugs when the ZeRO3 + ZeRO Offload features are enabled.

This bug was discovered due to failures in the DS Chat CI workflow.
Failing tests across CI failures:
| Failing Test Name |
| --- |
| test_ds_chat[zero3--offload-] |
| test_ds_chat[zero3--offload-lora] |
| test_ds_chat[zero3-he-offload-] |
| test_ds_chat[zero3-he-offload-lora] |

Error message:
```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cpu!
```

It seems that `torch.stack()` or `torch.norm()` is having issues when
the offload feature is enabled and tensors are split between CPU/GPU;
however, this is just an initial guess and would require more
investigation.

@nelyahu Since you are the original author of the PR, if you have some
bandwidth, any help here is greatly appreciated!

After reverting this commit, all tests pass in the DS Chat CI workflow:

https://github.com/microsoft/DeepSpeed/actions/runs/8824064414/job/24225802763

@tjruwase for context.
2024-04-25 18:37:15 +00:00
Masahiro Tanaka fcc731f09d
Fix torch.compile error for PyTorch v2.3 (#5463)
PyTorch v2.3 throws an error when it tries to compile `iter_params` used
for ZeRO3.
This PR excludes the function from the compilation targets.
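One standard way to exclude a function from compilation in PyTorch is `torch.compiler.disable`; whether this PR uses exactly that mechanism or DeepSpeed's own compiler shim is not shown here, so treat this as an illustrative sketch:
```python
import torch

@torch.compiler.disable  # keep this helper out of torch.compile's graph capture
def iter_params(module):
    # Illustrative body; the real iter_params lives in DeepSpeed's ZeRO-3 code.
    return list(module.parameters())
```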

After this PR is merged, we can [unpin the torch version for unit
tests](https://github.com/microsoft/DeepSpeed/pull/5459).
2024-04-25 18:01:35 +00:00
vikram singh shekhawat fa8458b1a8
Add getter and setter methods for compile_backend across accelerators. (#5299)
Add getter and setter methods for `compile_backend` across accelerators,
which provide a mechanism to retrieve and set the compile backend. These
APIs handle user-defined backend selection and raise a `ValueError` with
informative error messages for unsupported backends.
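A hedged usage sketch; the method names follow the PR description, but the exact signatures and the example backend string are assumptions:
```python
from deepspeed.accelerator import get_accelerator

accel = get_accelerator()
print(accel.get_compile_backend())   # e.g. "inductor"

# Unsupported values are expected to raise ValueError with an informative message.
accel.set_compile_backend("inductor")
```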

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-04-24 15:25:18 +00:00
Michael Wyatt fbdf0eaf15
Update version.txt after 0.14.2 release (#5458)
**Auto-generated PR to update version.txt after a DeepSpeed release**
Released version - 0.14.2
Author           - @loadams

Co-authored-by: loadams <loadams@users.noreply.github.com>
2024-04-23 16:27:27 -07:00
Logan Adams 5f631abc2f
Update PyTest torch version to match PyTorch latest official (2.3.0) (#5454) 2024-04-23 16:24:12 -07:00
Jhonso7393 ad2027952f
Update README.md (#5453)
Fixing a minor typo in the README file

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-04-23 13:45:47 -07:00
Jeff Rasley 5e6c9b9311
OptimizedLinear implementation (#5355)
Optimized version of `nn.Linear` that adds features such as:
      * LoRA w. base weight sharding
      * FP [6,8,12] quantization

Depends on #5336 being merged first
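A hedged usage sketch; the module path (`deepspeed.linear`) and the config class/keyword names below are assumptions based on the feature description, not a confirmed API:
```python
# Sketch only: names are illustrative assumptions.
from deepspeed.linear import OptimizedLinear, LoRAConfig, QuantizationConfig

layer = OptimizedLinear(
    input_dim=4096,
    output_dim=4096,
    lora_config=LoRAConfig(lora_r=16, lora_alpha=16),   # LoRA with base weight sharding
    quantization_config=QuantizationConfig(q_bits=8),   # FP quantization (6/8/12 bits)
)
```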

Co-authored-by: @rajhans
Co-authored-by: @aurickq

---------

Co-authored-by: Rajhans Samdani <rajhans.samdani@snowflake.com>
Co-authored-by: Jeff Rasley <jeff.rasley@snowflake.com>
2024-04-23 12:24:37 -07:00
inkcherry c66bc4269e
set the default to use set_to_none for clearing gradients in BF16 optimizer. (#5434)
As discussed in #5175, set the default to use set_to_none for clearing
gradients in the BF16 optimizer.
Additionally, for the zero-clearing case, use foreach_zero.
Verified correctness with mega-ds llama 7B training.

FYI @loadams
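For context, the two clearing modes in plain PyTorch terms (a sketch, not the optimizer's internal code):
```python
import torch

def clear_gradients(params, set_to_none: bool = True) -> None:
    if set_to_none:
        # New default: drop gradient tensors entirely (frees memory, skips a write).
        for p in params:
            p.grad = None
    else:
        # Zero-clearing case: zero every gradient buffer in one fused call.
        grads = [p.grad for p in params if p.grad is not None]
        if grads:
            torch._foreach_zero_(grads)

params = [torch.randn(4, requires_grad=True) for _ in range(3)]
for p in params:
    p.grad = torch.randn_like(p)
clear_gradients(params, set_to_none=True)
```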

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-04-22 23:27:09 +00:00
Masahiro Tanaka c292b03a40
Improve parallel process of universal checkpoint conversion (#5343)
The conversion script from a regular checkpoint to the universal one
runs the following in parallel:

1. extract ZeRO-sharded optimizer states
2. merge the shards

However, it passes `map()` only a few tasks at a time (the number
specified as workers), so it needs to wait for the slowest task to
finish in every set.
This PR submits all the tasks to the pool and waits until the futures are
ready, keeping all workers running.
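A minimal sketch of the submit-all pattern described, using `concurrent.futures`; the function names are illustrative, not the script's exact code:
```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def convert_shard(task):
    ...  # extract or merge one shard (illustrative placeholder)

def run_all(tasks, workers: int):
    # Submit every task up front; workers pick up new work as soon as they
    # finish, instead of waiting for the slowest task in each map() batch.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(convert_shard, t) for t in tasks]
        return [f.result() for f in as_completed(futures)]
```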

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-04-22 19:50:15 +00:00