DeepSpeed/deepspeed
wyooyw b647fb2470
Fix expert grad scaling problem with ZeRO optimizer (#6546)
Fix [#6545]

Changes:
- expert gradient averaging: divide by dp_world_size instead of edp_world_size
- unit test: verify that models with different dp/ep configurations produce the same expert gradients
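
To illustrate why the divisor matters, here is a minimal toy sketch in plain Python. It is not DeepSpeed code. It assumes a simplified model in which dp_world_size = ep_size * edp_world_size, each expert replica accumulates gradient contributions from the ep_size ranks in its expert-parallel group, and the replicas are then sum-reduced across the expert-data-parallel group. The function `expert_grad` and its parameters are hypothetical names used only for this illustration.

```python
def expert_grad(per_rank_grads, ep_size, divisor_kind):
    """Toy model of expert gradient averaging (hypothetical helper, not DeepSpeed API).

    per_rank_grads: one scalar gradient contribution per data-parallel rank.
    ep_size: expert-parallel group size (assumed to divide the number of ranks).
    divisor_kind: "dp" divides by dp_world_size, "edp" divides by edp_world_size.
    """
    dp_world_size = len(per_rank_grads)
    assert dp_world_size % ep_size == 0
    edp_world_size = dp_world_size // ep_size

    # Assumption: each expert replica sees tokens routed from all ep_size ranks
    # in its expert-parallel group, so its local gradient is their sum.
    replica_grads = [
        sum(per_rank_grads[i * ep_size:(i + 1) * ep_size])
        for i in range(edp_world_size)
    ]

    # Sum-reduce across the expert-data-parallel replicas.
    reduced = sum(replica_grads)

    divisor = dp_world_size if divisor_kind == "dp" else edp_world_size
    return reduced / divisor


grads = [1.0, 2.0, 3.0, 4.0]               # four data-parallel ranks
reference = expert_grad(grads, 1, "dp")    # no expert parallelism: plain mean
fixed = expert_grad(grads, 2, "dp")        # ep_size=2, divide by dp_world_size
old = expert_grad(grads, 2, "edp")         # ep_size=2, divide by edp_world_size
print(reference, fixed, old)
```

Under these assumptions, dividing by dp_world_size gives the same value for every ep_size, which is the property the unit test in this commit checks, while dividing by edp_world_size yields a result that is ep_size times larger.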

---------

Co-authored-by: wangyiou <wangyiou@xiaohongshu.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-10-23 00:08:39 +00:00
autotuning Update BUFSIZE to come from autotuner's constants.py, not numpy (#5686) 2024-06-19 14:19:50 -07:00
checkpoint fix the missing argument in test and typo (#5730) 2024-07-08 21:44:33 +00:00
comm Avoid security issues of subprocess shell (#6498) 2024-09-11 20:07:06 +00:00
compression fix: Remove duplicate word the (#4051) 2023-07-27 09:33:13 -07:00
elasticity Avoid security issues of subprocess shell (#6498) 2024-09-11 20:07:06 +00:00
inference Rearrange inference OPS and stop using builder.load (#5490) 2024-10-09 01:22:28 +00:00
launcher Accept btl_tcp_if_include option through launcher_args (#6613) 2024-10-14 19:26:24 +00:00
linear OptimizedLinear updates (#5791) 2024-08-13 23:36:22 +00:00
model_implementations Rearrange inference OPS and stop using builder.load (#5490) 2024-10-09 01:22:28 +00:00
module_inject Enabled Qwen2-MoE Tensor Parallelism (TP) inference (#6551) 2024-10-09 15:23:16 +00:00
moe reduce cpu host overhead when using moe (#5578) 2024-08-21 21:52:48 +00:00
monitor Pydantic v2 migration (#5167) 2024-08-22 15:38:13 -07:00
nebula fix typo with deepspeed/ (#3547) 2023-06-02 00:47:14 +00:00
nvme DeepNVMe perf tuning (#6560) 2024-09-26 13:07:19 +00:00
ops Rearrange inference OPS and stop using builder.load (#5490) 2024-10-09 01:22:28 +00:00
pipe Update DeepSpeed copyright license to Apache 2.0 (#3111) 2023-03-30 17:14:38 -07:00
profiling Add conditional on torch version for scaled_dot_product_attention (#6517) 2024-09-11 23:21:43 +00:00
runtime Fix expert grad scaling problem with ZeRO optimizer (#6546) 2024-10-23 00:08:39 +00:00
sequence Long sequence parallelism (Ulysses) integration with HuggingFace (#5774) 2024-08-21 01:46:50 +00:00
utils add option to disable logger while compiling to avoid graph breaks (#6496) 2024-10-15 18:30:42 +00:00
__init__.py Long sequence parallelism (Ulysses) integration with HuggingFace (#5774) 2024-08-21 01:46:50 +00:00
accelerator Abstract accelerator (step 2) (#2560) 2023-01-06 23:40:58 -05:00
constants.py Allow env var for timeout (#4405) 2023-10-10 08:56:10 -07:00
env_report.py Add Windows scripts (deepspeed, ds_report). (#5699) 2024-07-09 01:05:09 +00:00
git_version_info.py Make op builder detection adapt to accelerator change (#5206) 2024-03-12 20:48:29 +00:00