DeepSpeed/docs/_pages
inkcherry 7af3a4beb5
add zero3 `module_granularity_threshold` to zero optimization. (#6649)
This PR adds ZeRO-3 (Z3) coalesced fetch to the zero optimization config. Some of the existing logic can be reused, but it is difficult to expose it as an optimization choice (I only discovered this logic while trying to implement the feature).

The benefit of this approach is reduced host overhead (far fewer hooks) during the recursive fetching of parameters, especially in fine-grained models such as those with a large number of MoE experts. This is particularly helpful on host-sensitive devices (such as HPU), where it achieved a 40% performance improvement in our customer workloads.
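As a rough illustration, the new knob would sit under `zero_optimization` in the DeepSpeed config. This is a hedged sketch: only the option name `module_granularity_threshold` and its association with ZeRO stage 3 come from the PR title; the chosen value, the surrounding config keys, and the commented semantics are assumptions, not the authoritative defaults.

```python
# Hypothetical sketch of a DeepSpeed config enabling the new ZeRO-3 option.
# The value 10 and the described semantics are assumptions for illustration;
# consult the DeepSpeed config-json docs for the actual defaults.
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {
        "stage": 3,
        # Assumed semantics: when a module is "coarse" enough relative to
        # this threshold, ZeRO-3 fetches its parameters as one coalesced
        # group instead of installing per-submodule hooks, cutting host
        # overhead on fine-grained models (e.g. many MoE experts).
        "module_granularity_threshold": 10,
    },
}

print(ds_config["zero_optimization"]["module_granularity_threshold"])
```

A config like this would typically be passed to `deepspeed.initialize(..., config=ds_config)`; the threshold only has an effect at ZeRO stage 3, where the recursive parameter fetching described above happens.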
FYI @delock @deepcharm

---------

Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-11-12 14:25:33 +00:00
| File | Last commit | Date |
|------|-------------|------|
| compression.md | | |
| config-json.md | add zero3 `module_granularity_threshold` to zero optimization. (#6649) | 2024-11-12 14:25:33 +00:00 |
| deepspeed4science.md | add DeepSpeed4Science white paper (#4502) | 2023-10-11 15:36:06 -07:00 |
| inference.md | update inference pages to point to FastGen (#5029) | 2024-01-30 16:52:04 -08:00 |
| posts-landing.md | | |
| posts_list_landing.md | | |
| training.md | Typo Correction (#3621) | 2023-05-31 11:00:57 -07:00 |
| tutorials-landing.md | | |