Mirror of https://github.com/microsoft/DeepSpeed.git
7af3a4beb5
This PR adds Z3 coalesced fetch to ZeRO optimization. Some existing logic could be reused, but it was difficult to expose it as an optimization choice (I only discovered that logic while trying to implement this feature). The benefit of this approach is reduced host overhead (many fewer hooks) during the recursive fetching of parameters, especially in fine-grained models such as those with a large number of MoE experts. This is particularly helpful on host-sensitive devices (such as HPU), where it achieved a 40% performance improvement in our customer workloads. FYI @delock @deepcharm

---------

Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
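For context, here is a minimal sketch of how a user might opt into this from a ZeRO stage 3 config. The config key `stage3_module_granularity_threshold` and the threshold value are assumptions for illustration; consult the DeepSpeed ZeRO config documentation for the exact option name this PR introduces.

```python
# Minimal sketch (assumptions labeled): enabling ZeRO-3 with coalesced parameter fetch.
# The key "stage3_module_granularity_threshold" is assumed for illustration.
import torch
import deepspeed

# A toy model standing in for a real fine-grained network (e.g. many MoE experts).
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)

ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {
        "stage": 3,
        # Assumed option: modules below this granularity threshold are fetched
        # as one coalesced group, replacing many per-submodule hooks with one
        # and cutting host-side overhead on the recursive fetch path.
        "stage3_module_granularity_threshold": 10,
    },
}

# Launch with the DeepSpeed runner so distributed init succeeds, e.g.:
#   deepspeed --num_gpus=1 train.py
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```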
compression.md
config-json.md
deepspeed4science.md
inference.md
posts-landing.md
posts_list_landing.md
training.md
tutorials-landing.md