Mirror of https://github.com/microsoft/DeepSpeed.git
7af3a4beb5
This PR adds Z3 coalesced fetch to ZeRO optimization. Some existing logic could be reused, but it was difficult to expose it as an optimization choice (I only discovered that logic while trying to implement this feature). The benefit of this approach is reduced host overhead (many fewer hooks) during the recursive fetching of parameters, especially in fine-grained models such as those with a large number of MoE experts. This is particularly helpful on host-sensitive devices (such as HPU), where it achieved a 40% performance improvement in our customer workloads. FYI @delock @deepcharm

---------

Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
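For context, here is a minimal sketch of how a user might opt into this from a ZeRO stage 3 config. The config key `stage3_module_granularity_threshold` and the threshold value are assumptions for illustration; consult the DeepSpeed ZeRO config documentation for the exact option name this PR introduces.

```python
# Minimal sketch (assumptions labeled): enabling ZeRO-3 with coalesced parameter fetch.
# The key "stage3_module_granularity_threshold" is assumed for illustration.
import torch
import deepspeed

# A toy model standing in for a real fine-grained network (e.g. many MoE experts).
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)

ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {
        "stage": 3,
        # Assumed option: modules below this granularity threshold are fetched
        # as one coalesced group, replacing many per-submodule hooks with one
        # and cutting host-side overhead on the recursive fetch path.
        "stage3_module_granularity_threshold": 10,
    },
}

# Launch with the DeepSpeed runner so distributed init succeeds, e.g.:
#   deepspeed --num_gpus=1 train.py
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```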
compression.md
config-json.md
deepspeed4science.md
inference.md
posts-landing.md
posts_list_landing.md
training.md
tutorials-landing.md