DeepSpeed/deepspeed/moe
ranzhejiang 7260890452
reduce cpu host overhead when using moe (#5578)
The `.to('cpu')` operation is not necessary for exp_counts, and it causes a device-to-host synchronization that hurts performance.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-08-21 21:52:48 +00:00
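A minimal sketch (not DeepSpeed's actual gating code; the helper name and shapes are assumptions) of the pattern behind this change: calling `.to('cpu')` on a CUDA tensor forces the host to wait for the device, so keeping `exp_counts` on the GPU avoids that stall.

```python
import torch

def count_tokens_per_expert(expert_indices: torch.Tensor, num_experts: int) -> torch.Tensor:
    """Hypothetical helper: per-expert token counts, computed and kept on the GPU."""
    # bincount is queued asynchronously on the device; no host sync is triggered here.
    return torch.bincount(expert_indices, minlength=num_experts)

if torch.cuda.is_available():
    indices = torch.randint(0, 8, (4096,), device="cuda")

    # Before the fix (conceptually): the extra copy blocks the CPU until the GPU
    # finishes all queued work, adding host overhead on every forward pass.
    exp_counts_cpu = count_tokens_per_expert(indices, 8).to("cpu")

    # After the fix (conceptually): keep exp_counts on the device and only move it
    # to the host if a consumer actually needs a CPU-side value.
    exp_counts = count_tokens_per_expert(indices, 8)  # stays on CUDA, no sync
```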
__init__.py Update DeepSpeed copyright license to Apache 2.0 (#3111) 2023-03-30 17:14:38 -07:00
experts.py MoE type hints (#5043) 2024-02-01 14:03:32 -08:00
layer.py Support MoE for pipeline models (#5338) 2024-04-08 15:35:53 +00:00
mappings.py reduce all-to-all communication volume when both expert and non-expert are tensor-parallel (#5626) 2024-07-22 23:41:14 +00:00
sharded_moe.py reduce cpu host overhead when using moe (#5578) 2024-08-21 21:52:48 +00:00
utils.py Auto convert moe param groups (#5354) 2024-04-05 17:18:20 +00:00