DeepSpeed/deepspeed/moe
ranzhejiang 7260890452
reduce cpu host overhead when using moe (#5578)
The `.to('cpu')` operation is not necessary for exp_counts, and it causes a device-to-host synchronization that hurts performance.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-08-21 21:52:48 +00:00
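A minimal sketch (not DeepSpeed's actual gating code; the helper name and shapes are assumptions) of the pattern behind this change: calling `.to('cpu')` on a CUDA tensor forces the host to wait for the device, so keeping `exp_counts` on the GPU avoids that stall.

```python
import torch

def count_tokens_per_expert(expert_indices: torch.Tensor, num_experts: int) -> torch.Tensor:
    """Hypothetical helper: per-expert token counts, computed and kept on the GPU."""
    # bincount is queued asynchronously on the device; no host sync is triggered here.
    return torch.bincount(expert_indices, minlength=num_experts)

if torch.cuda.is_available():
    indices = torch.randint(0, 8, (4096,), device="cuda")

    # Before the fix (conceptually): the extra copy blocks the CPU until the GPU
    # finishes all queued work, adding host overhead on every forward pass.
    exp_counts_cpu = count_tokens_per_expert(indices, 8).to("cpu")

    # After the fix (conceptually): keep exp_counts on the device and only move it
    # to the host if a consumer actually needs a CPU-side value.
    exp_counts = count_tokens_per_expert(indices, 8)  # stays on CUDA, no sync
```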
__init__.py Update DeepSpeed copyright license to Apache 2.0 (#3111) 2023-03-30 17:14:38 -07:00
experts.py MoE type hints (#5043) 2024-02-01 14:03:32 -08:00
layer.py Support MoE for pipeline models (#5338) 2024-04-08 15:35:53 +00:00
mappings.py reduce all-to-all communication volume when both expert and non-expert are tensor-parallel (#5626) 2024-07-22 23:41:14 +00:00
sharded_moe.py reduce cpu host overhead when using moe (#5578) 2024-08-21 21:52:48 +00:00
utils.py Auto convert moe param groups (#5354) 2024-04-05 17:18:20 +00:00