DeepSpeed

History

inkcherry 5fb71c0a18 sequence parallel for uneven heads (#6392)		2024-10-25 18:26:47 +00:00

In sequence_parallel (Ulysses), the number of attention heads must be divisible by the sequence parallel size, which prevents some models/workloads from setting a specific sequence parallel size. This PR implements uneven all-to-all heads splitting.

- Supports both batch-first (b, s, ...) and seq_len-first (s, b, ...) layouts.
- Adds unit tests with numerical checks. Also tested locally with 7 heads at sp=4 and 20 heads at sp=8; both passed.

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
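The uneven head splitting described in the commit message can be illustrated with a small sketch. This is not the code from the PR: the function name `split_heads_unevenly` and the policy of giving the extra heads to the lowest-numbered ranks are assumptions made for illustration only. It shows how a head count that is not a multiple of the sequence parallel size could still be partitioned across ranks.

```python
from typing import List


def split_heads_unevenly(num_heads: int, sp_size: int) -> List[int]:
    """Return how many attention heads each sequence-parallel rank would hold.

    Hypothetical helper: the real DeepSpeed implementation may assign
    heads to ranks differently.
    """
    if sp_size <= 0:
        raise ValueError("sp_size must be positive")
    if num_heads < sp_size:
        raise ValueError("need at least one head per rank")
    base, remainder = divmod(num_heads, sp_size)
    # Assumed policy: the first `remainder` ranks each take one extra head.
    return [base + 1 if rank < remainder else base for rank in range(sp_size)]


# The two configurations mentioned in the commit message.
print(split_heads_unevenly(7, 4))   # [2, 2, 2, 1]
print(split_heads_unevenly(20, 8))  # [3, 3, 3, 3, 2, 2, 2, 2]
```

Per-rank counts like these are the kind of split sizes an uneven all-to-all needs, since the ranks no longer exchange equally sized chunks.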
..
accelerator	Update DeepSpeed copyright license to Apache 2.0 (#3111)	2023-03-30 17:14:38 -07:00
benchmarks	Fix the openfold training. (#4657)	2023-11-09 02:04:12 +00:00
hybrid_engine	DeepSpeed Chat (#3186)	2023-04-11 11:53:38 -07:00
lightning	Update DeepSpeed copyright license to Apache 2.0 (#3111)	2023-03-30 17:14:38 -07:00
model	Avoid security issues of subprocess shell (#6498)	2024-09-11 20:07:06 +00:00
onebit	Add Compressedbackend for Onebit optimizers (#5473)	2024-06-05 20:28:46 +00:00
perf	CPUAdam fp16 and bf16 support (#5409)	2024-05-20 12:50:20 +00:00
small_model_debugging	Guanhua/partial offload rebase v2 (#590) (#4636)	2023-11-06 14:15:16 -08:00
torch_compile	[compile] Show breakdown of graph break (#6601)	2024-10-14 17:31:34 +00:00
unit	sequence parallel for uneven heads (#6392)	2024-10-25 18:26:47 +00:00
.coveragerc	Reduce Unit Test Times (Part 3) (#3850)	2023-07-12 00:35:49 +00:00
conftest.py	Reduce Unit Test Times (Part 3) (#3850)	2023-07-12 00:35:49 +00:00
pytest.ini	Inference V2 Human Eval (#4804)	2024-02-22 22:55:40 +00:00