DeepSpeed/tests
inkcherry 5fb71c0a18
sequence parallel for uneven heads (#6392)
In sequence parallelism (Ulysses), the number of attention heads must be
divisible by the sequence parallel size, which prevents some
models/workloads from using a specific sequence parallel size. This PR
implements uneven all-to-all heads splitting.

- Supports both batch-first (b, s, ...) and seq_len-first (s, b, ...) layouts.
- Added unit tests with numerical checks. Also tested locally with **7
heads with sp=4** and **20 heads with sp=8**; both passed.
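
To illustrate what an uneven heads split can look like, here is a minimal sketch that assigns a head count to each sequence-parallel rank when the head count is not a multiple of the sequence parallel size. The function name `split_heads` and the "give the remainder to the lowest ranks" policy are assumptions made for illustration; the distribution used by this PR may differ.

```python
def split_heads(num_heads: int, sp_size: int) -> list[int]:
    """Return a per-rank head count for an uneven split (hypothetical helper).

    Assumption: the remainder heads go to the lowest-numbered ranks.
    This is one plausible policy, not necessarily the one DeepSpeed uses.
    """
    if sp_size <= 0:
        raise ValueError("sp_size must be positive")
    if num_heads < sp_size:
        raise ValueError("need at least one head per rank")
    base, extra = divmod(num_heads, sp_size)
    # The first `extra` ranks each take one additional head.
    return [base + 1 if rank < extra else base for rank in range(sp_size)]


if __name__ == "__main__":
    # The two configurations mentioned in the commit message.
    print(split_heads(7, 4))
    print(split_heads(20, 8))
```

Under this policy every rank holds at least one head and the per-rank counts always sum to the total number of heads, so an all-to-all built on these counts would exchange variable-sized chunks instead of equal ones.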

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
2024-10-25 18:26:47 +00:00
..
accelerator Update DeepSpeed copyright license to Apache 2.0 (#3111) 2023-03-30 17:14:38 -07:00
benchmarks Fix the openfold training. (#4657) 2023-11-09 02:04:12 +00:00
hybrid_engine DeepSpeed Chat (#3186) 2023-04-11 11:53:38 -07:00
lightning Update DeepSpeed copyright license to Apache 2.0 (#3111) 2023-03-30 17:14:38 -07:00
model Avoid security issues of subprocess shell (#6498) 2024-09-11 20:07:06 +00:00
onebit Add Compressedbackend for Onebit optimizers (#5473) 2024-06-05 20:28:46 +00:00
perf CPUAdam fp16 and bf16 support (#5409) 2024-05-20 12:50:20 +00:00
small_model_debugging Guanhua/partial offload rebase v2 (#590) (#4636) 2023-11-06 14:15:16 -08:00
torch_compile [compile] Show breakdown of graph break (#6601) 2024-10-14 17:31:34 +00:00
unit sequence parallel for uneven heads (#6392) 2024-10-25 18:26:47 +00:00
.coveragerc Reduce Unit Test Times (Part 3) (#3850) 2023-07-12 00:35:49 +00:00
conftest.py Reduce Unit Test Times (Part 3) (#3850) 2023-07-12 00:35:49 +00:00
pytest.ini Inference V2 Human Eval (#4804) 2024-02-22 22:55:40 +00:00