mirror of https://github.com/microsoft/DeepSpeed.git
5fb71c0a18
In sequence_parallel (Ulysses), the number of attention heads must be divisible by the sequence parallel size. This prevents some models/workloads from using a particular sequence parallel size. This PR implements uneven all-to-all splitting of heads.

- Supports both batch-first (b, s, ...) and sequence-first (s, b, ...) layouts.
- Adds unit tests with numerical checks.

Also tested locally with **7 heads with sp=4** and **20 heads with sp=8**; both passed.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
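A minimal single-process sketch of the uneven head split follows. It assumes the leftover heads go to the lowest-numbered ranks, so per-rank counts differ by at most one (7 heads on sp=4 gives [2, 2, 2, 1]). The helper names `head_split_sizes` and `simulate_uneven_all_to_all` are hypothetical and are not DeepSpeed's API. A plain Python list exchange stands in for the real collective. Batch and head-dim axes are dropped, so the example models only the sequence-first layout; with a batch-first layout the same exchange happens along the second axis.

```python
from typing import List


def head_split_sizes(num_heads: int, sp_size: int) -> List[int]:
    """Split num_heads across sp_size ranks as evenly as possible.

    Assumption (not necessarily DeepSpeed's policy): the remainder goes to
    the lowest-numbered ranks, so sizes differ by at most one.
    """
    base, rem = divmod(num_heads, sp_size)
    return [base + (1 if r < rem else 0) for r in range(sp_size)]


def simulate_uneven_all_to_all(shards: List[list], num_heads: int, sp_size: int) -> List[list]:
    """Hypothetical single-process stand-in for the Ulysses all-to-all.

    shards[src] is rank src's local sequence chunk: a list of tokens, each a
    list of num_heads per-head values. Afterwards, rank r holds the full
    sequence but only its own (possibly uneven) slice of heads.
    """
    sizes = head_split_sizes(num_heads, sp_size)
    starts = [sum(sizes[:r]) for r in range(sp_size)]
    outputs = []
    for r in range(sp_size):
        lo, hi = starts[r], starts[r] + sizes[r]
        # Gather rank r's head slice from every source rank, in sequence order.
        outputs.append([tok[lo:hi] for src in range(sp_size) for tok in shards[src]])
    return outputs


if __name__ == "__main__":
    num_heads, sp_size, seq_len = 7, 4, 8
    # Token t, head h holds the tuple (t, h), so placement is easy to check.
    full = [[(t, h) for h in range(num_heads)] for t in range(seq_len)]
    chunk = seq_len // sp_size
    shards = [full[r * chunk:(r + 1) * chunk] for r in range(sp_size)]
    outputs = simulate_uneven_all_to_all(shards, num_heads, sp_size)
    print(head_split_sizes(num_heads, sp_size))  # [2, 2, 2, 1]
    # Re-joining every rank's head slice, token by token, recovers the input.
    rebuilt = [sum((out[t] for out in outputs), []) for t in range(seq_len)]
    assert rebuilt == full
    print("numerical check passed")
```

When the head count divides evenly, every rank gets the same number of heads and this reduces to the usual Ulysses exchange. In a real distributed run, per-rank sizes like these would feed the split-size arguments of an uneven collective such as `torch.distributed.all_to_all_single` (`input_split_sizes`, `output_split_sizes`); how this PR wires that up is not shown here.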
accelerator
benchmarks
hybrid_engine
lightning
model
onebit
perf
small_model_debugging
torch_compile
unit
.coveragerc
conftest.py
pytest.ini |