mirror of https://github.com/microsoft/DeepSpeed.git
68c8481bcf
Authors: @awan-10 @conglongli @samyam @jeffra

What's new: NCCL-based implementation which provides better performance and usability compared to the MPI-based implementation. Add support to momentum masks for those parameters with constant zero gradients during training. Bug fixes (e.g., #813).

* NCCL-based 1-bit Adam + Code Refactor for Comm. Backends (#594)
* NCCL based 1-bit Implementation + Refactor to add communication backends (#593)
* add nccl 1-bit optim.
* temporary commit to save stuff.
* Use dist collectives instead of mpi routines.
* remove old code for comm.
* Fix bugs. still does not work.
* modify to test the nccl side code path
* Initial gather impl. Works intra-node.
* Updates to comm. phase 2. nccl comm. passed the tests.
* refactor code to introduce nccl/mpi as backends for onebit adam.
* Refactor updates to test/engine.
* Fix compile/runtime errors.
* simplify support for nccl/mpi backends.
* Add missign file
* Add compression backend in constructor. Revert later.
* modify test with some perf counting.
* Implement a true non-blocking gather for nccl side.
* Revert "Add compression backend in constructor. Revert later." This reverts commit

| ||
---|---|---|
test_mpi_backend.py | ||
test_mpi_perf.py | ||
test_nccl_backend.py | ||
test_nccl_perf.py |