mirror of https://github.com/microsoft/DeepSpeed.git
8e891aa568
* fixing the softmax masking when using triangular masking * fix a bug in the layernorm backward kernels * revert back some changes & remove debug code * change the constants to a macro Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> |
||
---|---|---|
adagrad | ||
adam | ||
aio | ||
common | ||
includes | ||
lamb | ||
quantization | ||
sparse_attention | ||
transformer | ||
utils |