We used a slightly old version of clang-format before. This caused
issues when folks installed the latest version via apt or similar,
rather than through Python, to try to fix their formatting issues.
Installing older versions is also a pain, and the formatting style of
the newer version seems better.
Fixed the Windows build.
Fixes applied:
- Remove some more ops that don't build on Windows.
- Replace the symlinks, which didn't work correctly, with
`shutil.copytree()`.
- Small fixes to make the C++ code compile.
Tested with Python 3.9 and CUDA 12.1.
---------
Co-authored-by: Costin Eseanu <costineseanu@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Add BF16 support to CPU Adam. BF16, FP16, and float are all supported
at compilation time, and the correct template is called at runtime
according to the dtype of the input params.
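
A minimal sketch of what this kind of runtime selection could look like. It is not the actual CPU Adam code: `DType`, `Half`, `BFloat16`, `step_impl`, and `step` are hypothetical names, and the kernel body is a stand-in that only reports how many bytes it would touch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical dtype tag; the real code reads the dtype from the input params.
enum class DType { Float32, Float16, BFloat16 };

// Hypothetical 16-bit storage types standing in for FP16 and BF16.
struct Half { std::uint16_t bits; };
struct BFloat16 { std::uint16_t bits; };

// Stand-in for a templated kernel: every instantiation is compiled,
// here it just returns the number of bytes covered by n elements.
template <typename T>
std::size_t step_impl(std::size_t n)
{
    assert(n > 0);
    return n * sizeof(T);
}

// Runtime dispatch: pick the instantiation that matches the dtype.
std::size_t step(DType dtype, std::size_t n)
{
    switch (dtype) {
        case DType::Float32: return step_impl<float>(n);
        case DType::Float16: return step_impl<Half>(n);
        case DType::BFloat16: return step_impl<BFloat16>(n);
    }
    throw std::invalid_argument("unsupported dtype");
}
```

All three instantiations exist in the binary, so supporting a new dtype in a design like this means adding one template instantiation and one `case`.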
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
The width used in simd_load is different from the one used in
simd_store, which makes the implementation confusing.
The cause is missing parentheses around the type conversion of x in
the SIMD_LOAD2 macro definition, which disrupts the intended semantics
of the width variable.
This is a quick fix for it.
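
To illustrate the general class of bug (a sketch only, not the actual SIMD_LOAD2 definition): when a macro casts its argument without parenthesizing it, the cast binds tighter than any arithmetic in the argument, so pointer offsets get scaled by the wrong element width. `AS_BYTES_BAD`, `AS_BYTES_GOOD`, `offset_bad`, and `offset_good` are hypothetical names, and the example assumes the macro argument is a pointer expression such as `base + i`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical macros. The first applies the cast to `x` unparenthesized.
#define AS_BYTES_BAD(x) ((const char*)x)
#define AS_BYTES_GOOD(x) ((const char*)(x))

// Expands to ((const char*)base + i): the cast happens first,
// so the offset advances by i bytes instead of i elements.
std::ptrdiff_t offset_bad(const std::uint16_t* base, int i)
{
    assert(i >= 0);
    return AS_BYTES_BAD(base + i) - (const char*)base;
}

// Expands to ((const char*)(base + i)): the offset is applied in
// units of the original element type, then the result is cast.
std::ptrdiff_t offset_good(const std::uint16_t* base, int i)
{
    assert(i >= 0);
    return AS_BYTES_GOOD(base + i) - (const char*)base;
}
```

With a 16-bit element type, the two versions disagree by a factor of two, which is the kind of load/store width mismatch described above.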
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
* Fix the half-precision version of CPU-Adam
* remove unexpected return
* fix the increment width (fp32/fp16)
* support fp16 tests for cpu-adam
* fix the fp16 data-loading
* change unit-test for fp16 check & slight change to parameter size
* fix for numpy error
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* fixing the softmax masking when using triangular masking
* move the TILE declaration outside of the SIMD loop
* remove unrelated changes
* fix Adagrad compile issue