Commit Graph

13 Commits

Author SHA1 Message Date
Logan Adams 297a6840e1
Update clang-format version from 16 to 18. (#5839)
We previously used a slightly old version of clang-format, which caused
issues when people installed the latest version via apt or similar, rather
than via Python, to try to fix their formatting issues. Installing older
versions is also a pain, and the formatting style of the newer version
seems better.
2024-08-06 09:14:21 -07:00
Costin Eseanu e7dd28a23d
Fixed the Windows build. (#5596)
Fixed the Windows build.

Fixes applied:
- Remove some more ops that don't build on Windows.
- Remove the use of symlinks that didn't work correctly and replace with
`shutil.copytree()`.
- Small fixes to make the C++ code compile.

Tested with Python 3.9 and CUDA 12.1.

---------

Co-authored-by: Costin Eseanu <costineseanu@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-05-31 22:11:10 +00:00
Liran Bachar 69af361167
CPUAdam fp16 and bf16 support (#5409)
Hi. Please review the following changes.
I added BF16 support to CPU Adam. BF16, FP16, and float are supported
at compile time; the correct template is selected at runtime according
to the dtype of the input parameters.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2024-05-20 12:50:20 +00:00
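
Note on #5409: the message describes templates compiled for each supported element type and selected at runtime from the parameter dtype. The following is a minimal sketch of that general pattern, not the DeepSpeed code. The names `DType`, `Half`, `BFloat16`, `buffer_bytes`, and `buffer_bytes_for` are hypothetical and introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical dtype tag; a real optimizer would read this from the input tensor.
enum class DType { Float32, Float16, BFloat16 };

// Hypothetical 2-byte storage types standing in for real fp16/bf16 types.
struct Half { std::uint16_t bits; };
struct BFloat16 { std::uint16_t bits; };

// One template, instantiated at compile time for every supported element type.
// It only reports the buffer size here; a real kernel would do the update step.
template <typename T>
std::size_t buffer_bytes(std::size_t count)
{
    return count * sizeof(T);
}

// Runtime dispatch: map the dtype tag to the matching compile-time instantiation.
std::size_t buffer_bytes_for(DType dtype, std::size_t count)
{
    switch (dtype) {
        case DType::Float32: return buffer_bytes<float>(count);
        case DType::Float16: return buffer_bytes<Half>(count);
        case DType::BFloat16: return buffer_bytes<BFloat16>(count);
    }
    throw std::invalid_argument("unsupported dtype");
}
```

Keeping the switch in one place means each instantiation stays free of per-element branching, and an unsupported dtype fails loudly instead of silently falling through.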
yzhblind f517903162
Fix confusing width in simd_load (#4714)
I found that the width used in simd_load differs from the one used in
simd_store, which is confusing.
The cause is missing parentheses around x in the type conversion in the
SIMD_LOAD2 macro definition, which disrupts the intended semantics of
the width variable.
This is a quick fix for it.

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-01-10 23:38:53 +00:00
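
Note on #4714: the message attributes the bug to a cast inside a macro that was applied to an unparenthesized argument. The sketch below shows that general pitfall with hypothetical macros (`AS_FLOAT_PTR_UNSAFE`, `AS_FLOAT_PTR_SAFE`) and helper names chosen for illustration. It does not reproduce the actual `SIMD_LOAD2` definition. Because a cast binds more tightly than `+`, the unsafe form casts only the base pointer, so the offset is then counted in `float` units rather than in the caller's units.

```cpp
#include <cstddef>

// Hypothetical macros for illustration only; not the DeepSpeed definitions.
#define AS_FLOAT_PTR_UNSAFE(x) ((const float*)x)    // cast applies to the first operand of x only
#define AS_FLOAT_PTR_SAFE(x) ((const float*)(x))    // cast applies to the whole expression

// Backing storage that the helpers below address in bytes.
static const float kStorage[8] = {};

// Distance in bytes from base to p.
inline std::ptrdiff_t byte_distance(const float* base, const float* p)
{
    return reinterpret_cast<const char*>(p) - reinterpret_cast<const char*>(base);
}

inline std::ptrdiff_t unsafe_offset(std::size_t byte_index)
{
    const char* bytes = reinterpret_cast<const char*>(kStorage);
    // Expands to ((const float*)bytes + byte_index): advances byte_index floats.
    return byte_distance(kStorage, AS_FLOAT_PTR_UNSAFE(bytes + byte_index));
}

inline std::ptrdiff_t safe_offset(std::size_t byte_index)
{
    const char* bytes = reinterpret_cast<const char*>(kStorage);
    // Expands to ((const float*)(bytes + byte_index)): advances byte_index bytes.
    return byte_distance(kStorage, AS_FLOAT_PTR_SAFE(bytes + byte_index));
}
```

For a byte index of 4, the safe form lands 4 bytes into the buffer while the unsafe form lands 16 bytes in, which is the kind of mismatch that makes a load width look inconsistent with the corresponding store.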
Hongjiu "Enneamer" Zhang 8e64c3b550
feat: add Lion optimizer (#4331)
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-10-05 22:32:14 +00:00
Michael Wyatt b361c72761
Update DeepSpeed copyright license to Apache 2.0 (#3111)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-03-30 17:14:38 -07:00
Jeff Rasley da84e60d98
add missing license info to top of all source code (#2889)
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-02-27 11:20:41 -08:00
Connor Holmes a25c31b67d
Update AVX512 Detection (#2621)
* Update cpuinfo AVX512 detection

* Missing conversion from `_mm256` to `_mm256i`

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-12-17 05:57:28 -08:00
Reza Yazdani a04480e192
Fix the half-precision version of CPU-Adam (#2032)
* Fix the half-precision version of CPU-Adam

* remove unexpected return

* fix the increase width (fp32/fp16)

* support fp16 tests for cpu-adam

* fix the fp16 data-loading

* change unit-test for fp16 check & slight change to parameter size

* fix for numpy error

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-06-23 08:56:26 -07:00
Reza Yazdani 259936a76c
Fix cpu-adam AVX performance (#1637)
2021-12-14 17:33:48 -08:00
Jeff Rasley a10e4811fe
force set lf instead of crlf (https://github.com/pre-commit/pre-commit-hooks#mixed-line-ending) (#1598)
2021-11-29 15:41:18 -08:00
Reza Yazdani af443f63f4
CPU-Adam: Fix compile Issue (#1537)
* fixing the softmax masking when using triangular masking

* move the TILE declaration outside of the SIMD loop

* remove unrelated changes

* fix Adagrad compile issue
2021-11-09 11:45:01 -08:00
Wenhao Hu 8abdaee243
Add cpu adagrad (#1358)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2021-10-26 17:55:17 +00:00