Commit graph

60 Commits

Author SHA1 Message Date
Edward Hu 1981497193 add youtube link 2023-10-20 20:45:36 -07:00
Edward Hu 4cc5a81c3d
Demo notebook (#63)
* draft

* add demo notebook

* Delete examples/MLP/demo_original.ipynb
2023-10-15 18:27:16 -07:00
Greg Yang a33ea802bc
Merge pull request #38 from TevenLeScao/coord_check_plot_features
coord check plot improvements
2023-03-16 17:03:09 -05:00
TevenLeScao 442f2016c8 doc change 2023-03-16 11:22:14 +01:00
TevenLeScao 942a2991ed removing unnecessary plot 2023-03-10 19:43:02 +01:00
TevenLeScao fad814a51e documentation for module_list, backward compatibility for numerical casting 2023-03-10 19:26:06 +01:00
Greg Yang 97b411dddf
Zero initialization of query heads
closes #36.
The main changes are line 298 and 300. There is a blurb about this at the top of the file doc along with some additional formatting done in other lines.
2023-02-01 10:39:37 -05:00
Greg Yang f306c48b5d
Merge pull request #37 from TevenLeScao/user_shapes
Allowing users to create their own shapes
2023-02-01 09:59:09 -05:00
Greg Yang 5d5571ca1c
fix typo "requirement.txt" 2023-02-01 09:34:14 -05:00
TevenLeScao 3934867cb8 coord check plot improvements 2023-02-01 15:30:14 +01:00
TevenLeScao 564b10c8cb custom user shapes 2023-02-01 15:17:29 +01:00
Edward Hu cf453c44e7
Merge pull request #15 from microsoft/torchdistx
Torchdistx
2023-01-22 15:30:49 -05:00
Edward Hu 133ef61857
Update README.md 2023-01-22 12:30:32 -08:00
Edward Hu 2448e700e3
Update main.py 2023-01-22 12:28:55 -08:00
Edward Hu 1c7771ab25
remove torchdistx disclaimer 2023-01-22 12:26:13 -08:00
Greg Yang 2c907bffb7
Merge pull request #35 from TevenLeScao/plot-bugfix
Plot bugfix
2023-01-17 10:40:18 -06:00
Edward Hu 7a252835e8
Merge branch 'main' into torchdistx 2023-01-10 09:23:55 -05:00
Greg Yang 96d1f404e5
Merge pull request #33 from zanussbaum/dtype_error
fix: dtype for newer torch versions
2023-01-10 08:17:36 -06:00
zanussbaum 87717b98b8 fix: dtype for newer torch versions 2023-01-09 15:05:07 -08:00
Greg Yang 04b72f3b35
Merge pull request #32 from TevenLeScao/main
Proper error return in coord_check.py
2023-01-08 23:15:13 -06:00
TevenLeScao 3896bc40b9 revert previous commit 2023-01-04 17:22:05 +01:00
TevenLeScao d0269b0c3d bugfix where steps stored as float break matplotlib 2023-01-04 17:21:06 +01:00
TevenLeScao b9c3a21338 bugfix where steps stored as float break matplotlib 2023-01-04 17:04:02 +01:00
TevenLeScao d6ee3fa41f Fixing case where None gets passed to coord check hook 2023-01-04 16:22:49 +01:00
TevenLeScao 10e8b53fb3 Proper error return 2023-01-03 16:58:39 +01:00
Greg Yang 183f7c5cb0
fix typos 2022-12-01 09:07:08 -06:00
Edward Hu 44f0702dc5
add import os 2022-06-23 16:23:51 -07:00
Edward Hu 42995559b4 add pointer to torchdistx 2022-06-18 08:08:44 -04:00
Edward Hu 6b931c6123 add disclaimer re torch nightly 2022-06-17 16:29:16 -04:00
Edward Hu 5499062f15 note that torchdistx has to be used with torch nightly 2022-06-04 15:36:58 -04:00
Edward Hu 18f2ff4fe9 Merge branch 'torchdistx' of github.com:microsoft/mup into torchdistx 2022-05-30 17:49:17 -04:00
Edward Hu 265f2d9f63 add --deferred_init option 2022-05-30 17:49:00 -04:00
Greg Yang eac6f1dd71
improve coord check utilities
Improvements to `get_coord_data`:
1. Before, when `lossfn=='mse'`, the target is automatically converted to a one hot vector before loss computation. Now, this behavior is turned off, and the user needs to explicitly turn on this behavior by setting `one_hot_target=True`.
2. More generally, `one_hot_target` can be turned off for any `lossfn`.
3. Add 'l1' as a loss function specifiable via a string.
4. Allow callable loss functions.

Improvement to `plot_coord_data`:
Extract subplot width and height to optional args `subplot_width`, `subplot_height` so user can control plot size.
2022-05-26 23:50:33 -04:00
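The `get_coord_data` changes listed in the commit above can be illustrated with a small, dependency-free sketch. This is not the code from the repository: `compute_loss`, `NAMED_LOSSES`, `one_hot`, and the list-based `mse` and `l1` helpers are hypothetical names chosen for illustration. Only the `lossfn` and `one_hot_target` parameter names and the `'mse'` and `'l1'` strings come from the commit message. The sketch assumes the target is a class index whenever `one_hot_target=True`.

```python
from typing import Callable, Sequence, Union

Vector = Sequence[float]
LossFn = Callable[[Vector, Vector], float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error over two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def l1(pred: Vector, target: Vector) -> float:
    """Mean absolute error over two equal-length vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical registry of losses that can be selected by string.
NAMED_LOSSES = {"mse": mse, "l1": l1}


def one_hot(index: int, num_classes: int) -> list:
    """Turn a class index into a one hot vector of length num_classes."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def compute_loss(pred: Vector, target, lossfn: Union[str, LossFn] = "mse",
                 one_hot_target: bool = False) -> float:
    """Resolve lossfn (string or callable) and apply it.

    The target is converted to a one hot vector only when the caller
    asks for it explicitly, for any choice of lossfn.
    """
    fn = NAMED_LOSSES[lossfn] if isinstance(lossfn, str) else lossfn
    if one_hot_target:
        target = one_hot(target, len(pred))
    return fn(pred, target)
```

With this sketch, `compute_loss(pred, 1, lossfn="mse", one_hot_target=True)` treats `1` as a class index, while `compute_loss(pred, target_vector, lossfn="l1")` uses the target vector unchanged, and any two-argument callable can be passed as `lossfn`.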
Edward Hu 244c36086a add torchdistx to readme 2022-05-22 07:50:18 -04:00
Edward Hu 812fb0261f add torchdistx to readme 2022-05-22 07:50:18 -04:00
Edward Hu ba61bd1b4b Update optim.py 2022-05-22 07:48:59 -04:00
Edward Hu d7c94f9e34 add an option to not scale wd for decoupled optimizers 2022-05-22 07:48:44 -04:00
Greg Yang 5dcc1c6847
typo 2022-05-09 00:37:07 -04:00
Edward Hu e968350db8 add torchdistx to readme 2022-05-08 08:34:13 -04:00
Edward Hu 3e3daabdcb add torchdistx to readme 2022-05-08 08:17:03 -04:00
Greg Yang c9d67001c4
link to pytorch issue for tracing param shapes 2022-03-20 16:25:18 -05:00
Edward Hu 89ed7636be add usage of the meta flag to README 2022-03-19 16:21:58 -04:00
Edward Hu 7758dae40b add tests for meta tensors 2022-03-19 14:47:15 -04:00
Greg Yang a2fec5fdb3
adding some tips for coord check 2022-03-17 19:39:41 -05:00
Greg Yang 8b3877a5c8
fix warning about optimizer 2022-03-16 19:07:20 -05:00
Greg Yang 168d704ac8
ongoing discussion of huggingface integration 2022-03-14 18:12:02 -05:00
Greg Yang 7904307ec8
Add comment on backward compatibility 2022-03-14 04:13:24 -05:00
Greg Yang fda87c5cd0
reference pytorch issue for tracing param shapes 2022-03-14 02:33:38 -05:00
Greg Yang 08c268290a
minor edit of README 2022-03-12 22:17:58 -06:00
Greg Yang f21448a129 update blog link 2022-03-08 17:18:35 +00:00