Edward Hu
1981497193
add youtube link
2023-10-20 20:45:36 -07:00
Edward Hu
4cc5a81c3d
Demo notebook (#63)
* draft
* add demo notebook
* Delete examples/MLP/demo_original.ipynb
2023-10-15 18:27:16 -07:00
Greg Yang
a33ea802bc
Merge pull request #38 from TevenLeScao/coord_check_plot_features
coord check plot improvements
2023-03-16 17:03:09 -05:00
TevenLeScao
442f2016c8
doc change
2023-03-16 11:22:14 +01:00
TevenLeScao
942a2991ed
removing unnecessary plot
2023-03-10 19:43:02 +01:00
TevenLeScao
fad814a51e
documentation for module_list, backward compatibility for numerical casting
2023-03-10 19:26:06 +01:00
Greg Yang
97b411dddf
Zero initialization of query heads
closes #36.
The main changes are on lines 298 and 300. There is a blurb about this in the doc at the top of the file, along with some additional formatting changes on other lines.
2023-02-01 10:39:37 -05:00
Greg Yang
f306c48b5d
Merge pull request #37 from TevenLeScao/user_shapes
Allowing users to create their own shapes
2023-02-01 09:59:09 -05:00
Greg Yang
5d5571ca1c
fix typo "requirement.txt"
2023-02-01 09:34:14 -05:00
TevenLeScao
3934867cb8
coord check plot improvements
2023-02-01 15:30:14 +01:00
TevenLeScao
564b10c8cb
custom user shapes
2023-02-01 15:17:29 +01:00
Edward Hu
cf453c44e7
Merge pull request #15 from microsoft/torchdistx
Torchdistx
2023-01-22 15:30:49 -05:00
Edward Hu
133ef61857
Update README.md
2023-01-22 12:30:32 -08:00
Edward Hu
2448e700e3
Update main.py
2023-01-22 12:28:55 -08:00
Edward Hu
1c7771ab25
remove torchdistx disclaimer
2023-01-22 12:26:13 -08:00
Greg Yang
2c907bffb7
Merge pull request #35 from TevenLeScao/plot-bugfix
Plot bugfix
2023-01-17 10:40:18 -06:00
Edward Hu
7a252835e8
Merge branch 'main' into torchdistx
2023-01-10 09:23:55 -05:00
Greg Yang
96d1f404e5
Merge pull request #33 from zanussbaum/dtype_error
fix: dtype for newer torch versions
2023-01-10 08:17:36 -06:00
zanussbaum
87717b98b8
fix: dtype for newer torch versions
2023-01-09 15:05:07 -08:00
Greg Yang
04b72f3b35
Merge pull request #32 from TevenLeScao/main
Proper error return in coord_check.py
2023-01-08 23:15:13 -06:00
TevenLeScao
3896bc40b9
revert previous commit
2023-01-04 17:22:05 +01:00
TevenLeScao
d0269b0c3d
bugfix where steps stored as float break matplotlib
2023-01-04 17:21:06 +01:00
TevenLeScao
b9c3a21338
bugfix where steps stored as float break matplotlib
2023-01-04 17:04:02 +01:00
TevenLeScao
d6ee3fa41f
Fixing case where None gets passed to coord check hook
2023-01-04 16:22:49 +01:00
TevenLeScao
10e8b53fb3
Proper error return
2023-01-03 16:58:39 +01:00
Greg Yang
183f7c5cb0
fix typos
2022-12-01 09:07:08 -06:00
Edward Hu
44f0702dc5
add import os
2022-06-23 16:23:51 -07:00
Edward Hu
42995559b4
add pointer to torchdistx
2022-06-18 08:08:44 -04:00
Edward Hu
6b931c6123
add disclaimer re torch nightly
2022-06-17 16:29:16 -04:00
Edward Hu
5499062f15
note that torchdistx has to be used with torch nightly
2022-06-04 15:36:58 -04:00
Edward Hu
18f2ff4fe9
Merge branch 'torchdistx' of github.com:microsoft/mup into torchdistx
2022-05-30 17:49:17 -04:00
Edward Hu
265f2d9f63
add --deferred_init option
2022-05-30 17:49:00 -04:00
Greg Yang
eac6f1dd71
improve coord check utilities
Improvements to `get_coord_data`:
1. Previously, when `lossfn=='mse'`, the target was automatically
converted to a one-hot vector before loss computation. This behavior
is now off by default, and the user must enable it explicitly by
setting `one_hot_target=True`.
2. More generally, `one_hot_target` can be turned off for any `lossfn`.
3. Add 'l1' as a loss function specifiable via a string.
4. Allow callable loss functions.
Improvement to `plot_coord_data`:
Extract subplot width and height into the optional args
`subplot_width` and `subplot_height` so the user can control
plot size.
2022-05-26 23:50:33 -04:00
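The `get_coord_data` changes in eac6f1dd71 (string-or-callable `lossfn`, opt-in `one_hot_target`) can be illustrated with a small standalone sketch. This is not the mup implementation: the helper names `resolve_loss`, `one_hot`, and `compute_loss` are hypothetical, plain Python lists stand in for tensors, and only the 'mse' and 'l1' strings named in the commit are covered.

```python
# Hypothetical sketch of the loss-selection behavior described in the
# commit message. It does NOT reproduce mup's code; all names are assumed.
from typing import Callable, Sequence, Union

Vector = Sequence[float]
LossFn = Callable[[Vector, Vector], float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error over the elements of one prediction."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def l1(pred: Vector, target: Vector) -> float:
    """Mean absolute error over the elements of one prediction."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Losses that can be requested by name.
NAMED_LOSSES = {"mse": mse, "l1": l1}


def resolve_loss(lossfn: Union[str, LossFn]) -> LossFn:
    """Accept either a known string or any callable loss function."""
    if callable(lossfn):
        return lossfn
    if lossfn in NAMED_LOSSES:
        return NAMED_LOSSES[lossfn]
    raise ValueError(f"unknown lossfn: {lossfn!r}")


def one_hot(index: int, num_classes: int) -> list:
    """Turn a class index into a one-hot vector of length num_classes."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def compute_loss(pred, target, lossfn="mse", one_hot_target=False):
    """Convert the target to one-hot only when explicitly requested."""
    fn = resolve_loss(lossfn)
    if one_hot_target:
        target = one_hot(int(target), len(pred))
    return fn(pred, target)
```

With this shape, `compute_loss(pred, 1, lossfn="mse", one_hot_target=True)` treats the target as a class index, while leaving `one_hot_target` at its default passes the target through unchanged for any loss, named or callable.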
Edward Hu
244c36086a
add torchdistx to readme
2022-05-22 07:50:18 -04:00
Edward Hu
812fb0261f
add torchdistx to readme
2022-05-22 07:50:18 -04:00
Edward Hu
ba61bd1b4b
Update optim.py
2022-05-22 07:48:59 -04:00
Edward Hu
d7c94f9e34
add an option to not scale wd for decoupled optimizers
2022-05-22 07:48:44 -04:00
Greg Yang
5dcc1c6847
typo
2022-05-09 00:37:07 -04:00
Edward Hu
e968350db8
add torchdistx to readme
2022-05-08 08:34:13 -04:00
Edward Hu
3e3daabdcb
add torchdistx to readme
2022-05-08 08:17:03 -04:00
Greg Yang
c9d67001c4
link to pytorch issue for tracing param shapes
2022-03-20 16:25:18 -05:00
Edward Hu
89ed7636be
add usage of the meta flag to README
2022-03-19 16:21:58 -04:00
Edward Hu
7758dae40b
add tests for meta tensors
2022-03-19 14:47:15 -04:00
Greg Yang
a2fec5fdb3
adding some tips for coord check
2022-03-17 19:39:41 -05:00
Greg Yang
8b3877a5c8
fix warning about optimizer
2022-03-16 19:07:20 -05:00
Greg Yang
168d704ac8
ongoing discussion of huggingface integration
2022-03-14 18:12:02 -05:00
Greg Yang
7904307ec8
Add comment on backward compatibility
2022-03-14 04:13:24 -05:00
Greg Yang
fda87c5cd0
reference pytorch issue for tracing param shapes
2022-03-14 02:33:38 -05:00
Greg Yang
08c268290a
minor edit of README
2022-03-12 22:17:58 -06:00
Greg Yang
f21448a129
update blog link
2022-03-08 17:18:35 +00:00