Commit graph

  • 0efe802194 Merge a30df44cad into 1981497193 François Rozet 2024-10-18 17:51:07 -0400
  • a30df44cad Revert if François Rozet 2024-10-18 16:10:15 -0400
  • e6ca017808 Add support for dataclasses François Rozet 2024-10-18 13:37:37 -0400
  • 5a9ead1d23 validated Conv1d & 2d JeremyCCHsu 2024-09-04 10:49:25 +0000
  • 99cdded6bf Merge 5e90cb3d62 into 1981497193 Franz Srambical 2024-07-17 13:54:04 +0200
  • 5e90cb3d62 fix: adopt mup/Transformers API for torch2.3 Franz Srambical 2024-07-17 13:52:05 +0200
  • 9bb003b34c Merge f16926106b into 1981497193 Thomas Fortin 2024-05-05 22:48:32 -0400
  • f16926106b added warning for using weight decay with MuAdam rather than MuAdamW thomasfortin1 2024-05-05 22:21:18 -0400
  • 23245b80a9 removed mup.Adam, mup.AdamW, and mup.SGD from package thomasfortin1 2024-05-05 22:15:44 -0400
  • 5cb7f5435a Merge 22ca9dd696 into 1981497193 janEbert 2024-05-03 14:09:21 +0000
  • 22ca9dd696 Improve logical flow janEbert 2024-05-03 16:08:33 +0200
  • be217f6c15 Explain FSDP caveat janEbert 2024-05-03 15:52:49 +0200
  • c085dbc3f3 Fix FSDP janEbert 2024-05-03 14:52:43 +0200
  • 242bb8f738 1.0.2: add MuReadout counterparts for Conv1d & Conv2d JeremyCCHsu 2024-02-27 05:43:12 +0000
  • e5f520688f make muP compatible with WeightNorm JeremyCCHsu 2024-02-21 10:46:47 +0000
  • f93297c52a Merge 6ad93cc35e into 1981497193 marcobellagente93 2023-10-22 01:13:50 +0200
  • 1981497193 add youtube link main Edward Hu 2023-10-20 20:45:36 -0700
  • 6ad93cc35e add width_mult to optimizer dict marco 2023-10-19 16:31:49 +0200
  • 4cc5a81c3d Demo notebook (#63) Edward Hu 2023-10-15 18:27:16 -0700
  • 2689fa298f Delete examples/MLP/demo_original.ipynb Edward Hu 2023-10-15 18:23:10 -0700
  • 13e990191a add demo notebook Edward Hu 2023-09-18 16:41:25 -0700
  • 7dad75483b draft Edward Hu 2023-09-03 18:08:55 -0400
  • a33ea802bc Merge pull request #38 from TevenLeScao/coord_check_plot_features Greg Yang 2023-03-16 17:03:09 -0500
  • 442f2016c8 doc change TevenLeScao 2023-03-16 11:22:14 +0100
  • 942a2991ed removing unnecessary plot TevenLeScao 2023-03-10 19:43:02 +0100
  • fad814a51e documentation for module_list, backward compatibility for numerical casting TevenLeScao 2023-03-10 19:26:06 +0100
  • 97b411dddf Zero initialization of query heads Greg Yang 2023-02-01 10:39:37 -0500
  • f306c48b5d Merge pull request #37 from TevenLeScao/user_shapes Greg Yang 2023-02-01 09:59:09 -0500
  • 5d5571ca1c fix typo "requirement.txt" Greg Yang 2023-02-01 09:34:14 -0500
  • 3934867cb8 coord check plot improvements TevenLeScao 2023-02-01 15:30:14 +0100
  • 564b10c8cb custom user shapes TevenLeScao 2023-02-01 15:17:29 +0100
  • cf453c44e7 Merge pull request #15 from microsoft/torchdistx Edward Hu 2023-01-22 15:30:49 -0500
  • 133ef61857 Update README.md torchdistx Edward Hu 2023-01-22 12:30:32 -0800
  • 2448e700e3 Update main.py Edward Hu 2023-01-22 12:28:55 -0800
  • 1c7771ab25 remove torchdistx disclaimer Edward Hu 2023-01-22 12:26:13 -0800
  • 2c907bffb7 Merge pull request #35 from TevenLeScao/plot-bugfix Greg Yang 2023-01-17 10:40:18 -0600
  • 7a252835e8 Merge branch 'main' into torchdistx Edward Hu 2023-01-10 09:23:55 -0500
  • 96d1f404e5 Merge pull request #33 from zanussbaum/dtype_error Greg Yang 2023-01-10 08:17:36 -0600
  • 87717b98b8 fix: dtype for newer torch versions zanussbaum 2023-01-09 15:05:07 -0800
  • 04b72f3b35 Merge pull request #32 from TevenLeScao/main Greg Yang 2023-01-08 23:15:13 -0600
  • 3896bc40b9 revert previous commit TevenLeScao 2023-01-04 17:22:05 +0100
  • d0269b0c3d bugfix where steps stored as float break matplotlib TevenLeScao 2023-01-04 17:21:06 +0100
  • b9c3a21338 bugfix where steps stored as float break matplotlib TevenLeScao 2023-01-04 17:04:02 +0100
  • d6ee3fa41f Fixing case where None gets passed to coord check hook TevenLeScao 2023-01-04 16:22:49 +0100
  • 10e8b53fb3 Proper error return TevenLeScao 2023-01-03 16:58:39 +0100
  • 183f7c5cb0 fix typos Greg Yang 2022-12-01 09:07:08 -0600
  • 44f0702dc5 add import os Edward Hu 2022-06-23 16:23:51 -0700
  • 42995559b4 add pointer to torchdistx Edward Hu 2022-06-18 08:08:44 -0400
  • 6b931c6123 add disclaimer re torch nightly Edward Hu 2022-06-17 16:29:16 -0400
  • 16ef490568 add MuOutConv{1,2,3}d & Transpose versions; fixed bugs muconv2d Greg Yang 2022-06-05 22:42:59 +0000
  • 1b4b46eb54 Merge branch 'main' into muconv2d Greg Yang 2022-06-05 20:43:37 +0000
  • 5499062f15 note that torchdistx has to be used with torch nightly Edward Hu 2022-06-04 15:36:58 -0400
  • bf2d53dd5d change DS options deepspeed Edward Hu 2022-06-02 20:28:33 -0400
  • 18f2ff4fe9 Merge branch 'torchdistx' of github.com:microsoft/mup into torchdistx Edward Hu 2022-05-30 17:49:17 -0400
  • 265f2d9f63 add --deferred_init option Edward Hu 2022-05-30 17:49:00 -0400
  • eac6f1dd71 improve coord check utilities Greg Yang 2022-05-26 23:50:33 -0400
  • 244c36086a add torchdistx to readme Edward Hu 2022-05-08 08:34:13 -0400
  • 812fb0261f add torchdistx to readme Edward Hu 2022-05-08 08:17:03 -0400
  • ba61bd1b4b Update optim.py Edward Hu 2022-05-18 19:09:49 -0400
  • d7c94f9e34 add an option to not scale wd for decoupled optimizers Edward Hu 2022-04-27 13:46:44 -0400
  • 59b0c8694f Update optim.py decoupled_wd Edward Hu 2022-05-18 19:09:49 -0400
  • b18c4c8743 switch apex to DS Edward Hu 2022-05-09 18:57:39 -0400
  • a57e2afb13 replace apex with DS Edward Hu 2022-05-09 18:56:14 -0400
  • 5dcc1c6847 typo Greg Yang 2022-05-09 00:37:07 -0400
  • e968350db8 add torchdistx to readme Edward Hu 2022-05-08 08:34:13 -0400
  • 3e3daabdcb add torchdistx to readme Edward Hu 2022-05-08 08:17:03 -0400
  • 44303b6e63 add an option to not scale wd for decoupled optimizers Edward Hu 2022-04-27 13:46:44 -0400
  • 2ed42d0c8b add MuConv2d Edward Hu 2022-04-06 20:28:55 -0400
  • c9d67001c4 link to pytorch issue for tracing param shapes Greg Yang 2022-03-20 16:25:18 -0500
  • 89ed7636be add usage of the meta flag to README Edward Hu 2022-03-19 16:21:58 -0400
  • 7758dae40b add tests for meta tensors Edward Hu 2022-03-19 14:47:15 -0400
  • a2fec5fdb3 adding some tips for coord check Greg Yang 2022-03-17 19:39:41 -0500
  • 8b3877a5c8 fix warning about optimizer Greg Yang 2022-03-16 19:07:20 -0500
  • 168d704ac8 ongoing discussion of huggingface integration Greg Yang 2022-03-14 18:12:02 -0500
  • 7904307ec8 Add comment on backward compatibility Greg Yang 2022-03-14 04:13:24 -0500
  • fda87c5cd0 reference pytorch issue for tracing param shapes Greg Yang 2022-03-14 02:33:38 -0500
  • 08c268290a minor edit of README Greg Yang 2022-03-12 22:17:58 -0600
  • f21448a129 update blog link Greg Yang 2022-03-08 17:18:35 +0000
  • 4e08b4701b add download_url to setup.py Greg Yang 2022-03-08 16:37:13 +0000
  • 17610cd32d add table of contents to README Greg Yang 2022-03-08 16:36:43 +0000
  • 7800c2b09f mutransformers submodule Greg Yang 2022-03-08 14:57:33 +0000
  • 0518b39e91 initial commit v1.0.0 Greg Yang 2022-03-08 06:50:46 +0000
  • 7c86191148 SECURITY.md committed Microsoft Open Source 2021-11-02 13:40:41 -0700
  • ad534a21c8 SUPPORT.md committed Microsoft Open Source 2021-11-02 13:40:41 -0700
  • 4ea8be69aa LICENSE committed Microsoft Open Source 2021-11-02 13:40:40 -0700
  • 40702c1d56 README.md updated to template Microsoft Open Source 2021-11-02 13:40:39 -0700
  • a841a84bdf CODE_OF_CONDUCT.md committed Microsoft Open Source 2021-11-02 13:40:39 -0700
  • 68b924b0b3 Initial commit msft-edward 2021-11-02 13:36:27 -0700