Mirror of https://github.com/microsoft/mup.git
Update README.md
This commit is contained in:
Parent: 2448e700e3
Commit: 133ef61857
@ -128,6 +128,7 @@ optimizer = MuSGD(model.parameters(), lr=0.1)
Note that the base and delta models *do not need to be trained* --- we are only extracting parameter shape information from them.
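A minimal sketch of what "only extracting parameter shape information" means; note that `MyModel`, its `width` argument, and the layer sizes below are hypothetical stand-ins, not models from this repository:

```python
import torch.nn as nn

# Hypothetical toy model; MyModel and its width argument are
# illustrative stand-ins for the repo's actual models.
class MyModel(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.fc1 = nn.Linear(10, width)
        self.fc2 = nn.Linear(width, 10)

base_model = MyModel(width=8)    # instantiated, but never trained
delta_model = MyModel(width=16)  # instantiated, but never trained

# All that is read from these models is the shape of each parameter:
base_shapes = {name: tuple(p.shape) for name, p in base_model.named_parameters()}
delta_shapes = {name: tuple(p.shape) for name, p in delta_model.named_parameters()}
print(base_shapes["fc1.weight"], delta_shapes["fc1.weight"])
```

Comparing the two shape dictionaries is enough to tell which dimensions scale with width, which is why the weights themselves are irrelevant.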
To avoid instantiating these potentially large models, we can therefore optionally use the `deferred_init` function from `torchdistx`.
After installing [`torchdistx`](https://github.com/pytorch/torchdistx), use `torchdistx.deferred_init.deferred_init(MyModel, **args)` instead of `MyModel(**args)`. See [this page](https://pytorch.org/torchdistx/latest/deferred_init.html) for more detail.
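A hedged sketch of the idea: `MyModel` is again a hypothetical stand-in, and since `torchdistx` may not be installed, the real call is shown in comments while PyTorch's built-in meta device (available in recent PyTorch) illustrates the same effect of recording shapes without allocating real storage:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a large model in the repo.
class MyModel(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.fc = nn.Linear(10, width)

# With torchdistx installed, one would write:
#   from torchdistx.deferred_init import deferred_init
#   base_model = deferred_init(MyModel, width=8)
# PyTorch's meta device demonstrates the same idea: the parameters
# below carry shape and dtype information but no actual storage.
with torch.device("meta"):
    base_model = MyModel(width=8)

print(tuple(base_model.fc.weight.shape), base_model.fc.weight.is_meta)
```

Either way, the resulting module still exposes `named_parameters()` with the correct shapes, which is all the base/delta shape extraction needs.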
In the MLP and Transformer examples we provide (not `mutransformers`), you can activate this feature by passing `--deferred_init`.
## How `mup` Works Under the Hood