Fixed spelling of training (#4416)
Parent: 757baee846
Commit: fa6113f9a0
@@ -6,7 +6,7 @@ Overview
 The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
 by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. It presents
-two parameter-reduction techniques to lower memory consumption and increase the trainig speed of BERT:
+two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:
 
 - Splitting the embedding matrix into two smaller matrices
 - Using repeating layers split among groups
 