[model_card] bert-base-5lang-cased (#7573)

Co-authored-by: Amin <amin.geotrend@gmail.com>
This commit is contained in:
Amine Abdaoui 2020-10-07 22:38:14 +02:00 коммит произвёл GitHub
Родитель 923dd4e5ef
Коммит 167bce56f2
Не найден ключ, соответствующий данной подписи
Идентификатор ключа GPG: 4AEE18F83AFDEB23
1 изменённых файлов: 64 добавлений и 0 удалений

Просмотреть файл

@ -0,0 +1,64 @@
---
language:
- en
- fr
- es
- de
- zh
tags:
- pytorch
- bert
- multilingual
- en
- fr
- es
- de
- zh
datasets: wikipedia
license: apache-2.0
inference: false
---
# bert-base-5lang-cased
This is a smaller version of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handles only 5 languages (en, fr, es, de and zh) instead of 104.
The model is therefore 30% smaller than the original one (124M parameters instead of 178M) but gives exactly the same representations for the above cited languages.
Starting from `bert-base-5lang-cased` will facilitate the deployment of your model on public cloud platforms while keeping similar results.
For instance, Google Cloud Platform requires that the model size on disk should be lower than 500 MB for serveless deployments (Cloud Functions / Cloud ML) which is not the case of the original `bert-base-multilingual-cased`.
For more information about the models size, memory footprint and loading time please refer to the table below:
| Model | Num parameters | Size | Memory | Loading time |
| ---------------------------- | -------------- | -------- | -------- | ------------ |
| bert-base-multilingual-cased | 178 million | 714 MB | 1400 MB | 4.2 sec |
| bert-base-5lang-cased | 124 million | 495 MB | 950 MB | 3.6 sec |
These measurements have been computed on a [Google Cloud n1-standard-1 machine (1 vCPU, 3.75 GB)](https://cloud.google.com/compute/docs/machine-types\#n1_machine_type).
## How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("amine/bert-base-5lang-cased")
model = AutoModel.from_pretrained("amine/bert-base-5lang-cased")
```
### How to cite
```bibtex
@inproceedings{smallermbert,
title={Load What You Need: Smaller Versions of Mutlilingual BERT},
author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
booktitle={SustaiNLP / EMNLP},
year={2020}
}
```
## Contact
Please contact amine@geotrend.fr for any question, feedback or request.