diff --git a/model_cards/neuralspace-reverie/indic-transformers-bn-bert/README.md b/model_cards/neuralspace-reverie/indic-transformers-bn-bert/README.md
new file mode 100644
index 000000000..a42e1596a
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-bn-bert/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- bn
+tags:
+- MaskedLM
+- Bengali
+- BERT
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Bengali BERT
+## Model description
+This is a BERT language model pre-trained on a ~3 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-bn-bert')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-bn-bert')
+text = "আপনি কেমন আছেন?"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 6, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
diff --git a/model_cards/neuralspace-reverie/indic-transformers-bn-distilbert/README.md b/model_cards/neuralspace-reverie/indic-transformers-bn-distilbert/README.md
new file mode 100644
index 000000000..f06c6c9c8
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-bn-distilbert/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- bn
+tags:
+- MaskedLM
+- Bengali
+- DistilBERT
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Bengali DistilBERT
+## Model description
+This is a DistilBERT language model pre-trained on a ~6 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-bn-distilbert')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-bn-distilbert')
+text = "আপনি কেমন আছেন?"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 5, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
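The cards above state that these checkpoints can be fine-tuned for text classification. As a minimal sketch (not part of the model cards; the two-label setup and the single training example are hypothetical placeholders), a classification head can be attached with `AutoModelForSequenceClassification`:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Minimal fine-tuning sketch: a randomly initialised 2-label head on top of
# the pre-trained encoder. The label set and training example are hypothetical.
name = 'neuralspace-reverie/indic-transformers-bn-bert'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tokenizer(["আপনি কেমন আছেন?"], return_tensors='pt')
labels = torch.tensor([1])  # hypothetical gold label

outputs = model(**batch, labels=labels)
outputs[0].backward()        # outputs[0] is the loss; one backward step
print(outputs[1].shape)      # logits: torch.Size([1, 2])
```

From here a standard optimiser loop (e.g. `AdamW`) over a labelled Bengali dataset completes the recipe.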
diff --git a/model_cards/neuralspace-reverie/indic-transformers-bn-roberta/README.md b/model_cards/neuralspace-reverie/indic-transformers-bn-roberta/README.md
new file mode 100644
index 000000000..b2a47660f
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-bn-roberta/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- bn
+tags:
+- MaskedLM
+- Bengali
+- RoBERTa
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Bengali RoBERTa
+## Model description
+This is a RoBERTa language model pre-trained on a ~6 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-bn-roberta')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-bn-roberta')
+text = "আপনি কেমন আছেন?"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 10, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
diff --git a/model_cards/neuralspace-reverie/indic-transformers-bn-xlmroberta/README.md b/model_cards/neuralspace-reverie/indic-transformers-bn-xlmroberta/README.md
new file mode 100644
index 000000000..ff781b46d
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-bn-xlmroberta/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- bn
+tags:
+- MaskedLM
+- Bengali
+- XLMRoBERTa
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Bengali XLMRoBERTa
+## Model description
+This is an XLMRoBERTa language model pre-trained on a ~3 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-bn-xlmroberta')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-bn-xlmroberta')
+text = "আপনি কেমন আছেন?"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 5, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
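Each card also mentions using the embeddings for feature-based training. One common recipe, shown here as a sketch rather than as the authors' method (the second sentence is an example of ours), is masked mean pooling over the last hidden state:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Feature-extraction sketch: mean-pool the final hidden states over real
# (non-padding) tokens to get one fixed-size vector per sentence.
name = 'neuralspace-reverie/indic-transformers-bn-roberta'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

sentences = ["আপনি কেমন আছেন?", "ধন্যবাদ"]
batch = tokenizer(sentences, padding=True, return_tensors='pt')
with torch.no_grad():
    hidden = model(**batch)[0]                   # [batch, seq_len, 768]
mask = batch['attention_mask'].unsqueeze(-1)     # [batch, seq_len, 1]
features = (hidden * mask).sum(1) / mask.sum(1)  # [batch, 768]
print(features.shape)
```

The resulting vectors can feed any downstream classifier (logistic regression, SVM, etc.) without updating the encoder.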
diff --git a/model_cards/neuralspace-reverie/indic-transformers-hi-bert/README.md b/model_cards/neuralspace-reverie/indic-transformers-hi-bert/README.md
new file mode 100644
index 000000000..45f5389ff
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-hi-bert/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- hi
+tags:
+- MaskedLM
+- Hindi
+- BERT
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Hindi BERT
+## Model description
+This is a BERT language model pre-trained on a ~3 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')
+text = "आपका स्वागत हैं"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 5, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
diff --git a/model_cards/neuralspace-reverie/indic-transformers-hi-distilbert/README.md b/model_cards/neuralspace-reverie/indic-transformers-hi-distilbert/README.md
new file mode 100644
index 000000000..2b2024344
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-hi-distilbert/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- hi
+tags:
+- MaskedLM
+- Hindi
+- DistilBERT
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Hindi DistilBERT
+## Model description
+This is a DistilBERT language model pre-trained on a ~10 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-distilbert')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-distilbert')
+text = "आपका स्वागत हैं"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 5, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
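POS tagging, listed among the downstream tasks in every card, maps onto token classification. A sketch with a hypothetical 4-tag inventory (a real setup would fine-tune on a labelled Hindi treebank):

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Token-classification sketch for POS tagging; the 4-label tag set is a
# hypothetical placeholder and the head below is untrained.
name = 'neuralspace-reverie/indic-transformers-hi-bert'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=4)

batch = tokenizer("आपका स्वागत हैं", return_tensors='pt')
logits = model(**batch)[0]  # [1, seq_len, 4]: one tag score row per subword
print(logits.argmax(-1))    # predicted tag id per token (random until fine-tuned)
```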
diff --git a/model_cards/neuralspace-reverie/indic-transformers-hi-roberta/README.md b/model_cards/neuralspace-reverie/indic-transformers-hi-roberta/README.md
new file mode 100644
index 000000000..3852b1d7f
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-hi-roberta/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- hi
+tags:
+- MaskedLM
+- Hindi
+- RoBERTa
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Hindi RoBERTa
+## Model description
+This is a RoBERTa language model pre-trained on a ~10 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-roberta')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-roberta')
+text = "आपका स्वागत हैं"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 11, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
diff --git a/model_cards/neuralspace-reverie/indic-transformers-hi-xlmroberta/README.md b/model_cards/neuralspace-reverie/indic-transformers-hi-xlmroberta/README.md
new file mode 100644
index 000000000..f7baf9032
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-hi-xlmroberta/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- hi
+tags:
+- MaskedLM
+- Hindi
+- XLMRoBERTa
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Hindi XLMRoBERTa
+## Model description
+This is an XLMRoBERTa language model pre-trained on a ~3 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-xlmroberta')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-xlmroberta')
+text = "आपका स्वागत हैं"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 5, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
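Question answering, the remaining task the cards list, uses a span-extraction head. A sketch showing only the expected input/output shapes (the question/context pair is a made-up example, and the head is untrained until fine-tuned on a Hindi QA dataset):

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# QA sketch: SQuAD-style start/end span scores over the context tokens.
# The question/context pair below is a hypothetical example.
name = 'neuralspace-reverie/indic-transformers-hi-roberta'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question, context = "कौन स्वागत करता है?", "आपका स्वागत हैं"
batch = tokenizer(question, context, return_tensors='pt')
outputs = model(**batch)
start_scores, end_scores = outputs[0], outputs[1]
print(start_scores.shape, end_scores.shape)  # each torch.Size([1, seq_len])
```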
diff --git a/model_cards/neuralspace-reverie/indic-transformers-te-bert/README.md b/model_cards/neuralspace-reverie/indic-transformers-te-bert/README.md
new file mode 100644
index 000000000..46b7dc3b3
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-te-bert/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- te
+tags:
+- MaskedLM
+- Telugu
+- BERT
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Telugu BERT
+## Model description
+This is a BERT language model pre-trained on a ~1.6 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-te-bert')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-te-bert')
+text = "మీరు ఎలా ఉన్నారు"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 5, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
diff --git a/model_cards/neuralspace-reverie/indic-transformers-te-distilbert/README.md b/model_cards/neuralspace-reverie/indic-transformers-te-distilbert/README.md
new file mode 100644
index 000000000..1ce7a8605
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-te-distilbert/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- te
+tags:
+- MaskedLM
+- Telugu
+- DistilBERT
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Telugu DistilBERT
+## Model description
+This is a DistilBERT language model pre-trained on a ~2 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-te-distilbert')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-te-distilbert')
+text = "మీరు ఎలా ఉన్నారు"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 5, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
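All twelve cards carry the `MaskedLM` tag, so the pre-training objective itself can be queried. A sketch via the `fill-mask` pipeline, assuming the hosted checkpoint still includes its pre-training LM head:

```python
from transformers import pipeline

# Masked-LM sketch: let the model fill in a blanked-out Telugu token.
# Assumes the uploaded checkpoint includes the pre-training LM head.
fill = pipeline('fill-mask', model='neuralspace-reverie/indic-transformers-te-bert')
masked = f"మీరు ఎలా {fill.tokenizer.mask_token}"  # mask token differs per model family
for pred in fill(masked):
    print(pred['token_str'], round(pred['score'], 3))
```

Reading the mask token off the tokenizer avoids hard-coding `[MASK]` versus `<mask>` across the BERT- and RoBERTa-family checkpoints.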
diff --git a/model_cards/neuralspace-reverie/indic-transformers-te-roberta/README.md b/model_cards/neuralspace-reverie/indic-transformers-te-roberta/README.md
new file mode 100644
index 000000000..f9c76cd68
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-te-roberta/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- te
+tags:
+- MaskedLM
+- Telugu
+- RoBERTa
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Telugu RoBERTa
+## Model description
+This is a RoBERTa language model pre-trained on a ~2 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-te-roberta')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-te-roberta')
+text = "మీరు ఎలా ఉన్నారు"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 14, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
diff --git a/model_cards/neuralspace-reverie/indic-transformers-te-xlmroberta/README.md b/model_cards/neuralspace-reverie/indic-transformers-te-xlmroberta/README.md
new file mode 100644
index 000000000..78b1e7834
--- /dev/null
+++ b/model_cards/neuralspace-reverie/indic-transformers-te-xlmroberta/README.md
@@ -0,0 +1,29 @@
+---
+language:
+- te
+tags:
+- MaskedLM
+- Telugu
+- XLMRoBERTa
+- Question-Answering
+- Token Classification
+- Text Classification
+---
+# Indic-Transformers Telugu XLMRoBERTa
+## Model description
+This is an XLMRoBERTa language model pre-trained on a ~1.6 GB monolingual training corpus, taken mostly from [OSCAR](https://oscar-corpus.com/).
+This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, and question answering. Embeddings from this model can also be used for feature-based training.
+## Intended uses & limitations
+#### How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-te-xlmroberta')
+model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-te-xlmroberta')
+text = "మీరు ఎలా ఉన్నారు"
+input_ids = tokenizer(text, return_tensors='pt')['input_ids']
+out = model(input_ids)[0]  # last hidden state
+print(out.shape)
+# torch.Size([1, 5, 768])
+```
+#### Limitations and bias
+The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The `.h5` file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
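Since every card notes that the TensorFlow `.h5` weights were converted manually while `pytorch_model.bin` is the recommended file, TensorFlow users can instead derive weights from the PyTorch file at load time. A sketch (requires both `torch` and `tensorflow` installed):

```python
from transformers import TFAutoModel

# Load in TensorFlow by converting the recommended PyTorch weights on the
# fly, bypassing the manually generated .h5 file.
model = TFAutoModel.from_pretrained(
    'neuralspace-reverie/indic-transformers-te-xlmroberta', from_pt=True)
```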