This tool automatically generates grammatically valid synthetic code-mixed data using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
natural-language-processing
python3
code-switching
linguistics
synthetic-data-generation
code-mixing
data-generation
language-modeling
Updated 2024-07-31 00:01:52 +03:00
This code provides a word-level language identification tool that identifies the language of each individual word in code-mixed text, e.g. text that mixes words from two languages, such as Hindi written in Roman script and English.
natural-language-processing
python3
code-mixing
code-switching
language-identification
language-tags
linguistics
mallet
Updated 2020-08-12 02:05:32 +03:00