Fixed typos
Parent: 7b620b159f
Commit: 49879ecb2a
@@ -189,13 +189,13 @@ enja (Moses/neologd)|80k/80k|0.2102|0.2350|21.25|18.47|
 jaen (Unigram)|8k (shared)|0.1959|0.2170|25.05|30.97|
 jaen (Unigram)|16k (shared)|0.1984|0.2155|21.53|27.18|
 jaen (BPE)|8k (shared)|0.1975|0.2176|25.43|31.76|
-jaen (Moses/KyTea)|80k/80k|0.1697|0.1974|23.21|21.25|
-jaen (Moses/MeCab)|80k/80k|0.1654|0.1870|21.20|21.25|
-jaen (Moses/neologd)|80k/80k|0.1583|0.1838|18.47|21.25|
+jaen (KyTea/Moses)|80k/80k|0.1697|0.1974|23.21|21.25|
+jaen (MeCab/Moses)|80k/80k|0.1654|0.1870|21.20|21.25|
+jaen (neologd/Moses)|80k/80k|0.1583|0.1838|18.47|21.25|
 
 
 * **SentencePiece (Unigram/BPE)** outperforms word-based methods **(Moses/KyTea/MeCab/neologd)** even with a smaller vocabulary (10% of word-based methods).
-* The number of tokens to represent Japanese sentences are almost comparable between **SentencePiece (unigram)** and **KyTea**, though the vocabulary of **Sentencepice** is much smaller. It implies that Sentencepieca can effectively compress the sentences with a smaller vocabulary set.
+* The number of tokens to represent Japanese sentences are almost comparable between **SentencePiece (unigram)** and **KyTea**, though the vocabulary of **Sentencepice** is much smaller. It implies that Sentencepiece can effectively compress the sentences with a smaller vocabulary set.
 * **Neologd** shows poor BLEU score. Tokenizing sentences with a large named entity dictionary might not be effective in neural-based text processing.
 * **Unigram** shows slightly better text compression ratio than **BPE**, but no significant differences in BLEU score.
 