Fixed typos
Parent 5afdca579b
Commit 5924294bf6
@@ -183,7 +183,7 @@ You can find that the original input sentence is restored from the vocabulary id
### Results (BLEU scores)
#### English to Japanese
|Setting|vocab size|BLEU(dev)|BLEU(test)|src #tokens/sent.|trg #tokens/sent.|
-|---|---|---|---|---|---|
+|:---|---:|---:|---:|---:|---:|
|SentencePiece|8k (shared)|0.2785|0.2955|30.9734|25.0540|
|SentencePiece|16k (shared)|0.2664|0.2862|27.1827|21.5326|
|SentencePiece|32k (shared)|0.2641|0.2849|25.0592|19.0840|
@@ -197,15 +197,15 @@ You can find that the original input sentence is restored from the vocabulary id
|Moses/SentencePiece|80k/8k|0.2475|0.2742|21.2513|22.9383|
|SentencePiece/KyTea|8k/80k|0.2778|0.2918|27.0429|23.2161|
|SentencePiece/MeCab|8k/80k|0.2673|0.2919|27.0429|21.2033|
-|SentencePiece/neolgod/8k80k|0.2280|0.2494|27.0429|18.4768|
+|SentencePiece/neolgod|8k80k|0.2280|0.2494|27.0429|18.4768|

#### Japanese to English
|Setting|vocab size|BLEU(dev)|BLEU(test)|src #tokens/sent.|trg #tokens/sent.|
-|---|---|---|---|---|---|
+|:---|---:|---:|---:|---:|---:|
|SentencePiece|8k (shared)|0.1966|0.2162|25.0540|30.9734|
|SentencePiece|16k (shared)|0.1996|0.2160|21.5326|27.1827|
|SentencePiece|32k (shared)|0.1949|0.2159|19.0840|25.0592|
-|SentencePiece|8k (shaerd)|0.1977|**0.2173**|25.4331|31.7693|
+|SentencePiece(BPE)|8k (shaerd)|0.1977|**0.2173**|25.4331|31.7693|
|(KyTea/Moses)+SentencePiece|8k (shared)|0.1921|0.2086|29.9854|31.2719|
|(MeCab/Moses)+SentencePiece|8k (shared)|0.1909|0.2049|28.9537|31.4743|
|(neologd/Moses)+SentencePiece|8k (shared)|0.1938|0.2137|28.8645|31.2985|
@@ -217,6 +217,7 @@ You can find that the original input sentence is restored from the vocabulary id
|MeCab/SentencePiece|80k/8k|0.1892|0.2077|21.2033|27.0429|
|neologd/SentencePiece|80k/8k|0.1641|0.1804|18.4768|27.0429|

#### Discussion
* **SentencePiece (Unigram/BPE)** outperforms word-based methods **(Moses/KyTea/MeCab/neologd)** even with a smaller vocabulary (10% of word-based methods).
* The number of tokens to represent Japanese sentences is almost comparable between **SentencePiece (unigram)** and **KyTea**, though the vocabulary of **SentencePiece** is much smaller. This implies that SentencePiece can effectively compress the sentences with a smaller vocabulary set.
* Pretokenization can slightly improve the BLEU scores in English to Japanese. In Japanese to English translation, pretokenization doesn't help to improve BLEU.
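
The second Discussion point can be checked with a little arithmetic on the English to Japanese table. The sketch below is illustrative only and is not part of the commit. It assumes that the unlabeled `SentencePiece` row with the 8k shared vocabulary is the unigram model, and that "8k" and "80k" mean 8,000 and 80,000 entries. The variable names are hypothetical.

```python
# Values copied from the "English to Japanese" table (trg #tokens/sent. column).
sp_tokens_per_sent = 25.0540      # row "SentencePiece", 8k (shared); assumed to be unigram
kytea_tokens_per_sent = 23.2161   # row "SentencePiece/KyTea", 8k/80k; Japanese side is KyTea

# Assumption: "8k" and "80k" denote 8,000 and 80,000 vocabulary entries.
sp_vocab = 8_000
kytea_vocab = 80_000

# How many more tokens SentencePiece needs per Japanese sentence than KyTea.
token_ratio = sp_tokens_per_sent / kytea_tokens_per_sent

# How much smaller the SentencePiece vocabulary is.
vocab_ratio = sp_vocab / kytea_vocab

print(f"tokens/sent ratio (SentencePiece / KyTea): {token_ratio:.3f}")
print(f"vocabulary ratio (SentencePiece / KyTea): {vocab_ratio:.2f}")
```

Under these assumptions, SentencePiece uses roughly 8% more tokens per Japanese sentence than KyTea while its vocabulary is one tenth the size. Note that the 8k vocabulary is shared between English and Japanese, so the comparison is approximate.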