* Adding Hungarian

* Update evaluation results [skip ci]

* Update model registry [skip ci]

---------

Co-authored-by: CircleCI evaluation job <ci-models-evaluation@firefox-translations>
This commit is contained in:
Andre Natal 2023-05-10 12:17:50 -07:00 committed by GitHub
Parent 5481149bbd
Commit 57718bd80e
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
39 changed files: 347 additions and 10 deletions

@@ -92,3 +92,4 @@ Suffix of the model file in the registry:
- Ukrainian <-> English
- Dutch <-> English
- Catalan -> English
- Hungarian -> English

@@ -56,15 +56,26 @@ Both absolute and relative differences in BLEU scores between Bergamot and other
## avg
| Translator/Dataset | ru-en | en-nl | en-ru | en-fa | nl-en | uk-en | fa-en | ca-en | en-uk | is-en |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bergamot | 33.69 | 27.30 | 29.44 | 17.30 | 29.65 | 35.93 | 28.70 | 38.00 | 26.30 | 23.40 |
| google | 38.20 (+4.51, +13.38%) | 29.30 (+2.00, +7.33%) | 34.49 (+5.05, +17.15%) | 27.80 (+10.50, +60.69%) | 33.05 (+3.40, +11.47%) | 42.43 (+6.50, +18.09%) | 40.85 (+12.15, +42.33%) | 48.95 (+10.95, +28.82%) | 32.63 (+6.33, +24.08%) | 38.90 (+15.50, +66.24%) |
| microsoft | 38.38 (+4.68, +13.90%) | 28.80 (+1.50, +5.49%) | 33.62 (+4.18, +14.21%) | 20.50 (+3.20, +18.50%) | 32.60 (+2.95, +9.95%) | 42.30 (+6.37, +17.72%) | 36.15 (+7.45, +25.96%) | 46.50 (+8.50, +22.37%) | 32.03 (+5.73, +21.80%) | 38.17 (+14.77, +63.11%) |
| Translator/Dataset | hu-en | ru-en | en-nl | en-ru | en-fa | nl-en | uk-en | fa-en | ca-en | en-uk | is-en |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bergamot | 26.53 | 33.69 | 27.30 | 29.44 | 17.30 | 29.65 | 35.93 | 28.70 | 38.00 | 26.30 | 23.40 |
| google | 31.30 (+4.77, +18.00%) | 38.20 (+4.51, +13.38%) | 29.30 (+2.00, +7.33%) | 34.49 (+5.05, +17.15%) | 27.80 (+10.50, +60.69%) | 33.05 (+3.40, +11.47%) | 42.43 (+6.50, +18.09%) | 40.85 (+12.15, +42.33%) | 48.95 (+10.95, +28.82%) | 32.63 (+6.33, +24.08%) | 38.90 (+15.50, +66.24%) |
| microsoft | 31.03 (+4.50, +16.97%) | 38.38 (+4.68, +13.90%) | 28.80 (+1.50, +5.49%) | 33.62 (+4.18, +14.21%) | 20.50 (+3.20, +18.50%) | 32.60 (+2.95, +9.95%) | 42.30 (+6.37, +17.72%) | 36.15 (+7.45, +25.96%) | 46.50 (+8.50, +22.37%) | 32.03 (+5.73, +21.80%) | 38.17 (+14.77, +63.11%) |
![Results](img/avg-bleu.png)
---
## hu-en
| Translator/Dataset | wmt08 | flores-dev | flores-test | wmt09 |
| --- | --- | --- | --- | --- |
| bergamot | 20.00 | 32.20 | 31.60 | 22.30 |
| google | 22.40 (+2.40, +12.00%) | 39.40 (+7.20, +22.36%) | 38.00 (+6.40, +20.25%) | 25.40 (+3.10, +13.90%) |
| microsoft | 22.60 (+2.60, +13.00%) | 38.50 (+6.30, +19.57%) | 38.20 (+6.60, +20.89%) | 24.80 (+2.50, +11.21%) |
![Results](img/hu-en-bleu.png)
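The bracketed numbers in these tables are the absolute and relative BLEU differences against the bergamot baseline. A minimal sketch of how one such cell could be reproduced (the function name and formatting are assumptions; scores come from the wmt08 row above):

```python
def bleu_delta(baseline: float, other: float) -> str:
    """Format a system's BLEU score with its absolute and relative
    differences against the baseline, as in the table cells above."""
    abs_diff = other - baseline
    rel_diff = abs_diff / baseline * 100
    return f"{other:.2f} ({abs_diff:+.2f}, {rel_diff:+.2f}%)"

# wmt08 hu-en: bergamot scores 20.00, google 22.40
print(bleu_delta(20.00, 22.40))  # → 22.40 (+2.40, +12.00%)
```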
---
## ru-en
| Translator/Dataset | mtedx_test | wmt19 | wmt17 | flores-dev | wmt22 | flores-test | wmt14 | wmt15 | wmt16 | wmt13 | wmt18 | wmt21 | wmt20 |

@@ -48,15 +48,44 @@ We also compare the systems using the `comet-compare` tool that calculates the s
## avg
| Translator/Dataset | ru-en | en-nl | en-ru | en-fa | nl-en | uk-en | fa-en | ca-en | en-uk | is-en |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bergamot | 0.49 | 0.58 | 0.54 | 0.31 | 0.63 | 0.52 | 0.50 | 0.65 | 0.51 | 0.15 |
| google | 0.59 (+0.10, +20.83%) | 0.67 (+0.08, +14.30%) | 0.76 (+0.21, +39.38%) | 0.70 (+0.39, +126.54%) | 0.70 (+0.07, +10.71%) | 0.67 (+0.15, +28.26%) | 0.74 (+0.24, +48.00%) | 0.82 (+0.16, +24.78%) | 0.79 (+0.27, +53.31%) | 0.70 (+0.55, +370.91%) |
| microsoft | 0.60 (+0.11, +22.13%) | 0.65 (+0.06, +11.05%) | 0.72 (+0.18, +32.36%) | 0.41 (+0.10, +31.65%) | 0.69 (+0.06, +9.12%) | 0.64 (+0.12, +23.16%) | 0.66 (+0.16, +32.78%) | 0.79 (+0.14, +21.22%) | 0.75 (+0.23, +45.60%) | 0.67 (+0.52, +353.71%) |
| Translator/Dataset | hu-en | ru-en | en-nl | en-ru | en-fa | nl-en | uk-en | fa-en | ca-en | en-uk | is-en |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bergamot | 0.56 | 0.49 | 0.58 | 0.54 | 0.31 | 0.63 | 0.52 | 0.50 | 0.65 | 0.51 | 0.15 |
| google | 0.66 (+0.10, +17.32%) | 0.59 (+0.10, +20.83%) | 0.67 (+0.08, +14.30%) | 0.76 (+0.21, +39.38%) | 0.70 (+0.39, +126.54%) | 0.70 (+0.07, +10.71%) | 0.67 (+0.15, +28.26%) | 0.74 (+0.24, +48.00%) | 0.82 (+0.16, +24.78%) | 0.79 (+0.27, +53.31%) | 0.70 (+0.55, +370.91%) |
| microsoft | 0.66 (+0.10, +17.85%) | 0.60 (+0.11, +22.13%) | 0.65 (+0.06, +11.05%) | 0.72 (+0.18, +32.36%) | 0.41 (+0.10, +31.65%) | 0.69 (+0.06, +9.12%) | 0.64 (+0.12, +23.16%) | 0.66 (+0.16, +32.78%) | 0.79 (+0.14, +21.22%) | 0.75 (+0.23, +45.60%) | 0.67 (+0.52, +353.71%) |
![Results](img/avg-comet.png)
---
## hu-en
| Translator/Dataset | wmt08 | flores-test | wmt09 | flores-dev |
| --- | --- | --- | --- | --- |
| bergamot | 0.44 | 0.66 | 0.47 | 0.68 |
| google | 0.54 (+0.10, +22.47%) | 0.76 (+0.10, +15.46%) | 0.57 (+0.10, +20.66%) | 0.77 (+0.09, +13.48%) |
| microsoft | 0.55 (+0.11, +23.95%) | 0.76 (+0.10, +15.33%) | 0.57 (+0.10, +21.81%) | 0.77 (+0.09, +13.63%) |
![Results](img/hu-en-comet.png)
### Comparisons between systems
*If a comparison is omitted, the systems have equal averages (tie). Click on the dataset for a complete report*
#### [wmt08.hu-en](hu-en/wmt08.hu-en.cometcompare)
- wmt08.microsoft.en outperforms wmt08.bergamot.en.
- wmt08.google.en outperforms wmt08.bergamot.en.
#### [flores-test.hu-en](hu-en/flores-test.hu-en.cometcompare)
- flores-test.microsoft.en outperforms flores-test.bergamot.en.
- flores-test.google.en outperforms flores-test.bergamot.en.
#### [wmt09.hu-en](hu-en/wmt09.hu-en.cometcompare)
- wmt09.microsoft.en outperforms wmt09.bergamot.en.
- wmt09.google.en outperforms wmt09.bergamot.en.
#### [flores-dev.hu-en](hu-en/flores-dev.hu-en.cometcompare)
- flores-dev.microsoft.en outperforms flores-dev.bergamot.en.
- flores-dev.google.en outperforms flores-dev.bergamot.en.
---
## ru-en
| Translator/Dataset | wmt17 | wmt22 | flores-test | wmt20 | mtedx_test | wmt15 | wmt18 | wmt14 | wmt16 | wmt19 | wmt21 | flores-dev | wmt13 |

@@ -0,0 +1 @@
32.2

@@ -0,0 +1 @@
0.6780

@@ -0,0 +1 @@
39.4

@@ -0,0 +1 @@
0.7694

@@ -0,0 +1,60 @@
==========================
x_name: flores-dev.bergamot.en
y_name: flores-dev.microsoft.en
Bootstrap Resampling Results:
x-mean: 0.6793
y-mean: 0.7707
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -17.4244
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.en outperforms flores-dev.bergamot.en.
==========================
x_name: flores-dev.bergamot.en
y_name: flores-dev.google.en
Bootstrap Resampling Results:
x-mean: 0.6793
y-mean: 0.7692
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -15.9188
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.en outperforms flores-dev.bergamot.en.
==========================
x_name: flores-dev.microsoft.en
y_name: flores-dev.google.en
Bootstrap Resampling Results:
x-mean: 0.7707
y-mean: 0.7692
ties (%): 0.1567
x_wins (%): 0.5467
y_wins (%): 0.2967
Paired T-Test Results:
statistic: 0.2625
p_value: 0.7930
Null hypothesis can't be rejected.
Both systems have equal averages.
Summary
If system_x is better than system_y then:
Null hypothesis rejected according to t-test with p_value=0.05.
Scores differ significantly across samples.
system_x \ system_y flores-dev.bergamot.en flores-dev.microsoft.en flores-dev.google.en
----------------------- ------------------------ ------------------------- ----------------------
flores-dev.bergamot.en False False
flores-dev.microsoft.en True False
flores-dev.google.en True False
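The reports above combine bootstrap resampling win rates with a paired t-test. A minimal stdlib-only sketch of that kind of procedure on hypothetical per-segment scores (not the comet-compare implementation itself; the p-value step is omitted because it needs the t distribution's CDF):

```python
import random
from math import sqrt
from statistics import mean, stdev

def compare_systems(x, y, n_boot=300, seed=0):
    """Paired comparison of two systems' per-segment scores:
    bootstrap resampling win rates plus a paired t statistic,
    mirroring the fields in the reports above."""
    rng = random.Random(seed)
    n = len(x)
    wins = {"x": 0, "y": 0, "tie": 0}
    for _ in range(n_boot):
        # resample segment indices with replacement, same indices for both systems
        idx = [rng.randrange(n) for _ in range(n)]
        xm = mean(x[i] for i in idx)
        ym = mean(y[i] for i in idx)
        wins["x" if xm > ym else "y" if ym > xm else "tie"] += 1
    d = [a - b for a, b in zip(x, y)]  # per-segment score differences
    t = mean(d) / (stdev(d) / sqrt(n))  # paired t statistic
    return {"x_wins": wins["x"] / n_boot,
            "y_wins": wins["y"] / n_boot,
            "ties": wins["tie"] / n_boot,
            "statistic": t}
```

A large negative statistic with `y_wins` near 1.0 corresponds to the "y outperforms x" verdicts printed above.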

@@ -0,0 +1 @@
38.5

@@ -0,0 +1 @@
0.7704

@@ -0,0 +1 @@
31.6

@@ -0,0 +1 @@
0.6623

@@ -0,0 +1 @@
38.0

@@ -0,0 +1 @@
0.7647

@@ -0,0 +1,60 @@
==========================
x_name: flores-test.bergamot.en
y_name: flores-test.microsoft.en
Bootstrap Resampling Results:
x-mean: 0.6624
y-mean: 0.7634
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -16.7697
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-test.microsoft.en outperforms flores-test.bergamot.en.
==========================
x_name: flores-test.bergamot.en
y_name: flores-test.google.en
Bootstrap Resampling Results:
x-mean: 0.6624
y-mean: 0.7637
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -16.2147
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-test.google.en outperforms flores-test.bergamot.en.
==========================
x_name: flores-test.microsoft.en
y_name: flores-test.google.en
Bootstrap Resampling Results:
x-mean: 0.7634
y-mean: 0.7637
ties (%): 0.1167
x_wins (%): 0.4033
y_wins (%): 0.4800
Paired T-Test Results:
statistic: -0.2467
p_value: 0.8052
Null hypothesis can't be rejected.
Both systems have equal averages.
Summary
If system_x is better than system_y then:
Null hypothesis rejected according to t-test with p_value=0.05.
Scores differ significantly across samples.
system_x \ system_y flores-test.bergamot.en flores-test.microsoft.en flores-test.google.en
------------------------ ------------------------- -------------------------- -----------------------
flores-test.bergamot.en False False
flores-test.microsoft.en True False
flores-test.google.en True False

@@ -0,0 +1 @@
38.2

@@ -0,0 +1 @@
0.7638

@@ -0,0 +1 @@
20.0

@@ -0,0 +1 @@
0.4401

@@ -0,0 +1 @@
22.4

@@ -0,0 +1 @@
0.5390

@@ -0,0 +1,60 @@
==========================
x_name: wmt08.bergamot.en
y_name: wmt08.microsoft.en
Bootstrap Resampling Results:
x-mean: 0.4394
y-mean: 0.5451
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -19.3365
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt08.microsoft.en outperforms wmt08.bergamot.en.
==========================
x_name: wmt08.bergamot.en
y_name: wmt08.google.en
Bootstrap Resampling Results:
x-mean: 0.4394
y-mean: 0.5388
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -17.6601
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt08.google.en outperforms wmt08.bergamot.en.
==========================
x_name: wmt08.microsoft.en
y_name: wmt08.google.en
Bootstrap Resampling Results:
x-mean: 0.5451
y-mean: 0.5388
ties (%): 0.0833
x_wins (%): 0.8133
y_wins (%): 0.1033
Paired T-Test Results:
statistic: 1.7401
p_value: 0.0820
Null hypothesis can't be rejected.
Both systems have equal averages.
Summary
If system_x is better than system_y then:
Null hypothesis rejected according to t-test with p_value=0.05.
Scores differ significantly across samples.
system_x \ system_y wmt08.bergamot.en wmt08.microsoft.en wmt08.google.en
--------------------- ------------------- -------------------- -----------------
wmt08.bergamot.en False False
wmt08.microsoft.en True False
wmt08.google.en True False

@@ -0,0 +1 @@
22.6

@@ -0,0 +1 @@
0.5455

@@ -0,0 +1 @@
22.3

@@ -0,0 +1 @@
0.4696

@@ -0,0 +1 @@
25.4

@@ -0,0 +1 @@
0.5666

@@ -0,0 +1,60 @@
==========================
x_name: wmt09.bergamot.en
y_name: wmt09.microsoft.en
Bootstrap Resampling Results:
x-mean: 0.4693
y-mean: 0.5718
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -23.9367
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt09.microsoft.en outperforms wmt09.bergamot.en.
==========================
x_name: wmt09.bergamot.en
y_name: wmt09.google.en
Bootstrap Resampling Results:
x-mean: 0.4693
y-mean: 0.5661
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -22.2999
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt09.google.en outperforms wmt09.bergamot.en.
==========================
x_name: wmt09.microsoft.en
y_name: wmt09.google.en
Bootstrap Resampling Results:
x-mean: 0.5718
y-mean: 0.5661
ties (%): 0.0633
x_wins (%): 0.8333
y_wins (%): 0.1033
Paired T-Test Results:
statistic: 1.8459
p_value: 0.0650
Null hypothesis can't be rejected.
Both systems have equal averages.
Summary
If system_x is better than system_y then:
Null hypothesis rejected according to t-test with p_value=0.05.
Scores differ significantly across samples.
system_x \ system_y wmt09.bergamot.en wmt09.microsoft.en wmt09.google.en
--------------------- ------------------- -------------------- -----------------
wmt09.bergamot.en False False
wmt09.microsoft.en True False
wmt09.google.en True False

@@ -0,0 +1 @@
24.8

@@ -0,0 +1 @@
0.5720

Binary data: evaluation/dev/img/avg-bleu.png (binary file not shown; before: 22 KiB, after: 22 KiB)

Binary data: evaluation/dev/img/avg-comet.png (binary file not shown; before: 25 KiB, after: 25 KiB)

Binary data: evaluation/dev/img/hu-en-bleu.png, new file (binary file not shown; 22 KiB)

Binary data: evaluation/dev/img/hu-en-comet.png, new file (binary file not shown; 23 KiB)

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e062d80c132f8cee111b85adbef9841f502f73be09b1b6596f31385dfcd0add
size 2778202
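Each of these added files is a Git LFS pointer: the artifact itself is stored out of band and addressed by the sha256 `oid` and `size` fields. A minimal sketch of how a consumer could verify a downloaded object against such a pointer (the function name is an assumption, not part of this repo):

```python
import hashlib

def verify_lfs_object(data: bytes, pointer_text: str) -> bool:
    """Check downloaded bytes against the oid and size recorded
    in a Git LFS pointer file like the one above."""
    fields = dict(line.split(" ", 1)
                  for line in pointer_text.splitlines() if line)
    oid = fields["oid"].split(":", 1)[1]  # strip the "sha256:" prefix
    return (hashlib.sha256(data).hexdigest() == oid
            and len(data) == int(fields["size"]))
```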

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f66a371ab2ba3316b438ae6430a6616469c5ecea307024a07af1205dfd5d716d
size 13181613

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:97fca39ca82670f3dbb0d5e28a5476cd36a3556bbcced590e6e5de7687013dce
size 419908

@@ -602,6 +602,29 @@
"modelType": "dev"
}
},
"huen": {
"model": {
"name": "model.huen.intgemm.alphas.bin",
"size": 17140899,
"estimatedCompressedSize": 13181613,
"expectedSha256Hash": "518356dbb0c071739318601963a87580fb41732652f52bd3635246330c186d9e",
"modelType": "dev"
},
"lex": {
"name": "lex.50.50.huen.s2t.bin",
"size": 5162428,
"estimatedCompressedSize": 2778202,
"expectedSha256Hash": "fff56b2501258ec4c46a8fc715caee7aeb15d853f859cdfacd3ef9903ed2fff1",
"modelType": "dev"
},
"vocab": {
"name": "vocab.huen.spm",
"size": 820746,
"estimatedCompressedSize": 419908,
"expectedSha256Hash": "0db772702235b02d1f29abafb7a49ed77e54c60245b3a46e90716e74263aedd6",
"modelType": "dev"
}
},
"isen": {
"model": {
"name": "model.isen.intgemm.alphas.bin",