firefox-translations-models/evaluation/de-en/wmt15.de-en.cometcompare

==========================
x_name: wmt15.bergamot.en
y_name: wmt15.microsoft.en
Bootstrap Resampling Results:
x-mean: 0.5187
y-mean: 0.6463
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -21.9009
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt15.microsoft.en outperforms wmt15.bergamot.en.
==========================
x_name: wmt15.bergamot.en
y_name: wmt15.google.en
Bootstrap Resampling Results:
x-mean: 0.5187
y-mean: 0.6359
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -18.7876
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt15.google.en outperforms wmt15.bergamot.en.
==========================
x_name: wmt15.microsoft.en
y_name: wmt15.google.en
Bootstrap Resampling Results:
x-mean: 0.6463
y-mean: 0.6359
ties (%): 0.0067
x_wins (%): 0.9733
y_wins (%): 0.0200
Paired T-Test Results:
statistic: 2.9289
p_value: 0.0034
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt15.microsoft.en outperforms wmt15.google.en.
Summary
If system_x is better than system_y then:
Null hypothesis rejected according to t-test with p_value=0.05.
Scores differ significantly across samples.
system_x \ system_y    wmt15.bergamot.en    wmt15.microsoft.en    wmt15.google.en
---------------------  -------------------  --------------------  -----------------
wmt15.bergamot.en                           False                 False
wmt15.microsoft.en     True                                       True
wmt15.google.en        True                 False
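
For readers who want to see roughly what the two tests in each block compute, here is a minimal Python sketch. It is not the comet-compare implementation: the function names, the `num_splits=300` and `sample_ratio=0.4` defaults, and the fixed seed are assumptions chosen for illustration. The win fractions in the report are multiples of 1/300, which is at least consistent with 300 resamples. Inputs are assumed to be per-segment scores for two systems on the same test set, in the same order.

```python
import math
import random
import statistics


def paired_bootstrap(x_scores, y_scores, num_splits=300, sample_ratio=0.4, seed=0):
    """Return (x_wins, y_wins, ties) as fractions of the resamples.

    num_splits, sample_ratio and seed are illustrative assumptions.
    """
    if len(x_scores) != len(y_scores):
        raise ValueError("score lists must be paired (same length)")
    rng = random.Random(seed)
    n = len(x_scores)
    k = max(1, int(n * sample_ratio))  # segments drawn per resample
    x_wins = y_wins = ties = 0
    for _ in range(num_splits):
        # Draw the same segment indices for both systems, with replacement.
        idx = [rng.randrange(n) for _ in range(k)]
        x_mean = sum(x_scores[i] for i in idx) / k
        y_mean = sum(y_scores[i] for i in idx) / k
        if x_mean > y_mean:
            x_wins += 1
        elif y_mean > x_mean:
            y_wins += 1
        else:
            ties += 1
    return x_wins / num_splits, y_wins / num_splits, ties / num_splits


def paired_t_statistic(x_scores, y_scores):
    """Paired t statistic: mean of the differences over its standard error."""
    diffs = [x - y for x, y in zip(x_scores, y_scores)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err
```

A negative statistic means system_x scores lower than system_y on average, which matches the sign convention in the blocks above (for example, bergamot as x against microsoft as y). Converting the statistic to a p-value needs the Student t distribution with n - 1 degrees of freedom, which this sketch leaves out.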