firefox-translations-models/evaluation/en-fi/wmt16.en-fi.cometcompare

==========================
x_name: wmt16.bergamot.fi
y_name: wmt16.microsoft.fi
Bootstrap Resampling Results:
x-mean: 0.8815
y-mean: 0.9169
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -28.5455
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.microsoft.fi outperforms wmt16.bergamot.fi.
==========================
x_name: wmt16.bergamot.fi
y_name: wmt16.google.fi
Bootstrap Resampling Results:
x-mean: 0.8815
y-mean: 0.9096
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -21.9931
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.google.fi outperforms wmt16.bergamot.fi.
==========================
x_name: wmt16.bergamot.fi
y_name: wmt16.argos.fi
Bootstrap Resampling Results:
x-mean: 0.8815
y-mean: 0.8470
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 20.8611
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.bergamot.fi outperforms wmt16.argos.fi.
==========================
x_name: wmt16.bergamot.fi
y_name: wmt16.nllb.fi
Bootstrap Resampling Results:
x-mean: 0.8815
y-mean: 0.8492
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 18.0039
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.bergamot.fi outperforms wmt16.nllb.fi.
==========================
x_name: wmt16.bergamot.fi
y_name: wmt16.opusmt.fi
Bootstrap Resampling Results:
x-mean: 0.8815
y-mean: 0.9086
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -21.5593
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.opusmt.fi outperforms wmt16.bergamot.fi.
==========================
x_name: wmt16.microsoft.fi
y_name: wmt16.google.fi
Bootstrap Resampling Results:
x-mean: 0.9169
y-mean: 0.9096
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 8.8037
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.microsoft.fi outperforms wmt16.google.fi.
==========================
x_name: wmt16.microsoft.fi
y_name: wmt16.argos.fi
Bootstrap Resampling Results:
x-mean: 0.9169
y-mean: 0.8470
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 41.5030
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.microsoft.fi outperforms wmt16.argos.fi.
==========================
x_name: wmt16.microsoft.fi
y_name: wmt16.nllb.fi
Bootstrap Resampling Results:
x-mean: 0.9169
y-mean: 0.8492
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 39.0803
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.microsoft.fi outperforms wmt16.nllb.fi.
==========================
x_name: wmt16.microsoft.fi
y_name: wmt16.opusmt.fi
Bootstrap Resampling Results:
x-mean: 0.9169
y-mean: 0.9086
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 8.6536
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.microsoft.fi outperforms wmt16.opusmt.fi.
==========================
x_name: wmt16.google.fi
y_name: wmt16.argos.fi
Bootstrap Resampling Results:
x-mean: 0.9096
y-mean: 0.8470
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 37.5897
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.google.fi outperforms wmt16.argos.fi.
==========================
x_name: wmt16.google.fi
y_name: wmt16.nllb.fi
Bootstrap Resampling Results:
x-mean: 0.9096
y-mean: 0.8492
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 35.4453
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.google.fi outperforms wmt16.nllb.fi.
==========================
x_name: wmt16.google.fi
y_name: wmt16.opusmt.fi
Bootstrap Resampling Results:
x-mean: 0.9096
y-mean: 0.9086
ties (%): 0.4200
x_wins (%): 0.4967
y_wins (%): 0.0833
Paired T-Test Results:
statistic: 0.8847
p_value: 0.3764
Null hypothesis can't be rejected.
Both systems have equal averages.
==========================
x_name: wmt16.argos.fi
y_name: wmt16.nllb.fi
Bootstrap Resampling Results:
x-mean: 0.8470
y-mean: 0.8492
ties (%): 0.2267
x_wins (%): 0.1133
y_wins (%): 0.6600
Paired T-Test Results:
statistic: -1.2776
p_value: 0.2015
Null hypothesis can't be rejected.
Both systems have equal averages.
==========================
x_name: wmt16.argos.fi
y_name: wmt16.opusmt.fi
Bootstrap Resampling Results:
x-mean: 0.8470
y-mean: 0.9086
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -37.5091
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.opusmt.fi outperforms wmt16.argos.fi.
==========================
x_name: wmt16.nllb.fi
y_name: wmt16.opusmt.fi
Bootstrap Resampling Results:
x-mean: 0.8492
y-mean: 0.9086
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -34.7892
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
wmt16.opusmt.fi outperforms wmt16.nllb.fi.
Summary
A cell is True if system_x (row) is better than system_y (column), that is:
the null hypothesis is rejected by the paired t-test at significance level 0.05,
scores differ significantly across samples, and system_x has the higher mean.
system_x \ system_y    wmt16.bergamot.fi    wmt16.microsoft.fi    wmt16.google.fi    wmt16.argos.fi    wmt16.nllb.fi    wmt16.opusmt.fi
---------------------  -------------------  --------------------  -----------------  ----------------  ---------------  -----------------
wmt16.bergamot.fi      -                    False                 False              True              True             False
wmt16.microsoft.fi     True                 -                     True               True              True             True
wmt16.google.fi        True                 False                 -                  True              True             False
wmt16.argos.fi         False                False                 False              -                 False            False
wmt16.nllb.fi          False                False                 False              False             -                False
wmt16.opusmt.fi        True                 False                 False              True              True             -
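
The two statistics reported for each pair above (a paired t-test statistic and bootstrap win/tie rates) can be illustrated with a minimal Python sketch. It is not the code that produced this file. The function names, the defaults num_splits=300 and sample_ratio=0.4, and the tie rule (a resample counts as a tie when |t| is below a normal-approximation critical value of 1.96) are all assumptions chosen for illustration.

```python
import math
import random
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Paired t statistic: mean(d) / (stdev(d) / sqrt(n)), with d = x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    m = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation; needs at least 2 pairs
    if sd == 0:
        # Degenerate case: every difference is identical.
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (sd / math.sqrt(len(diffs)))


def bootstrap_win_rates(x, y, num_splits=300, sample_ratio=0.4,
                        t_crit=1.96, seed=0):
    """Resample segment indices with replacement and count wins and ties.

    Assumed tie rule (hypothetical): a resample is a tie when the paired
    t statistic on that resample does not exceed t_crit in absolute value.
    """
    rng = random.Random(seed)
    n = len(x)
    k = max(2, int(n * sample_ratio))  # segments drawn per resample
    counts = {"x_wins": 0, "y_wins": 0, "ties": 0}
    for _ in range(num_splits):
        idx = [rng.randrange(n) for _ in range(k)]
        t = paired_t_statistic([x[i] for i in idx], [y[i] for i in idx])
        if t > t_crit:
            counts["x_wins"] += 1
        elif t < -t_crit:
            counts["y_wins"] += 1
        else:
            counts["ties"] += 1
    # Return fractions in [0, 1], matching the scale of the values above.
    return {name: c / num_splits for name, c in counts.items()}
```

Called with two equally long lists of segment-level scores, bootstrap_win_rates returns fractions that sum to 1, which is the scale used by the "ties", "x_wins" and "y_wins" lines in the blocks above.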