firefox-translations-models/evaluation/en-fi/flores-dev.en-fi.cometcompare

==========================
x_name: flores-dev.bergamot.fi
y_name: flores-dev.microsoft.fi
Bootstrap Resampling Results:
x-mean: 0.8850
y-mean: 0.9260
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -18.8550
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.fi outperforms flores-dev.bergamot.fi.
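Each block first reports paired bootstrap resampling. Both systems are scored on the same randomly drawn subsets of segments, and the block records how often each system had the higher mean score. Despite the "(%)" in the labels, ties, x_wins and y_wins are printed as fractions of the resamples: in every block they sum to 1, so 1.0000 means 100%. The following Python sketch illustrates the technique on per-segment score lists. The resample count (`num_splits=300`), the subset size (`sample_ratio=0.4`), the fixed seed, and the exact-equality tie rule are assumptions chosen for illustration. They are not settings read from this file or confirmed for the tool that produced it.

```python
import random
from statistics import fmean


def paired_bootstrap(x_scores, y_scores, num_splits=300, sample_ratio=0.4, seed=0):
    """Sketch of paired bootstrap resampling over per-segment scores.

    Returns (ties, x_wins, y_wins) as fractions of num_splits. The default
    parameters and the exact-equality tie rule are illustrative assumptions.
    """
    if len(x_scores) != len(y_scores) or not x_scores:
        raise ValueError("need the same, non-zero number of scores for both systems")
    rng = random.Random(seed)  # fixed seed so the resamples are reproducible
    n = len(x_scores)
    k = max(1, int(n * sample_ratio))  # segments drawn per resample
    ties = x_wins = y_wins = 0
    for _ in range(num_splits):
        # Draw indices with replacement and use the SAME indices for both
        # systems; this pairing is what makes the comparison "paired".
        idx = [rng.randrange(n) for _ in range(k)]
        x_mean = fmean(x_scores[i] for i in idx)
        y_mean = fmean(y_scores[i] for i in idx)
        if x_mean > y_mean:
            x_wins += 1
        elif y_mean > x_mean:
            y_wins += 1
        else:
            ties += 1
    return ties / num_splits, x_wins / num_splits, y_wins / num_splits
```

Drawing the same indices for both systems is the key design choice: each resample compares the systems on identical source segments, so differences in segment difficulty do not masquerade as differences between systems.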
==========================
x_name: flores-dev.bergamot.fi
y_name: flores-dev.google.fi
Bootstrap Resampling Results:
x-mean: 0.8850
y-mean: 0.9203
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -16.0255
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.fi outperforms flores-dev.bergamot.fi.
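The second half of each block is a paired t-test on the per-segment differences between system x and system y. A negative statistic therefore means y scored higher, as in the two bergamot blocks above. Below is a minimal standard-library sketch of that test. To avoid a SciPy dependency, it approximates the two-sided p-value with a standard normal distribution, which is close to the exact t distribution only for large samples. `scipy.stats.ttest_rel` computes the exact version. The function names `paired_t_test` and `better_system` are hypothetical; the 0.05 threshold matches the one stated in the Summary.

```python
import math
from statistics import NormalDist, fmean, stdev


def paired_t_test(x_scores, y_scores):
    """Return (t, p) for a paired t-test on the differences x_i - y_i.

    The two-sided p-value uses a normal approximation to the t distribution,
    an assumption that is only accurate for large samples.
    """
    if len(x_scores) != len(y_scores) or len(x_scores) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [x - y for x, y in zip(x_scores, y_scores)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are equal; t is undefined")
    t = fmean(diffs) / (sd / math.sqrt(len(diffs)))
    p = 2 * NormalDist().cdf(-abs(t))  # two-sided tail probability
    return t, p


def better_system(x_name, y_name, t, p, alpha=0.05):
    """Name of the significantly better system, or None if p >= alpha."""
    if p >= alpha:
        return None
    return x_name if t > 0 else y_name
```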
==========================
x_name: flores-dev.bergamot.fi
y_name: flores-dev.argos.fi
Bootstrap Resampling Results:
x-mean: 0.8850
y-mean: 0.8629
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 8.1309
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.bergamot.fi outperforms flores-dev.argos.fi.
==========================
x_name: flores-dev.bergamot.fi
y_name: flores-dev.nllb.fi
Bootstrap Resampling Results:
x-mean: 0.8850
y-mean: 0.8507
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 10.7830
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.bergamot.fi outperforms flores-dev.nllb.fi.
==========================
x_name: flores-dev.bergamot.fi
y_name: flores-dev.opusmt.fi
Bootstrap Resampling Results:
x-mean: 0.8850
y-mean: 0.9091
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -10.7563
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.opusmt.fi outperforms flores-dev.bergamot.fi.
==========================
x_name: flores-dev.microsoft.fi
y_name: flores-dev.google.fi
Bootstrap Resampling Results:
x-mean: 0.9260
y-mean: 0.9203
ties (%): 0.0133
x_wins (%): 0.9867
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 4.5728
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.fi outperforms flores-dev.google.fi.
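This is one of only two comparisons with non-zero ties; the other is argos vs nllb. The printed fractions are consistent with a whole number of bootstrap resamples. Assuming 300 resamples, they convert back to integer counts as follows. The count of 300 is an assumption, since this file does not print it.

```python
# Hypothetical resample count; the file does not state it.
NUM_SPLITS = 300

# Fractions copied from the microsoft vs google block above.
fractions = {"ties": 0.0133, "x_wins": 0.9867, "y_wins": 0.0000}

# Convert each 4-decimal fraction back to a whole number of resamples.
counts = {name: round(frac * NUM_SPLITS) for name, frac in fractions.items()}
print(counts)  # {'ties': 4, 'x_wins': 296, 'y_wins': 0}

# The three counts should account for every resample.
assert sum(counts.values()) == NUM_SPLITS
```

The same conversion turns the argos vs nllb values 0.0067 and 0.9933 into 2 ties and 298 wins.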
==========================
x_name: flores-dev.microsoft.fi
y_name: flores-dev.argos.fi
Bootstrap Resampling Results:
x-mean: 0.9260
y-mean: 0.8629
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 24.6597
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.fi outperforms flores-dev.argos.fi.
==========================
x_name: flores-dev.microsoft.fi
y_name: flores-dev.nllb.fi
Bootstrap Resampling Results:
x-mean: 0.9260
y-mean: 0.8507
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 26.3181
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.fi outperforms flores-dev.nllb.fi.
==========================
x_name: flores-dev.microsoft.fi
y_name: flores-dev.opusmt.fi
Bootstrap Resampling Results:
x-mean: 0.9260
y-mean: 0.9091
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 9.6429
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.fi outperforms flores-dev.opusmt.fi.
==========================
x_name: flores-dev.google.fi
y_name: flores-dev.argos.fi
Bootstrap Resampling Results:
x-mean: 0.9203
y-mean: 0.8629
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 23.0643
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.fi outperforms flores-dev.argos.fi.
==========================
x_name: flores-dev.google.fi
y_name: flores-dev.nllb.fi
Bootstrap Resampling Results:
x-mean: 0.9203
y-mean: 0.8507
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 24.3388
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.fi outperforms flores-dev.nllb.fi.
==========================
x_name: flores-dev.google.fi
y_name: flores-dev.opusmt.fi
Bootstrap Resampling Results:
x-mean: 0.9203
y-mean: 0.9091
ties (%): 0.0000
x_wins (%): 1.0000
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 6.6007
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.fi outperforms flores-dev.opusmt.fi.
==========================
x_name: flores-dev.argos.fi
y_name: flores-dev.nllb.fi
Bootstrap Resampling Results:
x-mean: 0.8629
y-mean: 0.8507
ties (%): 0.0067
x_wins (%): 0.9933
y_wins (%): 0.0000
Paired T-Test Results:
statistic: 3.6932
p_value: 0.0002
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.argos.fi outperforms flores-dev.nllb.fi.
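This is the closest call in the file (statistic 3.6932, p_value 0.0002). As a rough cross-check, the two-sided tail probability for a statistic can be computed with a standard normal approximation to the t distribution. This sketch assumes roughly a thousand segments per system, which is an assumption about the FLORES dev set size; at that size the two distributions are very close. It shows that the approximation rounds to the printed p_value for this block. It also rounds to 0.0000 for the next-smallest statistic, microsoft vs google (4.5728).

```python
from statistics import NormalDist


def two_sided_p_normal(t):
    """Two-sided p-value for statistic t under a standard normal approximation."""
    return 2 * NormalDist().cdf(-abs(t))


# Statistics copied from the argos vs nllb and microsoft vs google blocks.
for label, t in [("argos vs nllb", 3.6932), ("microsoft vs google", 4.5728)]:
    print(f"{label}: p_value ~ {two_sided_p_normal(t):.4f}")
# argos vs nllb: p_value ~ 0.0002
# microsoft vs google: p_value ~ 0.0000
```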
==========================
x_name: flores-dev.argos.fi
y_name: flores-dev.opusmt.fi
Bootstrap Resampling Results:
x-mean: 0.8629
y-mean: 0.9091
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -18.0615
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.opusmt.fi outperforms flores-dev.argos.fi.
==========================
x_name: flores-dev.nllb.fi
y_name: flores-dev.opusmt.fi
Bootstrap Resampling Results:
x-mean: 0.8507
y-mean: 0.9091
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000
Paired T-Test Results:
statistic: -21.0367
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.opusmt.fi outperforms flores-dev.nllb.fi.
Summary
A cell is True when system_x (the row) is better than system_y (the column), that is,
when the paired t-test rejects the null hypothesis at the 0.05 significance level
(p_value < 0.05) and the scores differ significantly across samples.
system_x \ system_y      flores-dev.bergamot.fi  flores-dev.microsoft.fi  flores-dev.google.fi  flores-dev.argos.fi  flores-dev.nllb.fi  flores-dev.opusmt.fi
-----------------------  ----------------------  -----------------------  --------------------  -------------------  ------------------  --------------------
flores-dev.bergamot.fi   -                       False                    False                 True                 True                False
flores-dev.microsoft.fi  True                    -                        True                  True                 True                True
flores-dev.google.fi     True                    False                    -                     True                 True                True
flores-dev.argos.fi      False                   False                    False                 -                    True                False
flores-dev.nllb.fi       False                   False                    False                 False                -                   False
flores-dev.opusmt.fi     True                    False                    False                 True                 True                -
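The summary matrix can be reproduced from the fifteen pairwise verdicts above. The blocks enumerate every unordered pair of the six systems exactly once, in the same order as Python's `itertools.combinations`. A cell is True when the row system was reported as outperforming the column system. The sketch transcribes those verdicts by hand, rebuilds the matrix, and ranks systems by their number of significant wins. For brevity it uses short system names, which are an editorial shorthand for the flores-dev.*.fi file names.

```python
from itertools import combinations

# Short names for the flores-dev.*.fi systems, in the order used above.
SYSTEMS = ["bergamot", "microsoft", "google", "argos", "nllb", "opusmt"]

# (winner, loser) pairs transcribed from the 15 pairwise blocks.
VERDICTS = [
    ("microsoft", "bergamot"), ("google", "bergamot"), ("bergamot", "argos"),
    ("bergamot", "nllb"), ("opusmt", "bergamot"), ("microsoft", "google"),
    ("microsoft", "argos"), ("microsoft", "nllb"), ("microsoft", "opusmt"),
    ("google", "argos"), ("google", "nllb"), ("google", "opusmt"),
    ("argos", "nllb"), ("opusmt", "argos"), ("opusmt", "nllb"),
]


def win_matrix(systems, verdicts):
    """matrix[x][y] is True when x significantly outperformed y."""
    beats = set(verdicts)
    # Every unordered pair must have exactly one recorded winner.
    for a, b in combinations(systems, 2):
        if ((a, b) in beats) == ((b, a) in beats):
            raise ValueError(f"need exactly one verdict for {a} vs {b}")
    return {x: {y: (x, y) in beats for y in systems if y != x} for x in systems}


def rank_by_wins(matrix):
    """Systems sorted by how many other systems they significantly beat."""
    return sorted(matrix, key=lambda s: sum(matrix[s].values()), reverse=True)


matrix = win_matrix(SYSTEMS, VERDICTS)
print(rank_by_wins(matrix))
# ['microsoft', 'google', 'opusmt', 'bergamot', 'argos', 'nllb']
```

The win-count ranking matches the ordering of the bootstrap means (0.9260, 0.9203, 0.9091, 0.8850, 0.8629, 0.8507). Because every pair is decided, the six systems form a strict order on this test set.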