firefox-translations-models/evaluation/en-fi/flores-dev.en-fi.cometcompare

==========================
x_name: flores-dev.bergamot.fi
y_name: flores-dev.microsoft.fi

Bootstrap Resampling Results:
x-mean:	0.8850
y-mean:	0.9260
ties (%):	0.0000
x_wins (%):	0.0000
y_wins (%):	1.0000

Paired T-Test Results:
statistic:	-18.8550
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.fi outperforms flores-dev.bergamot.fi.
==========================
x_name: flores-dev.bergamot.fi
y_name: flores-dev.google.fi

Bootstrap Resampling Results:
x-mean:	0.8850
y-mean:	0.9203
ties (%):	0.0000
x_wins (%):	0.0000
y_wins (%):	1.0000

Paired T-Test Results:
statistic:	-16.0255
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.fi outperforms flores-dev.bergamot.fi.
==========================
x_name: flores-dev.bergamot.fi
y_name: flores-dev.argos.fi

Bootstrap Resampling Results:
x-mean:	0.8850
y-mean:	0.8629
ties (%):	0.0000
x_wins (%):	1.0000
y_wins (%):	0.0000

Paired T-Test Results:
statistic:	8.1309
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.bergamot.fi outperforms flores-dev.argos.fi.
==========================
x_name: flores-dev.bergamot.fi
y_name: flores-dev.nllb.fi

Bootstrap Resampling Results:
x-mean:	0.8850
y-mean:	0.8507
ties (%):	0.0000
x_wins (%):	1.0000
y_wins (%):	0.0000

Paired T-Test Results:
statistic:	10.7830
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.bergamot.fi outperforms flores-dev.nllb.fi.
==========================
x_name: flores-dev.bergamot.fi
y_name: flores-dev.opusmt.fi

Bootstrap Resampling Results:
x-mean:	0.8850
y-mean:	0.9091
ties (%):	0.0000
x_wins (%):	0.0000
y_wins (%):	1.0000

Paired T-Test Results:
statistic:	-10.7563
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.opusmt.fi outperforms flores-dev.bergamot.fi.
==========================
x_name: flores-dev.microsoft.fi
y_name: flores-dev.google.fi

Bootstrap Resampling Results:
x-mean:	0.9260
y-mean:	0.9203
ties (%):	0.0133
x_wins (%):	0.9867
y_wins (%):	0.0000

Paired T-Test Results:
statistic:	4.5728
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.fi outperforms flores-dev.google.fi.
==========================
x_name: flores-dev.microsoft.fi
y_name: flores-dev.argos.fi

Bootstrap Resampling Results:
x-mean:	0.9260
y-mean:	0.8629
ties (%):	0.0000
x_wins (%):	1.0000
y_wins (%):	0.0000

Paired T-Test Results:
statistic:	24.6597
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.fi outperforms flores-dev.argos.fi.
==========================
x_name: flores-dev.microsoft.fi
y_name: flores-dev.nllb.fi

Bootstrap Resampling Results:
x-mean:	0.9260
y-mean:	0.8507
ties (%):	0.0000
x_wins (%):	1.0000
y_wins (%):	0.0000

Paired T-Test Results:
statistic:	26.3181
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.fi outperforms flores-dev.nllb.fi.
==========================
x_name: flores-dev.microsoft.fi
y_name: flores-dev.opusmt.fi

Bootstrap Resampling Results:
x-mean:	0.9260
y-mean:	0.9091
ties (%):	0.0000
x_wins (%):	1.0000
y_wins (%):	0.0000

Paired T-Test Results:
statistic:	9.6429
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.fi outperforms flores-dev.opusmt.fi.
==========================
x_name: flores-dev.google.fi
y_name: flores-dev.argos.fi

Bootstrap Resampling Results:
x-mean:	0.9203
y-mean:	0.8629
ties (%):	0.0000
x_wins (%):	1.0000
y_wins (%):	0.0000

Paired T-Test Results:
statistic:	23.0643
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.fi outperforms flores-dev.argos.fi.
==========================
x_name: flores-dev.google.fi
y_name: flores-dev.nllb.fi

Bootstrap Resampling Results:
x-mean:	0.9203
y-mean:	0.8507
ties (%):	0.0000
x_wins (%):	1.0000
y_wins (%):	0.0000

Paired T-Test Results:
statistic:	24.3388
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.fi outperforms flores-dev.nllb.fi.
==========================
x_name: flores-dev.google.fi
y_name: flores-dev.opusmt.fi

Bootstrap Resampling Results:
x-mean:	0.9203
y-mean:	0.9091
ties (%):	0.0000
x_wins (%):	1.0000
y_wins (%):	0.0000

Paired T-Test Results:
statistic:	6.6007
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.fi outperforms flores-dev.opusmt.fi.
==========================
x_name: flores-dev.argos.fi
y_name: flores-dev.nllb.fi

Bootstrap Resampling Results:
x-mean:	0.8629
y-mean:	0.8507
ties (%):	0.0067
x_wins (%):	0.9933
y_wins (%):	0.0000

Paired T-Test Results:
statistic:	3.6932
p_value:	0.0002
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.argos.fi outperforms flores-dev.nllb.fi.
==========================
x_name: flores-dev.argos.fi
y_name: flores-dev.opusmt.fi

Bootstrap Resampling Results:
x-mean:	0.8629
y-mean:	0.9091
ties (%):	0.0000
x_wins (%):	0.0000
y_wins (%):	1.0000

Paired T-Test Results:
statistic:	-18.0615
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.opusmt.fi outperforms flores-dev.argos.fi.
==========================
x_name: flores-dev.nllb.fi
y_name: flores-dev.opusmt.fi

Bootstrap Resampling Results:
x-mean:	0.8507
y-mean:	0.9091
ties (%):	0.0000
x_wins (%):	0.0000
y_wins (%):	1.0000

Paired T-Test Results:
statistic:	-21.0367
p_value:	0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.opusmt.fi outperforms flores-dev.nllb.fi.
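
The two statistics reported in each block above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code that produced this report: the function names, the `tie_tolerance` parameter and the tie rule are hypothetical, and the defaults `num_splits=300` and `sample_ratio=0.4` are assumptions (300 is at least consistent with the ties values 0.0133 and 0.0067, which are 4/300 and 2/300). The sketch returns fractions in the range 0 to 1, which is also how the values under the "(%)" labels above read.

```python
import math
import random
import statistics


def paired_t_statistic(x_scores, y_scores):
    """Paired t statistic: mean of per-segment differences over its standard error.

    Positive when x scores higher on average, negative when y does,
    matching the sign convention visible in the report (x minus y).
    """
    diffs = [x - y for x, y in zip(x_scores, y_scores)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err


def paired_bootstrap(x_scores, y_scores, num_splits=300, sample_ratio=0.4,
                     tie_tolerance=0.0, seed=0):
    """Resample segment indices with replacement and count which system wins.

    The same indices are used for both systems in each split, so the
    comparison stays paired. tie_tolerance is a hypothetical knob: a split
    counts as a tie when the two resampled means differ by no more than it.
    """
    rng = random.Random(seed)
    n = len(x_scores)
    sample_size = max(1, int(n * sample_ratio))
    x_wins = y_wins = ties = 0
    for _ in range(num_splits):
        idx = [rng.randrange(n) for _ in range(sample_size)]
        x_mean = statistics.fmean(x_scores[i] for i in idx)
        y_mean = statistics.fmean(y_scores[i] for i in idx)
        if abs(x_mean - y_mean) <= tie_tolerance:
            ties += 1
        elif x_mean > y_mean:
            x_wins += 1
        else:
            y_wins += 1
    return {
        "ties": ties / num_splits,
        "x_wins": x_wins / num_splits,
        "y_wins": y_wins / num_splits,
    }
```

Called with two equal-length lists of segment-level scores, `paired_bootstrap` returns the three fractions and `paired_t_statistic` returns the test statistic. Turning the statistic into a p_value needs the Student t distribution with n - 1 degrees of freedom, which is left out here to keep the sketch dependency-free.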

Summary
Each cell is True if system_x (row) is better than system_y (column), meaning:
the paired t-test rejects the null hypothesis at the 0.05 level (p_value < 0.05),
and system_x has the higher mean score.
system_x \ system_y      flores-dev.bergamot.fi    flores-dev.microsoft.fi    flores-dev.google.fi    flores-dev.argos.fi    flores-dev.nllb.fi    flores-dev.opusmt.fi
-----------------------  ------------------------  -------------------------  ----------------------  ---------------------  --------------------  ----------------------
flores-dev.bergamot.fi                             False                      False                   True                   True                  False
flores-dev.microsoft.fi  True                                                 True                    True                   True                  True
flores-dev.google.fi     True                      False                                              True                   True                  True
flores-dev.argos.fi      False                     False                      False                                          True                  False
flores-dev.nllb.fi       False                     False                      False                   False                                        False
flores-dev.opusmt.fi     True                      False                      False                   True                   True
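
The summary matrix can be reduced to a single ordering by counting, for each system, how many others it significantly outperforms. The sketch below copies the True cells from the table by hand, with the `flores-dev.` prefix and `.fi` suffix dropped for brevity; the `wins` dictionary and the `rank_by_wins` helper are hypothetical names introduced only for illustration.

```python
# wins[x] lists the systems that x significantly outperforms,
# i.e. the columns marked True in row x of the Summary table.
wins = {
    "bergamot": ["argos", "nllb"],
    "microsoft": ["bergamot", "google", "argos", "nllb", "opusmt"],
    "google": ["bergamot", "argos", "nllb", "opusmt"],
    "argos": ["nllb"],
    "nllb": [],
    "opusmt": ["bergamot", "argos", "nllb"],
}


def rank_by_wins(wins):
    """Order systems by the number of significant pairwise wins, best first."""
    return sorted(wins, key=lambda name: len(wins[name]), reverse=True)


ranking = rank_by_wins(wins)
print(ranking)
# ['microsoft', 'google', 'opusmt', 'bergamot', 'argos', 'nllb']
```

Every pair in this report has a significant winner, so the win counts (5, 4, 3, 2, 1, 0) are all distinct and the ordering agrees with the mean scores listed in the pairwise blocks.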