Table 4 Recap of the results of the six use cases

From: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation

 

| Use case | Pos | Neg | TP | FN | TN | FP | Accuracy [0, 1] | F1 score [0, 1] | MCC [−1, +1] | Figure | Informative response |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Use case A1: Positively imbalanced dataset | 91 | 9 | 90 | 1 | 0 | 9 | 0.90 | 0.95 | **−0.03** | Figure 2 | MCC |
| Use case A2: Positively imbalanced dataset | 75 | 25 | 5 | 70 | 19 | 6 | **0.24** | **0.12** | **−0.24** | Suppl. Additional file 1 | Accuracy, F1 score, MCC |
| Use case B1: Balanced dataset | 50 | 50 | 47 | 3 | 5 | 45 | **0.52** | 0.66 | **+0.07** | Suppl. Additional file 2 | Accuracy, MCC |
| Use case B2: Balanced dataset | 50 | 50 | 10 | 40 | 46 | 4 | **0.56** | **0.31** | **+0.17** | Suppl. Additional file 3 | Accuracy, F1 score, MCC |
| Use case C1: Negatively imbalanced dataset | 10 | 90 | 9 | 1 | 1 | 89 | **0.10** | **0.17** | **−0.19** | Suppl. Additional file 4 | Accuracy, F1 score, MCC |
| Use case C2: Negatively imbalanced dataset | 11 | 89 | 2 | 9 | 88 | 1 | 0.90 | **0.29** | **+0.31** | Suppl. Additional file 5 | F1 score, MCC |

For use case A1, MCC is the only statistical rate that truthfully informs the readership about the poor performance of the classifier. For use case B1, MCC and accuracy report the classifier's poor performance in predicting negative data instances, while for use cases A2, B2, and C1 all three rates (accuracy, F1 score, and MCC) convey this information. For use case C2, MCC and F1 score recognize the algorithm's weak performance in predicting one of the two original dataset classes. Pos: number of positives. Neg: number of negatives. TP: true positives. FN: false negatives. TN: true negatives. FP: false positives. Informative response: the confusion matrix rates able to reflect the poor performance of the classifier in the prediction task. The informative response of each use case is highlighted in bold.
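All three rates follow directly from the four confusion matrix counts. As a quick sanity check of the values above, here is a minimal Python sketch (not part of the original article; the function names are our own) that recomputes accuracy, F1 score, and MCC for use case A1:

```python
from math import sqrt

def accuracy(tp, fn, tn, fp):
    # Fraction of all instances classified correctly.
    return (tp + tn) / (tp + fn + tn + fp)

def f1_score(tp, fn, tn, fp):
    # Harmonic mean of precision and recall; ignores true negatives.
    return 2 * tp / (2 * tp + fp + fn)

def mcc(tp, fn, tn, fp):
    # Matthews correlation coefficient; taken as 0 when a marginal sum is 0.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Use case A1 (positively imbalanced dataset): TP=90, FN=1, TN=0, FP=9
tp, fn, tn, fp = 90, 1, 0, 9
print(f"accuracy = {accuracy(tp, fn, tn, fp):.2f}")  # accuracy = 0.90
print(f"F1 score = {f1_score(tp, fn, tn, fp):.2f}")  # F1 score = 0.95
print(f"MCC      = {mcc(tp, fn, tn, fp):.2f}")       # MCC      = -0.03
```

The high accuracy and F1 score in this case are driven entirely by the dominant positive class, while MCC, which also accounts for the zero true negatives, falls near zero: exactly the discrepancy the table highlights.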