The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation

Cited by: 2916
Authors
Chicco, Davide [1, 2]
Jurman, Giuseppe [3 ]
Affiliations
[1] Krembil Res Inst, Toronto, ON, Canada
[2] Peter Munk Cardiac Ctr, Toronto, ON, Canada
[3] Fdn Bruno Kessler, Trento, Italy
Keywords
Matthews correlation coefficient; Binary classification; F-1 score; Confusion matrices; Machine learning; Biostatistics; Accuracy; Dataset imbalance; Genomics; STATISTICAL COMPARISONS; CLASS IMBALANCE; ROC CURVE; PERFORMANCE; AREA; CLASSIFIERS; ALGORITHMS; PRECISION; AGREEMENT; RECALL
DOI
10.1186/s12864-019-6413-7
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, accordingly to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a unified elective chosen measure yet. Accuracy and F-1 score computed on confusion matrices have been (and still are) among the most popular adopted metrics in binary classification tasks. However, these statistical measures can dangerously show overoptimistic inflated results, especially on imbalanced datasets.
Results: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all of the four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset.
Conclusions: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F-1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F-1 score in evaluating binary classification tasks by all scientific communities.
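The contrast described in the abstract can be illustrated with a minimal Python sketch that computes accuracy, F-1 score, and MCC from the four confusion matrix counts using their standard definitions. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the article; returning 0.0 when a denominator is zero is a simplifying assumption of this sketch, not a convention stated in this record.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, F-1 score, MCC) for a binary confusion matrix.

    Hypothetical helper for illustration only. Zero denominators are
    mapped to 0.0, which is an assumption made for this sketch.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total

    # F-1 = 2*TP / (2*TP + FP + FN); it ignores true negatives entirely.
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN));
    # all four confusion matrix categories contribute.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    return accuracy, f1, mcc


# Illustrative imbalanced dataset (made-up numbers): 91 positives and
# 9 negatives. The classifier finds almost every positive but only 1 of
# the 9 negatives.
accuracy, f1, mcc = confusion_metrics(tp=90, fn=1, tn=1, fp=8)
print(f"accuracy={accuracy:.3f}  F1={f1:.3f}  MCC={mcc:.3f}")
```

With these counts, accuracy and F-1 score are both above 0.9, while MCC stays near 0.2 because eight of the nine negative samples are misclassified.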
Pages: 13