A Strategy on Selecting Performance Metrics for Classifier Evaluation

Cited by: 98

Authors:

Liu, Yangguang [1]

Zhou, Yangming [1]

Wen, Shiting [1]

Tang, Chaogang [2]

Affiliations:

[1] Zhejiang Univ, Ningbo Inst Technol, Ningbo, Zhejiang, Peoples R China

[2] China Univ Min & Technol, Xuzhou, Peoples R China

Source:

INTERNATIONAL JOURNAL OF MOBILE COMPUTING AND MULTIMEDIA COMMUNICATIONS | 2014 / Vol. 6 / No. 04

Keywords:

Classifiers; Classifiers' Performances; Correlation; Machine Learning Community; Performance Metrics;

DOI:

10.4018/IJMCMC.2014100102

Chinese Library Classification number:

TN [Electronic technology, communication technology];

Discipline classification code:

0809;

Abstract:

The evaluation of classifiers' performance plays a critical role in the construction and selection of classification models. Although many performance metrics have been proposed in the machine learning community, no general guidelines are available to practitioners regarding which metric to select for evaluating a classifier's performance. In this paper, the authors attempt to provide practitioners with a strategy for selecting performance metrics for classifier evaluation. First, the authors investigate seven widely used performance metrics: classification accuracy, F-measure, kappa statistic, root mean square error, mean absolute error, the area under the receiver operating characteristic (ROC) curve, and the area under the precision-recall curve. Second, the authors use Pearson linear correlation and Spearman rank correlation to analyze the potential relationships among these seven metrics. Experimental results show that these commonly used metrics can be divided into three groups: metrics within a given group are highly correlated with one another but less correlated with metrics from other groups.
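The correlation analysis described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the helper names (`pearson`, `ranks`, `spearman`) are hypothetical, and the scores are made-up numbers for five imaginary classifiers, chosen only to show how Pearson linear correlation and Spearman rank correlation would be computed between pairs of metrics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative (made-up) scores: one value per hypothetical classifier.
scores = {
    "accuracy":  [0.81, 0.85, 0.78, 0.90, 0.88],
    "f_measure": [0.79, 0.84, 0.75, 0.89, 0.86],
    "rmse":      [0.40, 0.35, 0.44, 0.29, 0.33],  # error metric: lower is better
}

for a, b in [("accuracy", "f_measure"), ("accuracy", "rmse")]:
    print(a, b,
          "pearson=%.3f" % pearson(scores[a], scores[b]),
          "spearman=%.3f" % spearman(scores[a], scores[b]))
```

In this toy data the error metric moves in the opposite direction to accuracy, so its correlation with accuracy is negative; a grouping analysis of the kind the abstract describes would presumably look at the strength of the correlation rather than its sign, though that is an assumption about the method and not something the abstract states.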

Pages: 20-35

Number of pages: 16
