The use of the area under the ROC curve in the evaluation of machine learning algorithms

Cited by: 4889
Authors
Bradley, AP [1]
Affiliation
[1] UNIV QUEENSLAND,DEPT ELECT & COMP ENGN,COOPERAT RES CTR SENSOR SIGNAL & INFORMAT PROC,ST LUCIA,QLD 4072,AUSTRALIA
Keywords
the ROC curve; the area under the ROC curve (AUC); accuracy measures; cross-validation; Wilcoxon statistic; standard error
DOI
10.1016/S0031-3203(96)00142-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine learning algorithms (C4.5, Multiscale Classifier, Perceptron, Multi-layer Perceptron, k-Nearest Neighbours, and a Quadratic Discriminant Function) on six "real world" medical diagnostics data sets. We compare AUC with the more conventional overall accuracy and find that AUC exhibits a number of desirable properties: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreased as both AUC and the number of test samples increased; independence of the decision threshold; and invariance to a priori class probabilities. The paper concludes with the recommendation that AUC be used in preference to overall accuracy for "single number" evaluation of machine learning algorithms. © 1997 Pattern Recognition Society.
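As an illustration of the quantities named in the abstract and keywords, the following minimal Python sketch estimates AUC through the Wilcoxon (Mann-Whitney) statistic, that is, the fraction of positive/negative pairs that a classifier's scores rank correctly. The standard error uses the Hanley and McNeil (1982) approximation, which is an assumption made here for illustration; the abstract does not state which estimator the paper uses. The function names and the example scores are hypothetical.

```python
import math


def auc_wilcoxon(pos_scores, neg_scores):
    """Estimate AUC as the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_standard_error(auc, n_pos, n_neg):
    """Approximate standard error of an AUC estimate.

    Assumption: Hanley and McNeil (1982) approximation, chosen for this
    sketch and not confirmed by the abstract."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    return math.sqrt(max(variance, 0.0))


# Hypothetical classifier scores for diseased (positive) and healthy (negative) cases.
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.5, 0.3, 0.2]

auc = auc_wilcoxon(pos, neg)  # 14 of 16 pairs ranked correctly -> 0.875
se = auc_standard_error(auc, len(pos), len(neg))
print(f"AUC = {auc:.3f}, SE = {se:.3f}")
```

No decision threshold appears anywhere in the computation, and replicating the negative scores (which changes the class proportions) leaves the estimate unchanged. Under the assumed approximation, the standard error shrinks as either the AUC or the number of test samples grows, which matches the behaviour the abstract describes.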
Pages: 1145-1159
Number of pages: 15