ROC curves and nonrandom data

Cited by: 23
Authors
Cook, Jonathan Aaron [1]
Affiliation
[1] Public Company Accounting Oversight Board, 1666 K St NW, Washington, DC, USA
Keywords
ROC curves; Classifier evaluation; Sample-selection bias; PREDICT CLASSIFICATION PERFORMANCE; SAMPLE SELECTION; MODELS;
DOI
10.1016/j.patrec.2016.11.015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper shows that when a classifier is evaluated with nonrandom test data, ROC curves differ from the ROC curves that would be obtained with a random sample. To address this bias, this paper introduces a procedure for plotting ROC curves that are inferred from nonrandom test data. I provide simulations to illustrate the procedure as well as the magnitude of bias that is found in empirical ROC curves constructed with nonrandom test data. The paper also includes a demonstration of the procedure on (non-simulated) data used to model wine preferences in the wine industry. (C) 2016 Elsevier B.V. All rights reserved.
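The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure and does not reproduce its simulations; it only compares the area under the empirical ROC curve on a random test sample with the area on a nonrandom one. The data-generating process, the selection rule (keep only observations with a positive score) and all names are assumptions made for illustration.

```python
import numpy as np

def auc(scores, labels):
    """Area under the empirical ROC curve via the Mann-Whitney rank statistic.

    Assumes continuous scores, so ties are not given special treatment.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = float(labels.sum())
    n_neg = float(len(labels)) - n_pos
    # Sum of positive ranks minus its minimum possible value counts the
    # (positive, negative) pairs in which the positive has the higher score.
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical population: the classifier score is informative but noisy.
score = rng.normal(size=n)
label = (score + rng.normal(size=n)) > 0

# Random test sample: every observation is equally likely to be kept.
random_idx = rng.random(n) < 0.2

# Nonrandom test sample (assumed rule): outcomes are only observed for
# cases the classifier scored above zero, as when rejected cases are
# never followed up.
selected_idx = score > 0.0

auc_random = auc(score[random_idx], label[random_idx])
auc_selected = auc(score[selected_idx], label[selected_idx])

print(f"AUC, random test sample:    {auc_random:.3f}")
print(f"AUC, nonrandom test sample: {auc_selected:.3f}")
```

Under these assumptions the nonrandom sample understates the classifier's discrimination, because restricting the range of scores makes positives and negatives harder to tell apart. Other selection rules can bias the curve in other ways.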
Pages: 35-41
Number of pages: 7
Related papers
23 in total
[1]   The use of the area under the roc curve in the evaluation of machine learning algorithms [J].
Bradley, AP .
PATTERN RECOGNITION, 1997, 30 (07) :1145-1159
[2]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[3]   Modeling wine preferences by data mining from physicochemical properties [J].
Cortez, Paulo ;
Cerdeira, Antonio ;
Almeida, Fernando ;
Matos, Telmo ;
Reis, Jose .
DECISION SUPPORT SYSTEMS, 2009, 47 (04) :547-553
[4]   Does reject inference really improve the performance of application scoring models? [J].
Crook, J ;
Banasik, J .
JOURNAL OF BANKING & FINANCE, 2004, 28 (04) :857-874
[5]  
Davis J., 2006, ICML 06, DOI 10.1145/1143844.1143874
[6]   MAXIMUM-LIKELIHOOD ESTIMATION OF PARAMETERS OF SIGNAL-DETECTION THEORY AND DETERMINATION OF CONFIDENCE INTERVALS - RATING-METHOD DATA [J].
DORFMAN, DD ;
ALF, E .
JOURNAL OF MATHEMATICAL PSYCHOLOGY, 1969, 6 (03) :487-&
[7]   Bayesian semi-parametric ROC analysis [J].
Erkanli, Alaattin ;
Sung, Minje ;
Costello, E. Jane ;
Angold, Adrian .
STATISTICS IN MEDICINE, 2006, 25 (22) :3905-3928
[8]   A response to Webb and Ting's On the application of ROC analysis to predict classification performance under varying class distributions [J].
Fawcett, T ;
Flach, PA .
MACHINE LEARNING, 2005, 58 (01) :33-38
[9]   An introduction to ROC analysis [J].
Fawcett, Tom .
PATTERN RECOGNITION LETTERS, 2006, 27 (08) :861-874
[10]  
He Haibo., 2011, SELF ADAPTIVE SYSTEM