Comparison of Two Classifiers When the Data Sets Are Imbalanced: The Power of the Area Under the Precision-Recall Curve as the Figure of Merit Versus the Area Under the ROC Curve

Times cited: 8
Authors
Sahiner, Berkman [1]
Chen, Weijie [1]
Pezeshk, Aria [1]
Petrick, Nicholas [1]
Affiliation
[1] US FDA, CDRH, 10903 New Hampshire Ave, Silver Spring, MD 20993 USA
Source
Medical Imaging 2017: Image Perception, Observer Performance, and Technology Assessment | 2017 / Vol. 10136
Keywords
Precision-recall curve; receiver operating characteristic curve; area under curve; information retrieval
DOI
10.1117/12.2254742
Chinese Library Classification number
O43 [Optics]
Discipline classification code
070207; 0803
Abstract
In many two-class problems in automated classification and information retrieval, the classes are imbalanced, and the separation between the positive and negative classes is large. The precision-recall (PR) curve has been suggested as an alternative to the receiver operating characteristic (ROC) curve for characterizing the performance of automated systems when the classes are imbalanced, and the area under the PR curve (AUCPR) has been suggested as an alternative performance measure to the area under the ROC curve (AUCROC). AUCPR and AUCROC are distinct measures of performance, even though the relationship between the PR and ROC curves is well known. In this study, we compared the statistical power of the AUCPR with that of the AUCROC. Our results indicate that the AUCPR can offer a small advantage in statistical power when the prevalence is low and the separation between the positive and negative classes is large. When the data set is more balanced, or when the separation between the classes is low or moderate, the AUCROC has slightly higher power.
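To make the two figures of merit concrete, here is a minimal sketch. The abstract does not say which AUCPR estimator the authors used. This sketch assumes the step-wise average-precision estimator for AUCPR and the Mann-Whitney form of the empirical AUCROC. It also uses the standard mapping from one ROC operating point to precision at prevalence pi: precision = pi*TPR / (pi*TPR + (1 - pi)*FPR). The helper names (`auc_roc`, `auc_pr`, `precision_from_roc`) and the scores are hypothetical illustrations, not data from the study, and the sketch does not reproduce the paper's power comparison.

```python
from typing import List, Sequence, Tuple


def auc_roc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Empirical AUCROC (Mann-Whitney form): fraction of (positive, negative)
    pairs in which the positive case scores higher; ties count one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_pr(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUCPR estimated as step-wise average precision (an assumed estimator):
    sum over distinct thresholds of (gain in recall) * precision."""
    cases: List[Tuple[float, int]] = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    cases.sort(key=lambda c: c[0], reverse=True)  # highest score first
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(cases):
        threshold = cases[i][0]
        # Every case tied at this threshold is called positive at the same time.
        while i < len(cases) and cases[i][0] == threshold:
            if cases[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / len(pos)
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def precision_from_roc(tpr: float, fpr: float, prevalence: float) -> float:
    """Map one ROC operating point (TPR, FPR) to precision at a given prevalence."""
    denom = prevalence * tpr + (1.0 - prevalence) * fpr
    if denom == 0.0:
        raise ValueError("precision is undefined when TPR = FPR = 0")
    return prevalence * tpr / denom


if __name__ == "__main__":
    pos = [0.9, 0.8, 0.4]        # hypothetical scores for positive cases
    neg = [0.7, 0.3, 0.2, 0.1]   # hypothetical scores for negative cases
    print(f"prevalence 3/7:  AUCROC={auc_roc(pos, neg):.3f} AUCPR={auc_pr(pos, neg):.3f}")
    # Replicating each negative 10 times lowers prevalence to 3/43 while leaving
    # the ROC curve (and hence AUCROC) unchanged; the PR curve and AUCPR drop.
    neg_many = neg * 10
    print(f"prevalence 3/43: AUCROC={auc_roc(pos, neg_many):.3f} AUCPR={auc_pr(pos, neg_many):.3f}")
    # At threshold 0.4 in the first data set, TPR = 1 and FPR = 1/4.
    print(f"precision at TPR=1, FPR=0.25, pi=3/7: {precision_from_roc(1.0, 0.25, 3 / 7):.3f}")
```

Run as a script, this prints AUCROC = 0.917 for both data sets, while AUCPR falls from 0.917 to 0.744 as the negatives are replicated. The last line prints 0.750, which matches the empirical precision at that threshold. This is one simple way to see why the two areas are distinct figures of merit, even though each PR point is determined by an ROC point and the prevalence.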
Pages: 9