Optimal ROC-Based Classification and Performance Analysis under Bayesian Uncertainty Models

Cited by: 8
Author
Dalton, Lori A. [1 ,2 ]
Affiliations
[1] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Classification; error estimation; Bayesian estimation; receiver operating characteristic; area under the curve; SQUARE ERROR ESTIMATION; DISCRETE; ESTIMATORS; CURVE;
DOI
10.1109/TCBB.2015.2465966
CLC Classification
Q5 [Biochemistry]
Subject Classification Codes
071010; 081704
Abstract
Popular tools to evaluate classifier performance are the false positive rate (FPR), true positive rate (TPR), receiver operating characteristic (ROC) curve, and area under the curve (AUC). Typically, these quantities are estimated from training data using simple resampling and counting methods, which have been shown to perform poorly when the sample size is small, as is typical in many applications. This work takes a model-based approach in classifier training and performance analysis, where we assume the true population densities are members of an uncertainty class of distributions. Given a prior over the uncertainty class and data, we form a posterior and derive optimal mean-squared-error (MSE) FPR and TPR estimators, as well as the sample-conditioned MSE performance of these estimators. The theory also naturally leads to optimal ROC and AUC estimators. Finally, we develop a Neyman-Pearson-based approach to optimal classifier design, which maximizes the estimated TPR for a given estimated FPR. These tools are optimal over the uncertainty class of distributions given the sample, and are available in closed form or can be easily approximated for many models. Applications are demonstrated on both synthetic and real genomic data. MATLAB code and simulation results are available in the online supplementary material.
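For orientation, the counting-based FPR/TPR/ROC/AUC estimators that the abstract contrasts with the proposed Bayesian approach, together with the Neyman-Pearson selection rule it describes (maximize estimated TPR subject to a ceiling on estimated FPR), can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the fpr_budget parameter, and the synthetic Gaussian scores are hypothetical, and the code implements the conventional empirical estimators rather than the paper's optimal Bayesian MMSE estimators or its MATLAB supplement.

import numpy as np

def empirical_roc(scores, labels):
    # Counting-based ROC estimate: sweep a decision threshold over the
    # observed scores and tally false/true positives at each setting.
    # labels: 1 = positive class, 0 = negative class.
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1], [-np.inf]))
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    fpr = np.empty(thresholds.size)
    tpr = np.empty(thresholds.size)
    for i, t in enumerate(thresholds):
        pred = scores >= t                        # call "positive" at or above the threshold
        tpr[i] = np.sum(pred & (labels == 1)) / n_pos
        fpr[i] = np.sum(pred & (labels == 0)) / n_neg
    return fpr, tpr, thresholds

def auc_trapezoid(fpr, tpr):
    # Area under the empirical ROC curve via the trapezoidal rule.
    order = np.argsort(fpr)
    return np.trapz(tpr[order], fpr[order])

def neyman_pearson_point(scores, labels, fpr_budget=0.10):
    # Neyman-Pearson-style selection: among thresholds whose estimated FPR
    # does not exceed fpr_budget, pick the one with the largest estimated TPR.
    fpr, tpr, thresholds = empirical_roc(scores, labels)
    feasible = fpr <= fpr_budget                  # never empty: threshold = +inf gives FPR = 0
    best = np.argmax(np.where(feasible, tpr, -np.inf))
    return thresholds[best], fpr[best], tpr[best]

# Toy usage on synthetic Gaussian scores (purely illustrative).
rng = np.random.default_rng(0)
scores = np.concatenate((rng.normal(1.0, 1.0, 30), rng.normal(0.0, 1.0, 30)))
labels = np.concatenate((np.ones(30, dtype=int), np.zeros(30, dtype=int)))
fpr, tpr, _ = empirical_roc(scores, labels)
print("empirical AUC:", auc_trapezoid(fpr, tpr))
print("NP point (threshold, FPR, TPR):", neyman_pearson_point(scores, labels, 0.10))

The paper's contribution, per the abstract, is to replace the raw counting estimates above with posterior-expected (MMSE-optimal) FPR, TPR, ROC, and AUC estimates over an uncertainty class of distributions, while retaining the same Neyman-Pearson design criterion.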
Pages: 719-729 (11 pages)