Popular tools for evaluating classifier performance are the false positive rate (FPR), true positive rate (TPR), receiver operating characteristic (ROC) curve, and area under the curve (AUC). Typically, these quantities are estimated from training data using simple resampling and counting methods, which have been shown to perform poorly when the sample size is small, as is common in many applications. This work takes a model-based approach to classifier training and performance analysis, where we assume the true population densities are members of an uncertainty class of distributions. Given a prior over the uncertainty class and data, we form a posterior and derive optimal mean-squared-error (MSE) FPR and TPR estimators, as well as the sample-conditioned MSE performance of these estimators. The theory also naturally leads to optimal ROC and AUC estimators. Finally, we develop a Neyman-Pearson-based approach to optimal classifier design, which maximizes the estimated TPR for a given estimated FPR. These tools are optimal over the uncertainty class of distributions given the sample, and are available in closed form or are easily approximated for many models. Applications are demonstrated on both synthetic and real genomic data. MATLAB code and simulation results are available in the online supplementary material.
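
As a concrete illustration of the Bayesian estimation idea (not the paper's closed-form derivation), the following MATLAB sketch approximates the MMSE FPR and TPR estimators by Monte Carlo sampling from the posterior. It rests on simplifying assumptions introduced here for illustration only: univariate Gaussian class-conditional densities with a known common variance, conjugate normal priors on the class means, and a fixed threshold classifier. All variable names and parameter settings are hypothetical.

% Illustrative sketch: Monte Carlo approximation of Bayesian MMSE FPR/TPR
% estimators for a fixed-threshold classifier, assuming (hypothetically)
% univariate Gaussian class-conditional densities with known common
% variance and conjugate normal priors on the class means.
rng(0);
sigma = 1;                        % known common standard deviation (assumption)
mu0_prior = 0; mu1_prior = 1;     % prior means for class 0 and class 1 (assumptions)
tau2 = 4;                         % prior variance on each class mean (assumption)

% Synthetic training samples for each class.
x0 = sigma*randn(10,1) + 0.2;     % class 0 sample
x1 = sigma*randn(10,1) + 1.3;     % class 1 sample

% Normal-normal conjugate posteriors over the two class means.
[post_m0, post_v0] = normal_posterior(x0, mu0_prior, tau2, sigma^2);
[post_m1, post_v1] = normal_posterior(x1, mu1_prior, tau2, sigma^2);

c = 0.75;                         % classifier threshold: predict class 1 if x > c
T = 1e5;                          % number of Monte Carlo posterior draws

% For each posterior draw of the model, the true FPR and TPR are Gaussian
% tail probabilities Q((c - mu)/sigma); averaging over draws approximates
% the MMSE estimates, and the variance across draws approximates the
% sample-conditioned MSE of each estimate.
mu0 = post_m0 + sqrt(post_v0)*randn(T,1);
mu1 = post_m1 + sqrt(post_v1)*randn(T,1);
fpr = 0.5*erfc((c - mu0)/(sigma*sqrt(2)));   % P(x > c | class 0, model)
tpr = 0.5*erfc((c - mu1)/(sigma*sqrt(2)));   % P(x > c | class 1, model)
fprintf('MMSE FPR estimate: %.4f (cond. MSE %.2e)\n', mean(fpr), var(fpr));
fprintf('MMSE TPR estimate: %.4f (cond. MSE %.2e)\n', mean(tpr), var(tpr));

function [m, v] = normal_posterior(x, mu_prior, tau2, s2)
% Posterior N(m, v) over a Gaussian mean with known variance s2,
% prior N(mu_prior, tau2), and observations x.
n = numel(x);
v = 1/(1/tau2 + n/s2);
m = v*(mu_prior/tau2 + sum(x)/s2);
end

Sweeping the threshold c and plotting the resulting (FPR, TPR) estimate pairs would trace an estimated ROC curve in the same spirit, with the estimated TPR maximized at a target estimated FPR mirroring the Neyman-Pearson design criterion described above.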