Decision threshold adjustment in class prediction

Cited by: 65
Authors
Chen, J. J. [1]
Tsai, C. -A.
Moon, H.
Ahn, H.
Young, J. J.
Chen, C. -H.
Affiliations
[1] Natl Ctr Toxicol Res, Food & Drug Adm, Div Biometry & Risk Assessment, Jefferson, AR 72079 USA
[2] Acad Sinica, Inst Stat Sci, Taipei 11529, Taiwan
[3] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
Keywords
concordance; cross validation; receiver operating characteristic curve; sensitivity and specificity; weighted k-NN
DOI
10.1080/10659360600787700
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Standard classification algorithms are generally designed to maximize the number of correct predictions (concordance). This criterion may not be appropriate in certain applications: some emphasize high sensitivity (e.g., clinical diagnostic tests), while others emphasize high specificity (e.g., epidemiology screening studies). This paper considers the effects of the decision threshold on sensitivity, specificity, and concordance for four classification methods: logistic regression, classification tree, Fisher's linear discriminant analysis, and weighted k-nearest neighbor. We investigated the use of decision threshold adjustment to improve either the sensitivity or the specificity of a classifier under specific conditions. A Monte Carlo simulation showed that as the decision threshold increases, sensitivity decreases and specificity increases, while the concordance values in an interval around the maximum concordance remain similar. For specified sensitivity and specificity levels, an optimal decision threshold that meets the requirement might therefore be determined within an interval around the maximum concordance. Three example data sets were analyzed for illustration.
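The threshold trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' code or simulation design. The scores and labels are made-up toy data, the helper names `rates` and `pick_threshold` are hypothetical, and the rule "predict positive when the score is at or above the threshold" is an assumption made for illustration.

```python
from typing import Optional, Sequence, Tuple


def rates(scores: Sequence[float], labels: Sequence[int],
          threshold: float) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, concordance) at one threshold.

    Assumes labels are 0/1, both classes are present, and a sample is
    predicted positive when its score is >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    concordance = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, concordance


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   thresholds: Sequence[float],
                   min_sensitivity: float = 0.0,
                   min_specificity: float = 0.0) -> Optional[float]:
    """Among thresholds meeting both minimums, return the first one with
    the highest concordance, or None if no threshold qualifies."""
    feasible = []
    for t in thresholds:
        sens, spec, conc = rates(scores, labels, t)
        if sens >= min_sensitivity and spec >= min_specificity:
            feasible.append((conc, t))
    if not feasible:
        return None
    best = max(conc for conc, _ in feasible)
    return next(t for conc, t in feasible if conc == best)


# Toy data (hypothetical): predicted scores and true 0/1 class labels.
scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
grid = [i / 10 for i in range(1, 10)]

for t in grid:
    sens, spec, conc = rates(scores, labels, t)
    print(f"threshold={t:.1f}  sens={sens:.2f}  spec={spec:.2f}  conc={conc:.2f}")

# A sensitivity-oriented and a specificity-oriented choice of threshold.
print(pick_threshold(scores, labels, grid, min_sensitivity=0.9))
print(pick_threshold(scores, labels, grid, min_specificity=0.9))
```

On this toy data, raising the threshold trades sensitivity for specificity, while concordance stays near its maximum over a range of thresholds. In practice the scores would come from a fitted classifier, and the threshold would be chosen on cross-validated predictions, not on the data used for fitting.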
Pages: 337-352
Page count: 16