Restoring coverage to the Bayesian false discovery rate control procedure

Author
David L. Gold
Affiliations
[1] Roswell Park Cancer Institute
[2] MedImmune LLC
Source
Knowledge and Information Systems | 2012 / Vol. 33
Keywords
False discovery rate; Bayesian; Composite hypothesis; Multiple testing
DOI
Not available
Abstract
Principal among knowledge discovery tasks is the recognition of insightful patterns or features in data that can inform otherwise challenging decisions. For costly future decisions, there is little room for error: features must provide substantial evidence to be robust for classification and dependable for important decisions. Here we seek statistical evidence for feature selection, namely evidence that feature signals are of sufficient magnitude and frequency to generalize for classification. The Bayesian false discovery rate (bFDR) error control procedure is well suited to this task. However, in realistic situations often encountered in practice, the bFDR procedure is biased, yielding a higher FDR than desired; in other, less typical cases, the FDR is lower than desired. We investigate the sources of bias in the bFDR procedure and predict its direction. We develop a new algorithm that corrects the bias in the bFDR control procedure. In simulations and real data mining examples, the new bFDR control algorithm shows promise. The strengths and limitations of the new approach are illustrated with examples and discussed.
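For readers unfamiliar with bFDR control, the following is a minimal sketch of the standard (uncorrected) rule as it is commonly described, not the bias-correcting algorithm proposed in this paper. It assumes that a posterior null probability P(H0_i | data) has already been computed for each feature by some model; the function names `bfdr_select` and `estimated_bfdr` are hypothetical. Features are ranked by increasing posterior null probability, and the largest set whose average posterior null probability (the estimated bFDR) does not exceed the target level alpha is declared discoveries. The bias the abstract describes arises when these posterior probabilities are misestimated, so the realized FDR drifts from alpha.

```python
from typing import List, Sequence


def estimated_bfdr(post_null: Sequence[float], selected: Sequence[int]) -> float:
    """Estimated Bayesian FDR of a selection: the mean posterior null
    probability of the selected features (0.0 for an empty selection)."""
    if not selected:
        return 0.0
    return sum(post_null[i] for i in selected) / len(selected)


def bfdr_select(post_null: Sequence[float], alpha: float) -> List[int]:
    """Hypothetical sketch of standard bFDR control.

    post_null[i] is assumed to be P(H0_i | data) for feature i.
    Returns the (sorted) indices of the largest set of lowest-ranked
    features whose average posterior null probability is <= alpha.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Rank features from most to least convincing (smallest P(H0 | data) first).
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    running_sum = 0.0
    best_k = 0
    for k, i in enumerate(order, start=1):
        running_sum += post_null[i]
        # Running mean of sorted values is non-decreasing, so the last k
        # satisfying the bound defines the largest admissible selection.
        if running_sum / k <= alpha:
            best_k = k
    return sorted(order[:best_k])


if __name__ == "__main__":
    probs = [0.01, 0.40, 0.02, 0.90, 0.05, 0.10]
    picks = bfdr_select(probs, alpha=0.05)
    print(picks, round(estimated_bfdr(probs, picks), 4))
```

With the illustrative probabilities above, the four features with posterior null probabilities 0.01, 0.02, 0.05 and 0.10 are selected (average 0.045), while adding the fifth would raise the average to 0.116.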
Pages: 401-417
Page count: 16