Bayesian Subset Modeling for High-Dimensional Generalized Linear Models

Cited by: 51
Authors
Liang, Faming [1]
Song, Qifan [1]
Yu, Kai [2]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] NCI, Div Canc Epidemiol & Genet, Rockville, MD 20892 USA
Funding
National Science Foundation (USA);
Keywords
Bayesian classification; Posterior consistency; Stochastic approximation Monte Carlo; Sure variable screening; Variable selection; VARIABLE-SELECTION; STOCHASTIC-APPROXIMATION; MONTE-CARLO; DISCOVERY; REGRESSION; REGULARIZATION; CONVERGENCE; CONSISTENCY; LIKELIHOOD; SEARCH;
DOI
10.1080/01621459.2012.761942
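Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract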
This article presents a new prior setting for high-dimensional generalized linear models, which leads to a Bayesian subset regression (BSR) with the maximum a posteriori model approximately equivalent to the minimum extended Bayesian information criterion model. The consistency of the resulting posterior is established under mild conditions. Further, a variable screening procedure is proposed based on the marginal inclusion probability, which shares the same properties of sure screening and consistency with the existing sure independence screening (SIS) and iterative sure independence screening (ISIS) procedures. However, since the proposed procedure makes use of joint information from all predictors, it generally outperforms SIS and ISIS in real applications. This article also makes extensive comparisons of BSR with the popular penalized likelihood methods, including Lasso, elastic net, SIS, and ISIS. The numerical results indicate that BSR can generally outperform the penalized likelihood methods. The models selected by BSR tend to be sparser and, more importantly, of higher prediction ability. In addition, the performance of the penalized likelihood methods tends to deteriorate as the number of predictors increases, while this is not significant for BSR. Supplementary materials for this article are available online.
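The abstract ties the maximum a posteriori model to the minimum extended Bayesian information criterion (EBIC) model. As a rough illustration of that criterion only, the sketch below scores a candidate predictor subset with EBIC in the form EBIC = -2 log L + |s| log n + 2γ log C(p, |s|). It assumes a Gaussian linear model without an intercept (a special case of a GLM) and drops additive constants from the log-likelihood; the function names `log_binom` and `ebic_gaussian` and the default γ = 1 are illustrative choices, not taken from the article, and this is not the article's BSR sampler.

```python
import math
import numpy as np

def log_binom(p, k):
    """Log of the binomial coefficient C(p, k), computed via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)

def ebic_gaussian(X, y, subset, gamma=1.0):
    """EBIC of a Gaussian linear model restricted to the columns in `subset`.

    Assumptions (illustrative): no intercept, error variance profiled out,
    additive constants of the log-likelihood dropped.
    """
    n, p = X.shape
    k = len(subset)
    if k > 0:
        Xs = X[:, list(subset)]
        # Least-squares fit on the selected columns only.
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
    else:
        # Empty model: the residual is the response itself.
        resid = y
    rss = float(resid @ resid)
    fit_term = n * math.log(rss / n)             # -2 * profiled log-likelihood (up to a constant)
    bic_penalty = k * math.log(n)                # ordinary BIC penalty
    size_penalty = 2.0 * gamma * log_binom(p, k) # extra penalty for the number of size-k models
    return fit_term + bic_penalty + size_penalty
```

A lower score is better. Setting `gamma=0.0` recovers the ordinary BIC, so the last term is what penalizes searching over the very large model space that arises when the number of predictors is large.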
Pages: 589-606
Number of pages: 18