EXTENDED BIC FOR SMALL-n-LARGE-P SPARSE GLM

Cited by: 143
Authors
Chen, Jiahua [1]
Chen, Zehua [2]
Affiliations
[1] Univ British Columbia, Dept Stat, Vancouver, BC V6T 1Z4, Canada
[2] Natl Univ Singapore, Dept Stat & Appl Probabil, Singapore 117543, Singapore
Keywords
Consistency; exponential family; extended Bayes information criterion; feature selection; generalized linear model; small-n-large-P; variable selection; model selection; multiple; regularization; criterion; loci
DOI
10.5705/ss.2010.216
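Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract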
The small-n-large-P situation has become common in genetics research, medical studies, risk management, and other fields. Feature selection is crucial in these studies yet poses a serious challenge. Traditional criteria such as AIC, BIC, and cross-validation choose too many features. In this paper, we examine the variable selection problem under generalized linear models. We study the extended Bayes information criterion (EBIC), an approach in which the prior takes specific account of the small-n-large-P situation. The criterion is shown to be variable selection consistent under generalized linear models. We also report simulation results and a data analysis to illustrate the effectiveness of EBIC for feature selection.
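The abstract does not state the criterion's formula. As a rough illustration only, the sketch below assumes the commonly cited form EBIC_gamma(s) = -2 log L(s) + k log n + 2 gamma log C(P, k), where k is the number of features in model s, n the sample size, P the total number of candidate features, and gamma >= 0 a tuning constant. The function names `log_binom` and `ebic` and the default `gamma=1.0` are hypothetical choices made for this example, not taken from the paper.

```python
import math


def log_binom(p, k):
    """Natural log of the binomial coefficient C(p, k), computed via lgamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(log_lik, n, p, k, gamma=1.0):
    """Assumed EBIC form: -2*log_lik + k*log(n) + 2*gamma*log C(p, k).

    log_lik : maximized log-likelihood of the candidate model
    n       : sample size
    p       : total number of candidate features
    k       : number of features in the candidate model
    gamma   : extra-penalty weight; gamma = 0 gives the ordinary BIC
    """
    return -2.0 * log_lik + k * math.log(n) + 2.0 * gamma * log_binom(p, k)


# Compare the ordinary BIC with the assumed EBIC for one candidate model.
bic_value = ebic(log_lik=-120.0, n=100, p=1000, k=3, gamma=0.0)
ebic_value = ebic(log_lik=-120.0, n=100, p=1000, k=3, gamma=1.0)
print(bic_value, ebic_value)
```

Under this assumed form, the extra term grows with the number of size-k models that can be formed from P features, so for 0 < k < P and gamma > 0 the penalty is strictly larger than the BIC penalty. This is one way to read the abstract's remark that the traditional criteria choose too many features when P is large.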
Pages: 555-574
Number of pages: 20