BAYESIAN VARIABLE SELECTION REGRESSION FOR GENOME-WIDE ASSOCIATION STUDIES AND OTHER LARGE-SCALE PROBLEMS

Cited by: 239
Authors
Guan, Yongtao [1, 2]
Stephens, Matthew [3]
Affiliations
[1] Baylor Coll Med, Dept Pediat, USDA Childrens Nutr Res Ctr, Houston, TX 77030 USA
[2] Baylor Coll Med, Dept Mol & Human Genet, USDA Childrens Nutr Res Ctr, Houston, TX 77030 USA
[3] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
Keywords
Bayesian regression; variable selection; shrinkage; genome-wide; association study; multi-SNP analysis; heritability; C-REACTIVE PROTEIN; MODEL; DETERMINANTS; HERITABILITY; HNF1A;
DOI
10.1214/11-AOAS455
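Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code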
020208 ; 070103 ; 0714 ;
Abstract
We consider applying Bayesian Variable Selection Regression, or BVSR, to genome-wide association studies and similar large-scale regression problems. Currently, typical genome-wide association studies measure hundreds of thousands, or millions, of genetic variants (SNPs), in thousands or tens of thousands of individuals, and attempt to identify regions harboring SNPs that affect some phenotype or outcome of interest. This goal can naturally be cast as a variable selection regression problem, with the SNPs as the covariates in the regression. Characteristic features of genome-wide association studies include the following: (i) a focus primarily on identifying relevant variables, rather than on prediction; and (ii) many relevant covariates may have tiny effects, making it effectively impossible to confidently identify the complete "correct" subset of variables. Taken together, these factors put a premium on having interpretable measures of confidence for individual covariates being included in the model, which we argue is a strength of BVSR compared with alternatives such as penalized regression methods. Here we focus primarily on analysis of quantitative phenotypes, and on appropriate prior specification for BVSR in this setting, emphasizing the idea of considering what the priors imply about the total proportion of variance in outcome explained by relevant covariates. We also emphasize the potential for BVSR to estimate this proportion of variance explained, and hence shed light on the issue of "missing heritability" in genome-wide association studies. More generally, we demonstrate that, despite the apparent computational challenges, BVSR can provide useful inferences in these large-scale problems, and in our simulations produces better power and predictive performance compared with standard single-SNP analyses and the penalized regression method LASSO. 
Methods described here are implemented in a software package, pi-MASS, available from the Guan Lab website http://bcm.edu/cnrc/mcmcmc/pimass.
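As a rough illustration of the "interpretable measures of confidence for individual covariates" that the abstract describes, the sketch below computes posterior inclusion probabilities for a toy problem by enumerating every possible subset of covariates. It is not the pi-MASS implementation. The fixed residual variance `sigma2`, the fixed prior inclusion probability `pi`, the slab variance `sigma_a2`, and the function names are all assumptions made for illustration. Exhaustive enumeration is only feasible for a handful of covariates, so a genome-wide analysis would need a different computational approach.

```python
import itertools
import numpy as np


def log_marginal(y, X, gamma, sigma_a2, sigma2=1.0):
    """Log marginal likelihood of y for the model that includes columns gamma.

    Assumes (for illustration) beta_j ~ N(0, sigma_a2 * sigma2) for included
    covariates and known residual variance sigma2, so that
    y ~ N(0, sigma2 * (I + sigma_a2 * Xg Xg^T)).
    """
    n = len(y)
    Xg = X[:, gamma]  # boolean mask selects the included columns
    cov = sigma2 * (np.eye(n) + sigma_a2 * Xg @ Xg.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)


def posterior_inclusion_probs(y, X, pi=0.2, sigma_a2=1.0):
    """Posterior inclusion probability of each covariate, by full enumeration."""
    p = X.shape[1]
    models, log_w = [], []
    for gamma in itertools.product([False, True], repeat=p):
        g = np.array(gamma)
        k = int(g.sum())
        # Independent Bernoulli(pi) prior on each inclusion indicator.
        log_prior = k * np.log(pi) + (p - k) * np.log(1 - pi)
        log_w.append(log_prior + log_marginal(y, X, g, sigma_a2))
        models.append(g)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    # Sum the posterior weights of all models that include each covariate.
    return (w[:, None] * np.array(models)).sum(axis=0)


if __name__ == "__main__":
    # Simulated data: only the first of six covariates affects the outcome.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))
    y = 1.0 * X[:, 0] + rng.standard_normal(200)
    print(np.round(posterior_inclusion_probs(y, X), 3))
```

Each returned value is the posterior probability that the corresponding covariate belongs in the model, which is the kind of per-covariate summary the abstract contrasts with penalized regression output.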
Pages: 1780-1815
Page count: 36