A simple new approach to variable selection in regression, with application to genetic fine mapping

Cited by: 376
Authors
Wang, Gao [1]
Sarkar, Abhishek [1]
Carbonetto, Peter [1]
Stephens, Matthew [1]
Affiliations
[1] Univ Chicago, Chicago, IL 60637 USA
Keywords
Genetic fine mapping; Linear regression; Sparsity; Variable selection; Variational inference; GENOME-WIDE ASSOCIATION; FALSE DISCOVERY RATE; VARIATIONAL INFERENCE; CAUSAL VARIANTS; JOINT ANALYSIS; R-PACKAGE; LOCI; OPTIMIZATION; EXPRESSION; PREDICTION
DOI
10.1111/rssb.12388
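Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract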
We introduce a simple new approach to variable selection in linear regression, with a particular focus on quantifying uncertainty in which variables should be selected. The approach is based on a new model, the 'sum of single effects' model (called 'SuSiE'), which comes from writing the sparse vector of regression coefficients as a sum of 'single-effect' vectors, each with one non-zero element. We also introduce a corresponding new fitting procedure, iterative Bayesian stepwise selection (IBSS), which is a Bayesian analogue of stepwise selection methods. IBSS shares the computational simplicity and speed of traditional stepwise methods but, instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select. We provide a formal justification of this intuitive algorithm by showing that it optimizes a variational approximation to the posterior distribution under SuSiE. Further, this approximate posterior distribution naturally yields convenient novel summaries of uncertainty in variable selection, providing a credible set of variables for each selection. Our methods are particularly well suited to settings where variables are highly correlated and detectable effects are sparse, both of which are characteristics of genetic fine mapping applications. We demonstrate through numerical experiments that our methods outperform existing methods for this task, and we illustrate their application to fine mapping genetic variants influencing alternative splicing in human cell lines. We also discuss the potential and challenges for applying these methods to generic variable-selection problems.
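As an illustration of the procedure the abstract describes, the following is a minimal sketch of IBSS in Python with NumPy. It is not the authors' implementation. It assumes a residual variance `sigma2` and a prior effect variance `sigma0_2` that are held fixed, a uniform prior over which variable carries each single effect, and an `X` and `y` that are already centered. The function names (`single_effect_regression`, `ibss`, `credible_set`) are hypothetical and chosen for this example only.

```python
import numpy as np

def single_effect_regression(X, r, sigma2, sigma0_2):
    """Fit a regression in which exactly one column of X has a non-zero effect.

    Returns alpha (posterior probability that each variable is the single
    effect) and the posterior mean of the effect given each variable.
    """
    xtx = np.sum(X * X, axis=0)          # x_j' x_j for every column j
    bhat = X.T @ r / xtx                 # univariate least-squares estimates
    s2 = sigma2 / xtx                    # their sampling variances
    z2 = bhat ** 2 / s2                  # squared z-scores
    # Log Bayes factor for "variable j has a N(0, sigma0_2) effect" vs "no effect".
    log_bf = 0.5 * np.log(s2 / (s2 + sigma0_2)) + 0.5 * z2 * sigma0_2 / (s2 + sigma0_2)
    w = np.exp(log_bf - log_bf.max())    # subtract the max for numerical stability
    alpha = w / w.sum()                  # uniform prior over variables (assumption)
    post_var = 1.0 / (1.0 / s2 + 1.0 / sigma0_2)
    post_mean = post_var / s2 * bhat
    return alpha, post_mean

def ibss(X, y, L=5, sigma2=1.0, sigma0_2=1.0, n_iter=50):
    """Iteratively refit each of L single effects on the residuals of the others."""
    n, p = X.shape
    alpha = np.full((L, p), 1.0 / p)
    mu = np.zeros((L, p))
    for _ in range(n_iter):              # fixed iteration count, no convergence check
        for l in range(L):
            # Expected coefficients contributed by all effects except effect l.
            b_other = (alpha * mu).sum(axis=0) - alpha[l] * mu[l]
            r = y - X @ b_other
            alpha[l], mu[l] = single_effect_regression(X, r, sigma2, sigma0_2)
    return alpha, mu

def credible_set(alpha_l, coverage=0.95):
    """Smallest set of variables whose posterior probabilities sum to >= coverage."""
    order = np.argsort(alpha_l)[::-1]
    k = np.searchsorted(np.cumsum(alpha_l[order]), coverage) + 1
    return order[:k]
```

Each row of `alpha` is the distribution over variables that replaces the single pick of a traditional stepwise step, and `credible_set(alpha[l])` gives one simple way to summarize the uncertainty in the l-th selection. A fuller treatment would also estimate the variance parameters and monitor convergence rather than running a fixed number of sweeps.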
Pages: 1273-1300
Number of pages: 28
Related papers
80 in total
  • [1] Genetic effects on gene expression across human tissues
    Aguet, Francois
    Brown, Andrew A.
    Castel, Stephane E.
    Davis, Joe R.
    He, Yuan
    Jo, Brian
    Mohammadi, Pejman
    Park, Yoson
    Parsana, Princy
    Segre, Ayellet V.
    Strober, Benjamin J.
    Zappala, Zachary
    Cummings, Beryl B.
    Gelfand, Ellen T.
    Hadley, Kane
    Huang, Katherine H.
    Lek, Monkol
    Li, Xiao
    Nedzel, Jared L.
    Nguyen, Duyen Y.
    Noble, Michael S.
    Sullivan, Timothy J.
    Tukiainen, Taru
    MacArthur, Daniel G.
    Getz, Gad
    NIH Program Management
    Addington, Anjene
    Guan, Ping
    Koester, Susan
    Little, A. Roger
    Lockhart, Nicole C.
    Moore, Helen M.
    Rao, Abhi
    Struewing, Jeffery P.
    Volpi, Simona
    Biospecimen Collection
    Brigham, Lori E.
    Hasz, Richard
    Hunter, Marcus
    Johns, Christopher
    Johnson, Mark
    Kopen, Gene
    Leinweber, William F.
    Lonsdale, John T.
    McDonald, Alisa
    Mestichelli, Bernadette
    Myer, Kevin
    Roe, Bryan
    Salvatore, Michael
    Shad, Saboor
    [J]. NATURE, 2017, 550 (7675) : 204 - +
  • [2] Efficient Implementations of the Generalized Lasso Dual Path Algorithm
    Arnold, Taylor B.
    Tibshirani, Ryan J.
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2016, 25 (01) : 1 - 27
  • [3] CONTROLLING THE FALSE DISCOVERY RATE VIA KNOCKOFFS
    Barber, Rina Foygel
    Candes, Emmanuel J.
    [J]. ANNALS OF STATISTICS, 2015, 43 (05) : 2055 - 2085
  • [4] FINEMAP: efficient variable selection using summary data from genome-wide association studies
    Benner, Christian
    Spencer, Chris C. A.
    Havulinna, Aki S.
    Salomaa, Veikko
    Ripatti, Samuli
    Pirinen, Matti
    [J]. BIOINFORMATICS, 2016, 32 (10) : 1493 - 1501
  • [5] BEST SUBSET SELECTION VIA A MODERN OPTIMIZATION LENS
    Bertsimas, Dimitris
    King, Angela
    Mazumder, Rahul
    [J]. ANNALS OF STATISTICS, 2016, 44 (02) : 813 - 852
  • [6] Variational Inference: A Review for Statisticians
    Blei, David M.
    Kucukelbir, Alp
    McAuliffe, Jon D.
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112 (518) : 859 - 877
  • [7] Evolutionary Stochastic Search for Bayesian Model Exploration
    Bottolo, Leonardo
    Richardson, Sylvia
    [J]. BAYESIAN ANALYSIS, 2010, 5 (03) : 583 - 618
  • [8] Bayesian Detection of Expression Quantitative Trait Loci Hot Spots
    Bottolo, Leonardo
    Petretto, Enrico
    Blankenberg, Stefan
    Cambien, Francois
    Cook, Stuart A.
    Tiret, Laurence
    Richardson, Sylvia
    [J]. GENETICS, 2011, 189 (04) : 1449 - +
  • [9] Scalable Variational Inference for Bayesian Variable Selection in Regression, and Its Accuracy in Genetic Association Studies
    Carbonetto, Peter
    Stephens, Matthew
    [J]. BAYESIAN ANALYSIS, 2012, 7 (01) : 73 - 107
  • [10] Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics
    Chen, Wenan
    Larrabee, Beth R.
    Ovsyannikova, Inna G.
    Kennedy, Richard B.
    Haralambieva, Iana H.
    Poland, Gregory A.
    Schaid, Daniel J.
    [J]. GENETICS, 2015, 200 (03) : 719 - +