Bayesian hierarchical hypothesis testing in large-scale genome-wide association analysis

Cited by: 0
Authors
Samaddar, Anirban [1]
Maiti, Tapabrata [1]
de los Campos, Gustavo [1, 2, 3]
Affiliations
[1] Michigan State Univ, Dept Stat & Probabil, E Lansing, MI 48824 USA
[2] Michigan State Univ, Dept Epidemiol & Biostat, E Lansing, MI 48824 USA
[3] Michigan State Univ, Inst Quantitat Hlth Sci & Engn, E Lansing, MI 48824 USA
Keywords
Bayesian variable selection; Bayesian hierarchical hypothesis testing; false discovery rate; GWAS; collinearity; multiresolution inference; spike and slab prior; linkage disequilibrium; UK-Biobank data; FALSE DISCOVERY RATE; VARIABLE-SELECTION; REGRESSION; HERITABILITY; PREDICTION
DOI
10.1093/genetics/iyae164
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Variable selection and large-scale hypothesis testing are techniques commonly used to analyze high-dimensional genomic data. Despite recent advances in theory and methodology, variable selection and inference with highly collinear features remain challenging. For instance, collinearity poses a great challenge in genome-wide association studies involving millions of variants, many of which may be in high linkage disequilibrium. In such settings, collinearity can significantly reduce the power of variable selection methods to identify individual variants associated with an outcome. To address such challenges, we developed Bayesian hierarchical hypothesis testing (BHHT), a novel multiresolution testing procedure that offers high power with adequate error control and fine-mapping resolution. We demonstrate through simulations that the proposed methodology achieves power-FDR performance that is competitive with (and in many scenarios better than) state-of-the-art methods. Finally, we demonstrate the feasibility of using BHHT with large sample sizes (n ~ 300,000) and ultra-high-dimensional genotypes (~15 million single-nucleotide polymorphisms, or SNPs) by applying it to eight complex traits using data from the UK-Biobank. Our results show that the proposed methodology leads to many more discoveries than those obtained using traditional SNP-centered inference procedures. The article is accompanied by open-source software that implements the methods described in this study using algorithms that scale to biobank-size ultra-high-dimensional data.
Pages: 12