Bayesian hierarchical hypothesis testing in large-scale genome-wide association analysis

Cited by: 0

Authors:

Samaddar, Anirban [1]

Maiti, Tapabrata [1]

de los Campos, Gustavo [1,2,3]

Affiliations:

[1] Michigan State Univ, Dept Stat & Probabil, E Lansing, MI 48824 USA

[2] Michigan State Univ, Dept Epidemiol & Biostat, E Lansing, MI 48824 USA

[3] Michigan State Univ, Inst Quantitat Hlth Sci & Engn, E Lansing, MI 48824 USA

Source:

GENETICS | 2024 / Vol. 228 / Issue 04

Keywords:

Bayesian variable selection; Bayesian hierarchical hypothesis testing; false discovery rate; GWAS; collinearity; multiresolution inference; spike and slab prior; linkage disequilibrium; UK-Biobank data; FALSE DISCOVERY RATE; VARIABLE-SELECTION; REGRESSION; HERITABILITY; PREDICTION;

DOI:

10.1093/genetics/iyae164

Chinese Library Classification (CLC) number:

Q3 [Genetics];

Discipline classification codes:

071007 ; 090102 ;

Abstract:

Variable selection and large-scale hypothesis testing are techniques commonly used to analyze high-dimensional genomic data. Despite recent advances in theory and methodology, variable selection and inference with highly collinear features remain challenging. For instance, collinearity poses a great challenge in genome-wide association studies involving millions of variants, many of which may be in high linkage disequilibrium. In such settings, collinearity can significantly reduce the power of variable selection methods to identify individual variants associated with an outcome. To address such challenges, we developed Bayesian hierarchical hypothesis testing (BHHT), a novel multiresolution testing procedure that offers high power with adequate error control and fine-mapping resolution. We demonstrate through simulations that the proposed methodology has a power-FDR performance that is competitive with (and in many scenarios better than) state-of-the-art methods. Finally, we demonstrate the feasibility of using BHHT with large sample sizes (n ~ 300,000) and ultra-high-dimensional genotypes (~15 million single-nucleotide polymorphisms, or SNPs) by applying it to eight complex traits using data from the UK-Biobank. Our results show that the proposed methodology leads to many more discoveries than those obtained using traditional SNP-centered inference procedures. The article is accompanied by open-source software that implements the methods described in this study using algorithms that scale to biobank-size ultra-high-dimensional data.

Pages: 12

50 records in total

[21] A BAYESIAN GRAPHICAL MODEL FOR GENOME-WIDE ASSOCIATION STUDIES (GWAS)
Briollais, Laurent
Dobra, Adrian
Liu, Jinnan
Friedlander, Matt
Ozcelik, Hilmi
Massam, Helene
ANNALS OF APPLIED STATISTICS, 2016, 10 (02) : 786 - 811
[22] Multiple SNP Set Analysis for Genome-Wide Association Studies Through Bayesian Latent Variable Selection
Lu, Zhao-Hua
Zhu, Hongtu
Knickmeyer, Rebecca C.
Sullivan, Patrick F.
Williams, Stephanie N.
Zou, Fei
GENETIC EPIDEMIOLOGY, 2015, 39 (08) : 664 - 677
[23] Large-Scale Genome-Wide Study of Income Highlights Heterogenous Pleiotropy Across the Genome
Kweon, Hyeokmoon
Burik, Casper A. P.
Ahlskog, Rafael
Okbay, Aysu
Linner, Richard Karlsson
de Vlaming, Ronald
Benjamin, Daniel J.
DiPrete, Thomas A.
Koellinger, Philipp D.
BEHAVIOR GENETICS, 2022, 52 (06) : 371 - 371
[24] HYPOTHESIS TESTING IN LARGE-SCALE FUNCTIONAL LINEAR REGRESSION
Xue, Kaijie
Yao, Fang
STATISTICA SINICA, 2021, 31 (02) : 1101 - 1123
[25] On the hypothesis-free testing of metabolite ratios in genome-wide and metabolome-wide association studies
Ann-Kristin Petersen
Jan Krumsiek
Brigitte Wägele
Fabian J Theis
H-Erich Wichmann
Christian Gieger
Karsten Suhre
BMC Bioinformatics, 13
[26] On the hypothesis-free testing of metabolite ratios in genome-wide and metabolome-wide association studies
Petersen, Ann-Kristin
Krumsiek, Jan
Waegele, Brigitte
Theis, Fabian J.
Wichmann, H-Erich
Gieger, Christian
Suhre, Karsten
BMC BIOINFORMATICS, 2012, 13
[27] Applying compressed sensing to genome-wide association studies
Vattikuti, Shashaank
Lee, James J.
Chang, Christopher C.
Hsu, Stephen D. H.
Chow, Carson C.
GIGASCIENCE, 2014, 3
[28] Estimation of a significance threshold for genome-wide association studies
Kaler, Avjinder S.
Purcell, Larry C.
BMC GENOMICS, 2019, 20 (1)
[29] Multivariate genome-wide association analysis by iterative hard thresholding
Chu, Benjamin B.
Ko, Seyoon
Zhou, Jin J.
Jensen, Aubrey
Zhou, Hua
Sinsheimer, Janet S.
Lange, Kenneth
BIOINFORMATICS, 2023, 39 (04)
[30] REPLICABILITY ANALYSIS FOR GENOME-WIDE ASSOCIATION STUDIES
Heller, Ruth
Yekutieli, Daniel
ANNALS OF APPLIED STATISTICS, 2014, 8 (01) : 481 - 498