Improving GWAS discovery and genomic prediction accuracy in biobank data

Cited by: 12
Authors
Orliac, Etienne J. [1]
Banos, Daniel Trejo [2]
Ojavee, Sven E. [3]
Lall, Kristi [4]
Magi, Reedik [4]
Visscher, Peter M. [5]
Robinson, Matthew R. [6]
Affiliations
[1] Univ Lausanne, Sci Comp & Res Support Unit, CH-1015 Lausanne, Switzerland
[2] Univ Zurich, Dept Quantitat Biomed, CH-8057 Zurich, Switzerland
[3] Univ Lausanne, Dept Computat Biol, CH-1015 Lausanne, Switzerland
[4] Univ Tartu, Inst Genom, Estonian Genome Ctr, EE-51010 Tartu, Estonia
[5] Univ Queensland, Inst Mol Biosci, Brisbane, Qld 4072, Australia
[6] IST Austria, A-3400 Klosterneuburg, Austria
Funding
Swiss National Science Foundation; Australian Research Council; UK Medical Research Council;
Keywords
genomic prediction; association study; Bayesian penalized regression; RESOURCE;
DOI
10.1073/pnas.2121279119
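Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code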
07 ; 0710 ; 09 ;
Abstract
Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency-linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy R² was 47% in a UK Biobank holdout sample, which was 76% of the estimated SNP heritability (h²SNP). We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average χ² value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies.
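The height result in the abstract relates two quantities: the out-of-sample prediction accuracy and the estimated SNP heritability. As a minimal sketch of that arithmetic (not the authors' code), the snippet below backs out the SNP heritability implied by the two reported figures. The function name `implied_snp_heritability` is hypothetical, and the calculation assumes that "76% of the estimated SNP heritability" means the ratio of prediction accuracy to SNP heritability is 0.76.

```python
def implied_snp_heritability(r2: float, fraction_of_h2: float) -> float:
    """Return the h2_SNP implied by r2 = fraction_of_h2 * h2_SNP.

    Assumption: the reported fraction is the simple ratio R^2 / h2_SNP.
    """
    if not 0.0 < fraction_of_h2 <= 1.0:
        raise ValueError("fraction_of_h2 must lie in (0, 1]")
    return r2 / fraction_of_h2


# Figures quoted in the abstract for height in the UK Biobank holdout sample.
r2_height = 0.47          # prediction accuracy R^2
fraction_of_h2 = 0.76     # share of the estimated SNP heritability captured

h2_snp_height = implied_snp_heritability(r2_height, fraction_of_h2)
print(f"Implied SNP heritability for height: {h2_snp_height:.3f}")
```

Under that reading, the implied SNP heritability for height is roughly 0.62, so the predictor leaves about a quarter of the SNP heritability unexplained.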
Pages: 8