Integrative genetic risk prediction using non-parametric empirical Bayes classification

Times cited: 5
Author
Zhao, Sihai Dave [1]
Affiliation
[1] Univ Illinois, Dept Stat, Champaign, IL 61820 USA
Keywords
Empirical Bayes; Genetic risk prediction; GWAS; High-dimensional classification; Integrative genomics; Non-parametric maximum likelihood; GENOME-WIDE ASSOCIATION; HIGH-DIMENSIONAL CLASSIFICATION; INCREASES ACCURACY; COMMON; SUSCEPTIBILITY; SCHIZOPHRENIA; ARCHITECTURE; METAANALYSIS; DISORDER; VARIANT
DOI
10.1111/biom.12619
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Genetic risk prediction is an important component of individualized medicine, but prediction accuracy remains low for many complex diseases. A fundamental limitation is the small sample size of the studies on which prediction algorithms are trained. One way to increase the effective sample size is to integrate information from previously existing studies. However, it can be difficult to find existing data on the target disease of interest, especially if that disease is rare or poorly studied. Furthermore, individual-level genotype data from such auxiliary studies are typically difficult to obtain. This article proposes a new approach to integrative genetic risk prediction of complex diseases with binary phenotypes. It accommodates possible heterogeneity in the genetic etiologies of the target and auxiliary diseases using a tuning-parameter-free non-parametric empirical Bayes procedure, and it can be trained using only auxiliary summary statistics. Simulation studies show that the proposed method can achieve better predictive accuracy than both non-integrative and other integrative classifiers. The method is applied to a recent study of pediatric autoimmune diseases, where it substantially reduces prediction error for certain target/auxiliary disease combinations. The proposed method is implemented in the R package ssa.
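The abstract only outlines the procedure. As a rough illustration of the general idea, the following is a minimal Python/NumPy sketch; it is not the estimator from the paper or the code in the ssa package. It assumes a hypothetical model in which each SNP j has a standardized target-study statistic x_j ~ N(mu_j, 1) and an auxiliary summary z-statistic y_j ~ N(nu_j, 1), with the pairs (mu_j, nu_j) drawn from a completely unspecified bivariate prior G. The sketch estimates G by non-parametric maximum likelihood on a fixed grid (EM over the grid weights), uses the posterior mean E[mu_j | x_j, y_j] as a shrunken effect estimate, and plugs the shrunken mean differences into a diagonal (naive Bayes) linear discriminant. The function names, grid size, iteration count, and simulated data are all illustrative assumptions.

```python
import numpy as np


def fit_bivariate_npmle(x, y, n_grid=30, n_iter=300):
    """Grid-based NPMLE of an unspecified bivariate prior G on (mu_j, nu_j).

    Illustrative (assumed) model: x_j ~ N(mu_j, 1) is the target-study
    statistic for SNP j, y_j ~ N(nu_j, 1) is the auxiliary summary
    statistic, and (mu_j, nu_j) ~ G independently across SNPs.
    """
    M, N = np.meshgrid(np.linspace(x.min(), x.max(), n_grid),
                       np.linspace(y.min(), y.max(), n_grid))
    mu, nu = M.ravel(), N.ravel()  # candidate support points of G
    # Log-likelihood of each SNP's (x_j, y_j) at each support point.
    log_lik = (-0.5 * (x[:, None] - mu[None, :]) ** 2
               - 0.5 * (y[:, None] - nu[None, :]) ** 2)
    # Subtracting each row's max avoids underflow; it cancels when rows are normalized.
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    w = np.full(mu.size, 1.0 / mu.size)  # start from uniform mixing weights
    for _ in range(n_iter):  # EM updates of the mixing weights
        post = lik * w
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)
    return mu, w, lik


def posterior_mean(mu, w, lik):
    """E[mu_j | x_j, y_j] under the fitted discrete prior."""
    post = lik * w
    return (post @ mu) / post.sum(axis=1)


def diagonal_lda_predict(X, mean1, mean0, sd, delta):
    """Independence rule with equal class priors, using delta for mean1 - mean0."""
    score = ((X - (mean1 + mean0) / 2) / sd ** 2) @ delta
    return (score > 0).astype(int)


# Simulated example (all settings are arbitrary illustrations).
rng = np.random.default_rng(0)
p, n = 1000, 200                            # SNPs, training subjects per class
causal = rng.random(p) < 0.05
delta = np.where(causal, 0.4, 0.0)          # true standardized mean differences
shared = causal & (rng.random(p) < 0.7)     # only some causal SNPs are shared
y = np.where(shared, 3.0, 0.0) + rng.standard_normal(p)  # auxiliary z-statistics

X1 = rng.standard_normal((n, p)) + delta    # cases
X0 = rng.standard_normal((n, p))            # controls
m1, m0 = X1.mean(axis=0), X0.mean(axis=0)
sd = np.sqrt((X1.var(axis=0, ddof=1) + X0.var(axis=0, ddof=1)) / 2)
se = sd * np.sqrt(2.0 / n)                  # standard error of m1 - m0
x = (m1 - m0) / se                          # approximately N(delta / se, 1)

mu, w, lik = fit_bivariate_npmle(x, y)
x_eb = posterior_mean(mu, w, lik)
delta_eb = x_eb * se                        # back to the mean-difference scale

X_test = np.vstack([rng.standard_normal((500, p)) + delta,
                    rng.standard_normal((500, p))])
labels = np.r_[np.ones(500, int), np.zeros(500, int)]
err = np.mean(diagonal_lda_predict(X_test, m1, m0, sd, delta_eb) != labels)
print(f"test error with EB-shrunken effects: {err:.3f}")
```

Because G is left unrestricted in this sketch, the fitted prior can place mass on SNPs that matter for the target disease but not the auxiliary one (and vice versa), which is one simple way to allow for heterogeneous etiologies. The grid resolution and number of EM iterations are computational settings rather than statistical tuning parameters.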
Pages: 582-592
Number of pages: 11