Integrative genetic risk prediction using non-parametric empirical Bayes classification

Times cited: 5
Authors
Zhao, Sihai Dave [1]
Affiliations
[1] Univ Illinois, Dept Stat, Champaign, IL 61820 USA
Keywords
Empirical Bayes; Genetic risk prediction; GWAS; High-dimensional classification; Integrative genomics; Non-parametric maximum likelihood; GENOME-WIDE ASSOCIATION; HIGH-DIMENSIONAL CLASSIFICATION; INCREASES ACCURACY; COMMON; SUSCEPTIBILITY; SCHIZOPHRENIA; ARCHITECTURE; METAANALYSIS; DISORDER; VARIANT
DOI
10.1111/biom.12619
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Genetic risk prediction is an important component of individualized medicine, but prediction accuracies remain low for many complex diseases. A fundamental limitation is the sample sizes of the studies on which the prediction algorithms are trained. One way to increase the effective sample size is to integrate information from previously existing studies. However, it can be difficult to find existing data that examine the target disease of interest, especially if that disease is rare or poorly studied. Furthermore, individual-level genotype data from these auxiliary studies are typically difficult to obtain. This article proposes a new approach to integrative genetic risk prediction of complex diseases with binary phenotypes. It accommodates possible heterogeneity in the genetic etiologies of the target and auxiliary diseases using a tuning parameter-free non-parametric empirical Bayes procedure, and can be trained using only auxiliary summary statistics. Simulation studies show that the proposed method can provide superior predictive accuracy relative to non-integrative as well as integrative classifiers. The method is applied to a recent study of pediatric autoimmune diseases, where it substantially reduces prediction error for certain target/auxiliary disease combinations. The proposed method is implemented in the R package ssa.
Pages: 582-592
Number of pages: 11