Reverse GWAS: Using genetics to identify and model phenotypic subtypes

Cited by: 29
Authors
Dahl, Andy [1]
Cai, Na [2, 3]
Ko, Arthur [4]
Laakso, Markku [5, 6]
Pajukanta, Paivi [4]
Flint, Jonathan [7]
Zaitlen, Noah [1]
Affiliations
[1] UCSF, Dept Med, San Francisco, CA 94143 USA
[2] Wellcome Sanger Inst, Cambridge, England
[3] European Bioinformat Inst EMBL EBI, Cambridge, England
[4] UCLA, David Geffen Sch Med, Dept Human Genet, Los Angeles, CA 90095 USA
[5] Univ Eastern Finland, Inst Clin Med, Internal Med, Kuopio, Finland
[6] Kuopio Univ Hosp, Kuopio, Finland
[7] UCLA, Semel Inst Neurosci & Human Behav, Ctr Neurobehav Genet, Los Angeles, CA 90024 USA
Funding
US National Institutes of Health;
Keywords
RISK PREDICTION; HETEROGENEITY; ASSOCIATION; HERITABILITY; VARIANTS; IDENTIFICATION; THERAPY; GENOME; EXPRESSION; DISORDER;
DOI
10.1371/journal.pgen.1008009
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Recent and classical work has revealed biologically and medically significant subtypes in complex diseases and traits. However, relevant subtypes are often unknown, unmeasured, or actively debated, making automated statistical approaches to subtype definition valuable. We propose reverse GWAS (RGWAS) to identify and validate subtypes using genetics and multiple traits: while GWAS seeks the genetic basis of a given trait, RGWAS seeks to define trait subtypes with distinct genetic bases. Unlike existing approaches relying on off-the-shelf clustering methods, RGWAS uses a novel decomposition, MFMR, to model covariates, binary traits, and population structure. We use extensive simulations to show that modelling these features can be crucial for power and calibration. We validate RGWAS in practice by recovering a recently discovered stress subtype in major depression. We then show the utility of RGWAS by identifying three novel subtypes of metabolic traits. We biologically validate these metabolic subtypes with SNP-level tests and a novel polygenic test: the former recover known metabolic GxE SNPs; the latter suggests subtypes may explain substantial missing heritability. Crucially, statins, which are widely prescribed and theorized to increase diabetes risk, have opposing effects on blood glucose across metabolic subtypes, suggesting the subtypes have potential translational value.
Author summary
Complex diseases depend on interactions between many known and unknown genetic and environmental factors. However, most studies aggregate these strata and test for associations on average across samples, though biological factors and medical interventions can have dramatically different effects on different people. Further, more-sophisticated models are often infeasible because relevant sources of heterogeneity are not generally known a priori.
We introduce Reverse GWAS to simultaneously split samples into homogeneous subtypes and to learn differences in genetic or treatment effects between subtypes. Unlike existing approaches to computational subtype identification from high-dimensional trait data, RGWAS accounts for covariates, binary disease traits and, especially, population structure, all of which are important features of real genetic datasets. We validate RGWAS by recovering known genetic subtypes of major depression. We demonstrate that RGWAS can uncover useful novel subtypes in a metabolic dataset, finding three novel subtypes with both SNP- and polygenic-level heterogeneity. Importantly, we show that RGWAS can uncover subtypes with differential treatment response: statins, a common drug class and potential type 2 diabetes risk factor, may have opposing subtype-specific effects on blood glucose.
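To make the idea of opposing subtype-specific effects concrete, the toy simulation below may help. It is only a sketch under strong assumptions and is not the RGWAS or MFMR procedure: the subtype labels are treated as known rather than inferred, the data are simulated, and the names `simulate`, `mean_difference`, and `heterogeneity_z` are hypothetical. It shows how an analysis averaged over all samples can miss a treatment effect that a test for heterogeneity between subtypes picks up.

```python
import random
import statistics


def simulate(n=4000, seed=0):
    """Simulate a binary treatment whose effect on a trait flips sign by subtype."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        subtype = rng.randrange(2)   # assumed-known subtype label (0 or 1)
        treated = rng.randrange(2)   # hypothetical treatment indicator
        effect = 0.5 if subtype == 0 else -0.5  # assumed opposing effects
        trait = effect * treated + rng.gauss(0.0, 1.0)
        rows.append((subtype, treated, trait))
    return rows


def mean_difference(rows):
    """Return (treated mean minus untreated mean, standard error of that difference)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return diff, se


def heterogeneity_z(rows):
    """z-statistic for the difference in treatment effect between the two subtypes."""
    d0, se0 = mean_difference([r for r in rows if r[0] == 0])
    d1, se1 = mean_difference([r for r in rows if r[0] == 1])
    return (d0 - d1) / (se0 ** 2 + se1 ** 2) ** 0.5


rows = simulate()
pooled, pooled_se = mean_difference(rows)
z = heterogeneity_z(rows)
print(f"pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
print(f"heterogeneity z: {z:.2f}")
```

In this simulation the pooled estimate sits near zero because the two subtype effects cancel, while the heterogeneity statistic is large. In real data the subtypes would have to be inferred, which is the harder problem the abstract describes.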
Pages: 22