Can Deep Learning Improve Genomic Prediction of Complex Human Traits?

Cited by: 141
Authors
Bellot, Pau [1, 5]
de los Campos, Gustavo [2, 3]
Perez-Enciso, Miguel [1, 4]
Affiliations
[1] Univ Barcelona UB Consortium, UAB, CSIC, IRTA, CRAG, Barcelona 08193, Spain
[2] Michigan State Univ, Dept Epidemiol & Biostat, E Lansing, MI 48824 USA
[3] Michigan State Univ, Dept Stat, E Lansing, MI 48824 USA
[4] Inst Catala Recerca Avancada ICREA, Barcelona 08010, Spain
[5] Brainomix, 263 Banbury Rd, Oxford OX2 7HN, England
Funding
US National Institutes of Health (NIH);
Keywords
Convolutional Neural Networks; complex traits; deep learning; genomic prediction; Multilayer Perceptrons; UK Biobank; whole-genome; Genomic Prediction regressions; GenPred; BONE-MINERAL DENSITY; ENABLED PREDICTION; REGRESSION; HERITABILITY;
DOI
10.1534/genetics.118.301298
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
The genetic analysis of complex traits does not escape the current excitement around artificial intelligence, including a renewed interest in deep learning (DL) techniques such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). However, the performance of DL for genomic prediction of complex human traits has not been comprehensively tested. To provide an evaluation of MLPs and CNNs, we used data from distantly related white Caucasian individuals (n ~ 100k individuals, m ~ 500k SNPs, and k = 1000) of the interim release of the UK Biobank. We analyzed a total of five phenotypes: height, bone heel mineral density, body mass index, systolic blood pressure, and waist-hip ratio, with genomic heritabilities ranging from ~0.20 to 0.70. After hyperparameter optimization using a genetic algorithm, we considered several configurations, from shallow to deep learners, and compared the predictive performance of MLPs and CNNs with that of Bayesian linear regressions across sets of SNPs (from 10k to 50k) that were preselected using single-marker regression analyses. For height, a highly heritable phenotype, all methods performed similarly, although CNNs were slightly but consistently worse. For the rest of the phenotypes, the performance of some CNNs was comparable or slightly better than linear methods. Performance of MLPs was highly dependent on SNP set and phenotype. In all, over the range of traits evaluated in this study, CNN performance was competitive to linear models, but we did not find any case where DL outperformed the linear model by a sizable margin. We suggest that more research is needed to adapt CNN methodology, originally motivated by image analysis, to genetic-based problems in order for CNNs to be competitive with linear models.
Pages: 809-819
Number of pages: 11