Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes

Times cited: 113
Authors
Abdollahi-Arpanahi, Rostam L. [1]
Gianola, Daniel [2]
Penagaricano, Francisco [1,3]
Affiliations
[1] Univ Florida, Dept Anim Sci, Gainesville, FL 32611 USA
[2] Univ Wisconsin, Dept Anim Sci & Dairy Sci, Madison, WI USA
[3] Univ Florida, Genet Inst, Gainesville, FL 32611 USA
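Keywords
HILBERT-SPACES REGRESSION; GENETIC ARCHITECTURE; ASSISTED PREDICTION; TRAITS
DOI
10.1186/s12711-020-00531-z
Chinese Library Classification
S8 [Animal husbandry, veterinary medicine, hunting, sericulture, apiculture]
Subject classification code
0905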
Abstract
Background: Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement about machine learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two ensemble learning methods [random forests (RF) and gradient boosting (GB)], and two parametric methods [genomic best linear unbiased prediction (GBLUP) and Bayes B] using real and simulated datasets.

Methods: The real dataset consisted of 11,790 Holstein bulls with sire conception rate (SCR) records, genotyped for 58k single nucleotide polymorphisms (SNPs). To support the evaluation of deep learning methods, various simulation studies were conducted using the observed genotype data as a template, assuming a heritability of 0.30 with either additive or non-additive gene effects and two different numbers of quantitative trait nucleotides (100 and 1000).

Results: In the bull dataset, the best predictive correlation was obtained with GB (0.36), followed by Bayes B (0.34), GBLUP (0.33), RF (0.32), CNN (0.29) and MLP (0.26). The same ranking was observed when using the mean squared error of prediction. The simulation indicated that when gene action was purely additive, parametric methods outperformed the other methods. When gene action was a combination of additive, dominance and two-locus epistatic effects, the best predictive ability was obtained with gradient boosting, and the superiority of deep learning over the parametric methods depended on the number of loci controlling the trait and on sample size. In fact, with a large dataset including 80k individuals, the predictive performance of deep learning methods was similar to or slightly better than that of parametric methods for traits with non-additive gene action.

Conclusions: For prediction of traits with non-additive gene action, gradient boosting was a robust method. Deep learning approaches were not better for genomic prediction unless non-additive variance was sizable.
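The Methods and Results describe two technical steps: simulating phenotypes with a heritability of 0.30 from a genotype template (with additive or non-additive gene action at a given number of quantitative trait nucleotides), and scoring predictions by predictive correlation and mean squared error. The Python sketch below illustrates only those two steps under stated assumptions; it is not the authors' code and does not implement MLP, CNN, RF, GB, GBLUP or Bayes B. All function names are hypothetical. Random genotypes stand in for the real bull genotypes, and the dominance coding (heterozygote indicator) and epistasis scheme (products of additive codes for disjoint QTN pairs) are assumptions chosen for illustration.

```python
import math
import random


def simulate_genotypes(n_ind, n_snp, rng):
    """Hypothetical stand-in for real genotype data: SNPs coded 0/1/2,
    sampled under Hardy-Weinberg equilibrium with random allele frequencies."""
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_snp)]
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n_ind)]


def variance(x):
    """Population variance of a list of numbers."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def simulate_phenotypes(geno, n_qtn, h2, rng, non_additive=False):
    """Return (genetic values, phenotypes) with heritability h2.

    Additive effects are drawn at n_qtn randomly chosen loci. If non_additive
    is True, assumed dominance effects (heterozygote indicator) and two-locus
    epistasis between disjoint QTN pairs are added as well.
    """
    n_snp = len(geno[0])
    qtn = rng.sample(range(n_snp), n_qtn)
    add = {j: rng.gauss(0, 1) for j in qtn}
    dom = {j: rng.gauss(0, 1) for j in qtn}
    pairs = [(qtn[i], qtn[i + 1]) for i in range(0, n_qtn - 1, 2)]
    epi = [rng.gauss(0, 1) for _ in pairs]

    g = []
    for row in geno:
        val = sum(add[j] * row[j] for j in qtn)
        if non_additive:
            val += sum(dom[j] * (row[j] == 1) for j in qtn)
            val += sum(w * row[j] * row[k] for w, (j, k) in zip(epi, pairs))
        g.append(val)

    # Residual variance chosen so that var(g) / (var(g) + var(residual)) = h2.
    sd_res = math.sqrt(variance(g) * (1 - h2) / h2)
    y = [gi + rng.gauss(0, sd_res) for gi in g]
    return g, y


def predictive_correlation(y, y_hat):
    """Pearson correlation between observed and predicted values."""
    my, mp = sum(y) / len(y), sum(y_hat) / len(y_hat)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_hat))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sp = math.sqrt(sum((b - mp) ** 2 for b in y_hat))
    return cov / (sy * sp)


def mse(y, y_hat):
    """Mean squared error of prediction."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


if __name__ == "__main__":
    rng = random.Random(2020)
    geno = simulate_genotypes(n_ind=1000, n_snp=300, rng=rng)
    g, y = simulate_phenotypes(geno, n_qtn=100, h2=0.30, rng=rng,
                               non_additive=True)
    # Realized heritability of the simulated trait (close to 0.30).
    print(round(variance(g) / variance(y), 2))
    # Using the true genetic value as an "oracle" prediction only to
    # exercise the metrics; real methods would predict g from genotypes.
    print(round(predictive_correlation(y, g), 2), round(mse(y, g), 2))
```

With the true genetic value as the prediction, the correlation with phenotype is expected to be near the square root of the heritability (about 0.55 for 0.30), which gives a rough ceiling against which correlations such as those reported in the Results can be read.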
Pages: 15