Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes

Cited by: 136

Authors:

Abdollahi-Arpanahi, Rostam L. [1]

Gianola, Daniel [2]

Penagaricano, Francisco [1,3]

Affiliations:

[1] Univ Florida, Dept Anim Sci, Gainesville, FL 32611 USA

[2] Univ Wisconsin, Dept Anim Sci & Dairy Sci, Madison, WI USA

[3] Univ Florida, Genet Inst, Gainesville, FL 32611 USA

Source:

GENETICS SELECTION EVOLUTION | 2020 / Vol. 52 / Issue 01

Keywords:

HILBERT-SPACES REGRESSION; GENETIC ARCHITECTURE; ASSISTED PREDICTION; TRAITS;

DOI:

10.1186/s12711-020-00531-z

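Chinese Library Classification number:

S8 [Animal husbandry, veterinary medicine, hunting, sericulture, apiculture];

Discipline classification code:

0905 ;

Abstract: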

Background: Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement about machine learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two ensemble learning methods [random forests (RF) and gradient boosting (GB)], and two parametric methods [genomic best linear unbiased prediction (GBLUP) and Bayes B] using real and simulated datasets.

Methods: The real dataset consisted of 11,790 Holstein bulls with sire conception rate (SCR) records and genotyped for 58k single nucleotide polymorphisms (SNPs). To support the evaluation of deep learning methods, various simulation studies were conducted using the observed genotype data as a template, assuming a heritability of 0.30 with either additive or non-additive gene effects, and two different numbers of quantitative trait nucleotides (100 and 1000).

Results: In the bull dataset, the best predictive correlation was obtained with GB (0.36), followed by Bayes B (0.34), GBLUP (0.33), RF (0.32), CNN (0.29) and MLP (0.26). The same trend was observed when using mean squared error of prediction. The simulation indicated that when gene action was purely additive, parametric methods outperformed other methods. When the gene action was a combination of additive, dominance and two-locus epistatic effects, the best predictive ability was obtained with gradient boosting, and the superiority of deep learning over the parametric methods depended on the number of loci controlling the trait and on sample size. In fact, with a large dataset including 80k individuals, the predictive performance of deep learning methods was similar to or slightly better than that of parametric methods for traits with non-additive gene action.

Conclusions: For prediction of traits with non-additive gene action, gradient boosting was a robust method. Deep learning approaches were not better for genomic prediction unless non-additive variance was sizable.
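The evaluation setup described in the abstract (simulate a trait with heritability 0.30 from a fixed number of quantitative trait nucleotides, fit a model on a training set, then report predictive correlation and mean squared error on held-out animals) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the genotypes are randomly generated rather than taken from the bull data, the sample sizes are arbitrary, and ridge regression on SNP codes stands in for GBLUP (the two are commonly treated as equivalent under standard assumptions). All function names and the choice of shrinkage parameter are assumptions made for this example.

```python
import numpy as np


def simulate_additive_phenotype(genotypes, n_qtn, h2, rng):
    """Simulate a purely additive trait with target heritability h2 (hypothetical helper)."""
    n, m = genotypes.shape
    qtn = rng.choice(m, size=n_qtn, replace=False)   # causal loci
    effects = rng.normal(size=n_qtn)                 # additive QTN effects
    g = genotypes[:, qtn] @ effects
    g = (g - g.mean()) / g.std()                     # genetic values scaled to variance 1
    # Residual variance chosen so that var(g) / (var(g) + var(e)) is about h2.
    e = rng.normal(scale=np.sqrt((1.0 - h2) / h2), size=n)
    return g + e


def ridge_predict(x_train, y_train, x_test, lam):
    """Ridge regression on centered SNP codes, solved in its n-by-n dual form."""
    mu = y_train.mean()
    col_means = x_train.mean(axis=0)
    xc = x_train - col_means
    k = xc @ xc.T
    alpha = np.linalg.solve(k + lam * np.eye(k.shape[0]), y_train - mu)
    beta = xc.T @ alpha                              # estimated marker effects
    return mu + (x_test - col_means) @ beta


def predictive_metrics(y_true, y_pred):
    """Predictive correlation and mean squared error of prediction."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    mse = np.mean((y_true - y_pred) ** 2)
    return r, mse


rng = np.random.default_rng(0)
n, m, h2 = 600, 1000, 0.30                           # toy sizes, far smaller than the study
freqs = rng.uniform(0.1, 0.9, size=m)                # assumed allele frequencies
genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)  # 0/1/2 allele counts

y = simulate_additive_phenotype(genotypes, n_qtn=100, h2=h2, rng=rng)

train, test = np.arange(0, 480), np.arange(480, n)
# Assumed shrinkage: residual variance over per-marker variance, with the
# genetic variance spread evenly across all markers.
lam = (1.0 - h2) / h2 * genotypes[train].var(axis=0).sum()

y_hat = ridge_predict(genotypes[train], y[train], genotypes[test], lam)
r, mse = predictive_metrics(y[test], y_hat)
print(f"predictive correlation = {r:.3f}, MSE = {mse:.3f}")
```

Swapping `ridge_predict` for another learner while keeping `predictive_metrics` and the train/test split fixed gives the kind of like-for-like comparison the abstract reports; non-additive scenarios would need dominance and epistatic terms added to the simulated genetic value.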


Pages: 15

