DeepCGP: A Deep Learning Method to Compress Genome-Wide Polymorphisms for Predicting Phenotype of Rice

Cited by: 5

Authors:

Islam, Tanzila [1]

Kim, Chyon Hae [1]

Iwata, Hiroyoshi [2]

Hiroyuki, Shimono [3,4]

Kimura, Akio [1,4]

Affiliations:

[1] Iwate Univ, Grad Sch Sci & Engn, Dept Syst Innovat Engn, Morioka, Iwate 0208550, Japan

[2] Univ Tokyo, Dept Agr & Environm Biol, Bunkyo Ku, Tokyo 1130033, Japan

[3] Iwate Univ, Fac Agr, Crop Sci Lab, Morioka, Iwate 0208550, Japan

[4] Iwate Univ, Agri Innovat Ctr, Morioka, Iwate 0208550, Japan

Source:

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS | 2023 / Vol. 20 / No. 3

Funding:

Japan Society for the Promotion of Science

Keywords:

Bioinformatics; Genomics; Data models; Predictive models; Mathematical models; Deep learning; Radio frequency; autoencoder; genomic selection; data compression; genomic prediction; BREEDING TECHNOLOGIES; FOOD SECURITY; REGRESSION; SELECTION;

DOI:

10.1109/TCBB.2022.3231466

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification code:

071010; 081704

Abstract:

Genomic selection (GS) is expected to accelerate plant and animal breeding. Over the last decade, the volume of genome-wide polymorphism data has grown, raising concerns about storage cost and computational time. Several individual studies have separately attempted to compress genome data and to predict phenotypes. However, existing compression models do not preserve adequate data quality after compression, and prediction models are time-consuming and use the original (uncompressed) data to predict phenotypes. A combined application of compression and genomic prediction using deep learning could therefore resolve these limitations. We propose a Deep Learning Compression-based Genomic Prediction (DeepCGP) model that compresses genome-wide polymorphism data and predicts phenotypes of a target trait from the compressed information. DeepCGP consists of two parts: (i) an autoencoder based on deep neural networks that compresses genome-wide polymorphism data, and (ii) regression models based on random forests (RF), genomic best linear unbiased prediction (GBLUP), and Bayesian variable selection (BayesB) that predict phenotypes from the compressed information. The model was applied to two rice datasets containing genome-wide marker genotypes and target-trait phenotypes. After 98% compression, DeepCGP achieved a prediction accuracy of up to 99% for a trait. Among the three regression methods, BayesB required the most computational time but showed the highest accuracy; however, BayesB could be used only with the compressed data. Overall, DeepCGP outperformed state-of-the-art methods in terms of both compression and prediction. Our code and data are available at https://github.com/tanzilamohita/DeepCGP.
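The abstract describes a two-stage pipeline: compress the genotype matrix with an autoencoder, then regress phenotypes on the compressed codes. The following is a minimal sketch of that idea under simplifying assumptions, not the DeepCGP implementation (the authors' code is in the linked repository). Assumptions, all hypothetical: toy simulated inbred genotypes coded -1/+1 instead of the rice datasets; a single-hidden-layer tanh autoencoder trained with full-batch Adam in NumPy instead of a deep network; and ridge regression on the codes as a stand-in for GBLUP (ridge regression on a feature matrix gives the same predictions as GBLUP with a linear genomic relationship matrix built from those features, for a matching shrinkage parameter). RF and BayesB are not shown. Reducing 400 markers to 8 codes is one way to read "98% compression" (1 - 8/400 = 0.98), and accuracy is measured here as the Pearson correlation between predicted and observed phenotypes, a common GS metric that may differ from the paper's definition.

```python
import numpy as np


def simulate_genotypes(n_lines: int, n_markers: int, n_founders: int,
                       rng: np.random.Generator) -> np.ndarray:
    """Hypothetical toy data: inbred lines coded -1/+1, NOT the paper's rice panels.

    Each line copies one of a few founder haplotypes and then flips ~5% of its
    markers, giving the low-rank structure an autoencoder can exploit.
    """
    founders = rng.choice([-1.0, 1.0], size=(n_founders, n_markers))
    parent = rng.integers(0, n_founders, size=n_lines)
    flips = rng.random((n_lines, n_markers)) < 0.05
    return np.where(flips, -founders[parent], founders[parent])


def train_autoencoder(x: np.ndarray, n_code: int, epochs: int = 300,
                      lr: float = 0.01, seed: int = 0):
    """Hypothetical helper: one tanh hidden layer (the codes), linear decoder, Adam on MSE."""
    rng = np.random.default_rng(seed)
    n_markers = x.shape[1]
    params = {
        "w1": rng.normal(0.0, 1.0 / np.sqrt(n_markers), size=(n_markers, n_code)),
        "b1": np.zeros(n_code),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(n_code), size=(n_code, n_markers)),
        "b2": np.zeros(n_markers),
    }
    adam_m = {k: np.zeros_like(p) for k, p in params.items()}
    adam_v = {k: np.zeros_like(p) for k, p in params.items()}
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, epochs + 1):
        h = np.tanh(x @ params["w1"] + params["b1"])  # encoder output = compressed codes
        x_hat = h @ params["w2"] + params["b2"]       # decoder output = reconstruction
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean-squared reconstruction error.
        d_xhat = 2.0 * err / err.size
        d_z = (d_xhat @ params["w2"].T) * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        grads = {"w2": h.T @ d_xhat, "b2": d_xhat.sum(axis=0),
                 "w1": x.T @ d_z, "b1": d_z.sum(axis=0)}
        for k, g in grads.items():  # bias-corrected Adam step
            adam_m[k] = beta1 * adam_m[k] + (1 - beta1) * g
            adam_v[k] = beta2 * adam_v[k] + (1 - beta2) * g ** 2
            m_hat = adam_m[k] / (1 - beta1 ** t)
            v_hat = adam_v[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)

    def encode(genotypes: np.ndarray) -> np.ndarray:
        return np.tanh(genotypes @ params["w1"] + params["b1"])

    return encode, losses


def ridge_predict(h_train: np.ndarray, y_train: np.ndarray,
                  h_test: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Ridge regression on the codes: a GBLUP stand-in with a fixed, untuned lam."""
    h_mean, y_mean = h_train.mean(axis=0), y_train.mean()
    a = h_train - h_mean
    beta = np.linalg.solve(a.T @ a + lam * np.eye(a.shape[1]), a.T @ (y_train - y_mean))
    return y_mean + (h_test - h_mean) @ beta


rng = np.random.default_rng(42)
x = simulate_genotypes(n_lines=200, n_markers=400, n_founders=8, rng=rng)
effects = rng.normal(0.0, 0.1, size=x.shape[1])              # toy additive marker effects
y = x @ effects + rng.normal(0.0, 1.0, size=x.shape[0])      # toy phenotype = genetic value + noise

# Compression is unsupervised, so all genotypes (including test lines) are encoded.
encode, losses = train_autoencoder(x, n_code=8)
codes = encode(x)
compression = 1.0 - codes.shape[1] / x.shape[1]
train_idx, test_idx = np.arange(150), np.arange(150, 200)
y_pred = ridge_predict(codes[train_idx], y[train_idx], codes[test_idx])
accuracy = np.corrcoef(y_pred, y[test_idx])[0, 1]  # Pearson r on held-out lines
print(f"compression {compression:.0%}, MSE {losses[0]:.3f} -> {losses[-1]:.3f}, r = {accuracy:.2f}")
```

Keeping the encoder as a separate closure mirrors the two-part design in the abstract: once trained, only the compact codes need to be stored and passed to whichever regression model is used, so RF or a Bayesian model could replace `ridge_predict` without touching the compression step.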

Pages: 2078-2088

Page count: 11

Number of references: 50
