Improving Genomic Prediction with Machine Learning Incorporating TPE for Hyperparameters Optimization

Times cited: 13
Authors
Liang, Mang [1]
An, Bingxing [1]
Li, Keanning [1]
Du, Lili [1]
Deng, Tianyu [1]
Cao, Sheng [1]
Du, Yueying [1]
Xu, Lingyang [1]
Gao, Xue [1]
Zhang, Lupei [1]
Li, Junya [1]
Gao, Huijiang [1]
Affiliations
[1] Chinese Acad Agr Sci, Inst Anim Sci, Beijing 100193, Peoples R China
Source
BIOLOGY-BASEL | 2022 / Vol. 11 / Issue 11
Keywords
hyperparameters optimization; tree-structured Parzen estimator; genomic prediction; machine learning; SELECTION; ACCURACY; WHEAT; TOOL;
DOI
10.3390/biology11111647
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Simple Summary: Machine learning has become a crucial tool for genomic prediction. However, the complicated process of tuning hyperparameters has greatly hindered its application in actual breeding programs, especially for people without experience in tuning hyperparameters. In this study, we applied a tree-structured Parzen estimator (TPE) to tune the hyperparameters of machine learning methods. Overall, combining kernel ridge regression (KRR) with TPE achieved the highest prediction accuracy in the simulation and real datasets.

Owing to its excellent prediction ability, machine learning has been considered the most powerful tool for analyzing high-throughput sequencing genome data. However, the sophisticated process of tuning hyperparameters greatly impedes the wider application of machine learning in animal and plant breeding programs. Therefore, we integrated an automatic hyperparameter tuning algorithm, the tree-structured Parzen estimator (TPE), with machine learning to simplify the use of machine learning for genomic prediction (GP). In this study, we applied TPE to optimize the hyperparameters of kernel ridge regression (KRR) and support vector regression (SVR). To evaluate the performance of TPE, we compared the prediction accuracy of KRR-TPE and SVR-TPE with that of genomic best linear unbiased prediction (GBLUP) and of KRR-RS, KRR-Grid, SVR-RS, and SVR-Grid, which tune the hyperparameters of KRR and SVR using random search (RS) and grid search (Grid), in a simulation dataset and the real datasets. The results indicated that KRR-TPE achieved the highest prediction ability across all populations and was the most convenient to use. In particular, for the Chinese Simmental beef cattle and Loblolly pine populations, the prediction accuracy of KRR-TPE showed average improvements of 8.73% and 6.08%, respectively, compared with GBLUP. Our study will greatly promote the application of machine learning in GP and further accelerate breeding progress.
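
As a rough illustration of the tuning idea described in the abstract, the sketch below implements a simplified one-dimensional TPE loop in plain Python. It is not the authors' implementation. The function names (`gaussian_kde`, `tpe_minimize`, `toy_validation_loss`), the fixed kernel bandwidth, the quantile `gamma`, and the toy objective are all assumptions made for illustration. The toy objective stands in for a cross-validated KRR error as a function of log10 of the regularization parameter.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Return a Parzen (kernel density) estimator built from `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density


def tpe_minimize(objective, low, high, n_trials=60, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    """Minimize `objective` over [low, high] with a simplified 1-D TPE."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth: a simplifying assumption
    history = []  # list of (x, loss) pairs

    for trial in range(n_trials):
        if trial < n_startup:
            # Warm-up: plain random search until there is enough history.
            x = rng.uniform(low, high)
        else:
            # Split past trials into a "good" group (lowest losses) and the rest.
            ranked = sorted(history, key=lambda pair: pair[1])
            n_good = max(1, int(math.ceil(gamma * len(ranked))))
            good = [x for x, _ in ranked[:n_good]]
            bad = [x for x, _ in ranked[n_good:]]
            l_density = gaussian_kde(good, bandwidth)
            g_density = gaussian_kde(bad, bandwidth)

            # Draw candidates from l(x): pick a good point, add kernel noise.
            candidates = []
            for _ in range(n_candidates):
                c = rng.gauss(rng.choice(good), bandwidth)
                candidates.append(min(high, max(low, c)))

            # Keep the candidate with the largest l(x) / g(x) ratio.
            x = max(candidates,
                    key=lambda c: l_density(c) / (g_density(c) + 1e-12))

        history.append((x, objective(x)))

    return min(history, key=lambda pair: pair[1])


def toy_validation_loss(log10_lambda):
    """Hypothetical stand-in for a cross-validated KRR error curve."""
    return (log10_lambda + 2.0) ** 2 + 0.1


best_log10_lambda, best_loss = tpe_minimize(toy_validation_loss, -6.0, 2.0)
print(f"best log10(lambda) = {best_log10_lambda:.3f}, loss = {best_loss:.4f}")
```

In practice the toy objective would be replaced by a function that fits the model on training folds and returns a validation error. Random search corresponds to running only the warm-up branch for every trial.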
Pages: 13