Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks

Cited by: 21
Authors
Westhues, Cathy C. [1 ,2 ]
Mahone, Gregory S. [3 ]
da Silva, Sofia [3 ]
Thorwarth, Patrick [3 ]
Schmidt, Malthe [3 ]
Richter, Jan-Christoph [3 ]
Simianer, Henner [2 ,4 ]
Beissinger, Timothy M. [1 ,2 ]
Affiliations
[1] Univ Goettingen, Div Plant Breeding Methodol, Dept Crop Sci, Gottingen, Germany
[2] Univ Goettingen, Ctr Integrated Breeding Res, Gottingen, Germany
[3] Kleinwanzlebener Saatzucht KWS SAAT SE, Einbeck, Germany
[4] Univ Goettingen, Dept Anim Sci, Anim Breeding & Genet Grp, Gottingen, Germany
Source
FRONTIERS IN PLANT SCIENCE | 2021 / Volume 12
Keywords
machine learning; genotype-by-environment interactions; gradient boosting; maize; yield; genomic prediction; plant breeding; REACTION NORM MODEL; GENOTYPE; YIELD; SELECTION; PEDIGREE; STRESS; GROWTH; ASSOCIATION; MINIMUM; MAXIMUM;
DOI
10.3389/fpls.2021.699589
Chinese Library Classification number
Q94 [Botany]
Discipline classification code
071001
Abstract
The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (elastic net) and to two nonlinear gradient boosting methods based on decision tree algorithms (XGBoost, LightGBM). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. 
An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments.
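The four prediction problems listed in the abstract can be read as grouped train/test splits over years or sites. The following is a minimal sketch, not the authors' implementation. The function name `scenario_split`, the toy records, and the reading of "only unobserved genotypes" as "keep only test rows whose genotype never occurs in the training rows" are all assumptions made for illustration.

```python
def scenario_split(group, genotype, held_out, only_new_genotypes=False):
    """Return (train_idx, test_idx) for one held-out year or site.

    group:     per-observation year or site labels
    genotype:  per-observation genotype identifiers
    held_out:  the year or site treated as the new environment
    only_new_genotypes: if True, keep only test rows whose genotype
        is absent from the training rows (assumed reading of
        prediction problems 2 and 4)
    """
    train_idx = [i for i, g in enumerate(group) if g != held_out]
    test_idx = [i for i, g in enumerate(group) if g == held_out]
    if only_new_genotypes:
        seen = {genotype[i] for i in train_idx}
        test_idx = [i for i in test_idx if genotype[i] not in seen]
    return train_idx, test_idx


# Hypothetical toy records: one entry per (year, site, genotype) observation.
years = [2014, 2014, 2015, 2015, 2016, 2016]
sites = ["A", "B", "A", "B", "A", "C"]
genotypes = ["g1", "g2", "g1", "g3", "g2", "g4"]

# (1) tested and new genotypes in a new year
train_1, test_1 = scenario_split(years, genotypes, held_out=2016)
# (2) only unobserved genotypes in a new year
train_2, test_2 = scenario_split(years, genotypes, held_out=2016,
                                 only_new_genotypes=True)
# (3) tested and new genotypes in a new site
train_3, test_3 = scenario_split(sites, genotypes, held_out="C")
# (4) only unobserved genotypes in a new site
train_4, test_4 = scenario_split(sites, genotypes, held_out="C",
                                 only_new_genotypes=True)
```

In an actual comparison, each held-out year or site would take its turn as the test environment, and any model (linear random effects, elastic net, XGBoost, LightGBM) would be fit on the training indices and scored on the test indices.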
Pages: 22