Ensemble learning prediction of soybean yields in China based on meteorological data

Cited by: 17
Authors
Li, Qian-chuan [1]
Xu, Shi-wei [1, 2, 5]
Zhuang, Jia-yu [1, 5]
Liu, Jia-Jia [2]
Zhou, Yi [3]
Zhang, Ze-xi [4]
Affiliations
[1] Chinese Acad Agr Sci, Agr Informat Inst, Beijing 100081, Peoples R China
[2] Beijing Engn Res Ctr Agr Monitoring & Early Warnin, Beijing 100081, Peoples R China
[3] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
[4] Columbia Univ, Dept Math, New York, NY 10027 USA
[5] Minist Agr & Rural Affairs, Key Lab Agr Monitoring & Early Warning Technol, Beijing 100081, Peoples R China
Keywords
meteorological factors; ensemble learning; crop yield prediction; machine learning; county-level; CLIMATE DATA; WHEAT YIELD; CROP YIELD; TRENDS; TEMPERATURE; PATTERNS; RAINFALL;
DOI
10.1016/j.jia.2023.02.011
Chinese Library Classification number
S [Agricultural Sciences];
Discipline classification code
09;
Abstract
The accurate prediction of soybean yield is of great significance for agricultural production, monitoring and early warning. Although previous studies have used machine learning algorithms to predict soybean yield based on meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms to improve the prediction accuracy through ensemble learning algorithms has not been studied in depth. This study analyzed daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang-Huai region), covering 34 years. Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as the base models to establish a high-precision and highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated using five-year sliding prediction and four regression indicators across the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield based on the stacking model in 173 counties showed that the predictions reflect the spatiotemporal distribution of soybean yield in detail, and the mean absolute percentage error (MAPE) was less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
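The stacking setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, synthetic stand-in data, a linear-regression meta-learner (the abstract does not name one), and arbitrary hyperparameters and PCA dimensionality chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption, not the paper's dataset):
# 300 samples with 12 meteorological-style features and a yield-like target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = 2.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.05, size=300)

# Simple hold-out split: first 240 samples for training, last 60 for testing.
X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

# The three base learners named in the abstract; hyperparameters are illustrative.
base_models = [
    ("knn", KNeighborsRegressor(n_neighbors=5)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("svr", SVR(C=1.0, epsilon=0.01)),
]

# cv=5 makes the meta-learner train on 5-fold out-of-fold base predictions.
# LinearRegression as the meta-learner is an assumption.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(),
    cv=5,
)

# Standardize, reduce dimensionality with PCA (8 components is arbitrary),
# then fit the stacked ensemble.
model = make_pipeline(StandardScaler(), PCA(n_components=8), stack)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
# scikit-learn returns MAPE as a fraction, so multiply by 100 for percent.
mape = mean_absolute_percentage_error(y_test, y_pred)
print(f"MAPE: {100 * mape:.2f}%")
```

In practice the hold-out split would be replaced by the five-year sliding evaluation the abstract mentions, and the hyperparameters would be tuned rather than fixed.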
Pages: 1909-1927
Number of pages: 19