Evaluation of Three Feature Dimension Reduction Techniques for Machine Learning-Based Crop Yield Prediction Models

Cited by: 14
Authors
Pham, Hoa Thi [1, 2]
Awange, Joseph [1, 3]
Kuhn, Michael [1]
Affiliations
[1] Curtin Univ, Sch Earth & Planetary Sci, Spatial Sci Discipline, Perth 6102, Australia
[2] Hanoi Univ Nat Resources & Environm, Fac Surveying Mapping & Geog Informat, Hanoi 100000, Vietnam
[3] Karlsruhe Inst Technol, Geodet Inst, Engler Str 7, D-76131 Karlsruhe, Germany
Keywords
feature selection; feature extraction; machine learning; crop yield; VCI; TCI; VEGETATION HEALTH INDEXES; FEATURE-SELECTION; NEURAL-NETWORKS; DROUGHT
DOI
10.3390/s22176609
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Machine learning (ML) is widely used to develop crop yield forecasting models, yet identifying the most critical features in a dataset remains challenging. Although feature selection (FS) and feature extraction (FX) techniques have each been employed, no research has compared their performance or, more importantly, assessed the benefit of combining them. This paper therefore proposes a framework that uses no feature reduction (All-F) as a baseline to investigate the performance of FS, FX, and a combination of both (FSX). The case study employs the vegetation condition index (VCI)/temperature condition index (TCI) to develop 21 rice yield forecasting models for eight sub-regions in Vietnam based on five ML methods: linear, support vector machine (SVM), decision tree (Tree), artificial neural network (ANN), and Ensemble. The results reveal that FSX takes full advantage of both FS and FX: FSX-based models perform best in 18 of the 21 cases, whereas FS-based and FX-based models perform best in 2 and 1 cases, respectively. These FSX-, FS-, and FX-based models improve on the All-F-based models by 21% on average and by up to 60% in terms of RMSE. Furthermore, the 21 best models are based on Ensemble (13 models), Tree (6 models), linear (1 model), and ANN (1 model). These findings highlight the significant role of FS, FX, and especially FSX, coupled with a wide range of ML algorithms (especially Ensemble), in enhancing the accuracy of crop yield prediction.
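The comparison framework described in the abstract (All-F baseline versus FS, FX, and FSX, scored by RMSE) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic, FS is stood in for by a simple correlation filter, FX by principal component analysis, and the learner by ordinary least squares. The function names and the values of `k` and `n_components` are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def select_top_k(X_train, y_train, k):
    """FS stand-in: indices of the k features most correlated with the target."""
    Xc = X_train - X_train.mean(axis=0)
    yc = y_train - y_train.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    corr = (Xc.T @ yc) / denom
    return np.argsort(-np.abs(corr))[:k]


def fit_pca(X_train, n_components):
    """FX stand-in: training mean and the leading principal directions."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components].T


def fit_predict_linear(X_train, y_train, X_test):
    """Ordinary least squares with an intercept; returns test predictions."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def compare_strategies(X_train, y_train, X_test, y_test, k=8, n_components=3):
    """Test RMSE for All-F, FS, FX, and FSX (selection followed by extraction)."""
    results = {}
    # All-F: no feature reduction (baseline).
    results["All-F"] = rmse(y_test, fit_predict_linear(X_train, y_train, X_test))
    # FS: keep only the selected original features.
    idx = select_top_k(X_train, y_train, k)
    results["FS"] = rmse(
        y_test, fit_predict_linear(X_train[:, idx], y_train, X_test[:, idx])
    )
    # FX: project all features onto principal components fitted on training data.
    mean, comps = fit_pca(X_train, n_components)
    results["FX"] = rmse(
        y_test,
        fit_predict_linear((X_train - mean) @ comps, y_train, (X_test - mean) @ comps),
    )
    # FSX: select first, then extract components from the selected features.
    mean_s, comps_s = fit_pca(X_train[:, idx], n_components)
    results["FSX"] = rmse(
        y_test,
        fit_predict_linear(
            (X_train[:, idx] - mean_s) @ comps_s,
            y_train,
            (X_test[:, idx] - mean_s) @ comps_s,
        ),
    )
    return results


# Synthetic data (assumption): 30 candidate features, only two drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.normal(size=60)
X_train, X_test, y_train, y_test = X[:40], X[40:], y[:40], y[40:]

results = compare_strategies(X_train, y_train, X_test, y_test)
baseline = results["All-F"]
for name, value in results.items():
    change = 100.0 * (baseline - value) / baseline
    print(f"{name}: RMSE = {value:.3f} ({change:+.1f}% vs All-F)")
```

The selection and projection steps are fitted on the training split only, so the test RMSE is not contaminated by the held-out samples. Which strategy scores best on this toy data says nothing about the paper's results; the sketch only shows how the four variants can be placed on a common RMSE footing.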
Pages: 18