Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass

Cited by: 120
Authors
Luo, Mi [1]
Wang, Yifu [1]
Xie, Yunhong [1]
Zhou, Lai [1]
Qiao, Jingjing [1]
Qiu, Siyu [1]
Sun, Yujun [1]
Affiliation
[1] Beijing Forestry Univ, State Forestry Adm Key Lab Forest Resources & Env, Beijing 100083, Peoples R China
Source
FORESTS | 2021 / Vol. 12 / Issue 02
Funding
National Natural Science Foundation of China;
Keywords
feature selection; machine learning algorithms; ensemble learning; CatBoost; XGBoost; forest type; FOREST BIOMASS; IMAGERY; CHINA; MODEL; CLASSIFICATION; SENTINEL-2; TEXTURE; AREA;
DOI
10.3390/f12020216
Chinese Library Classification number
S7 [Forestry];
Discipline classification code
0829 ; 0907 ;
Abstract
Increasing numbers of explanatory variables tend to result in information redundancy and "dimensional disaster" in the quantitative remote sensing of forest aboveground biomass (AGB). Feature selection of model factors is an effective method for improving the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has addressed the use of the categorical boosting algorithm (CatBoost) for AGB estimation. Both feature selection and regression for AGB estimation models are typically performed with the same machine learning algorithm, but there is no evidence to suggest that this is the best method. Therefore, the present study focuses on evaluating the performance of the CatBoost algorithm for AGB estimation and comparing the performance of different combinations of feature selection methods and machine learning algorithms. AGB estimation models of four forest types were developed based on Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation. RFE preserved the most informative features for AGB estimation and was superior to VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. AGB estimation models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forests, and 25.77 Mg/ha for all forests. 
The combination of RFE and CatBoost outperformed the VSURF-RFR combination, in which random forests were used for both feature selection and regression, indicating that performing feature selection and regression with a single machine learning algorithm does not always yield optimal AGB estimates. Extending the application of new machine learning algorithms and feature selection methods is a promising way to improve the accuracy of AGB estimates.
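The experimental design in the abstract (three feature selection methods crossed with three regression algorithms, each scored by RMSE) can be sketched as follows. This is only an illustration, not the authors' code: the function names `rmse` and `selector_regressor_grid` and the sample biomass values are hypothetical, and no model is actually fitted.

```python
from itertools import product
from math import sqrt

# Method names as listed in the abstract.
SELECTORS = ["RFE", "VSURF", "LASSO"]
REGRESSORS = ["RFR", "XGBoost", "CatBoost"]


def selector_regressor_grid():
    """Return every selector-regressor pairing as a 'SELECTOR-REGRESSOR' label."""
    return [f"{s}-{r}" for s, r in product(SELECTORS, REGRESSORS)]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted AGB (Mg/ha)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(sum(squared_errors) / len(observed))


if __name__ == "__main__":
    # Hypothetical plot-level values, for illustration only (not from the paper).
    observed = [100.0, 150.0, 200.0]
    predicted = [110.0, 140.0, 230.0]
    print(selector_regressor_grid())
    print(round(rmse(observed, predicted), 2))
```

In a real comparison, each label in the grid would correspond to a fitted model evaluated on held-out plots, and the pairing with the lowest RMSE would be preferred within each forest type.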
Pages: 1 / 22
Number of pages: 21