Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass

Cited by: 120
Authors
Luo, Mi [1]
Wang, Yifu [1]
Xie, Yunhong [1]
Zhou, Lai [1]
Qiao, Jingjing [1]
Qiu, Siyu [1]
Sun, Yujun [1]
Affiliation
[1] Beijing Forestry Univ, State Forestry Adm Key Lab Forest Resources & Env, Beijing 100083, Peoples R China
Source
FORESTS | 2021 / Vol. 12 / Issue 2
Funding
National Natural Science Foundation of China
Keywords
feature selection; machine learning algorithms; ensemble learning; CatBoost; XGBoost; forest type; FOREST BIOMASS; IMAGERY; CHINA; MODEL; CLASSIFICATION; SENTINEL-2; TEXTURE; AREA;
DOI
10.3390/f12020216
Chinese Library Classification (CLC) number
S7 [Forestry]
Discipline classification codes
0829; 0907
Abstract
Increasing numbers of explanatory variables tend to cause information redundancy and the "curse of dimensionality" in the quantitative remote sensing of forest aboveground biomass (AGB). Feature selection of model factors is an effective way to improve the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has addressed the use of the categorical boosting algorithm (CatBoost) for this purpose. In AGB estimation models, feature selection and regression are typically performed with the same machine learning algorithm, but there is no evidence that this is the best approach. Therefore, the present study evaluates the performance of CatBoost for AGB estimation and compares different combinations of feature selection methods and machine learning algorithms. AGB estimation models of four forest types were developed from Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation. RFE preserved the most informative features for AGB estimation and was superior to VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. AGB estimation models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forest, and 25.77 Mg/ha for all forests.
The RFE-CatBoost combination outperformed the VSURF-RFR combination, in which random forests were used for both feature selection and regression, indicating that performing feature selection and regression with a single machine learning algorithm may not always yield optimal AGB estimates. Extending the application of new machine learning algorithms and feature selection methods is a promising way to improve the accuracy of AGB estimates.
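The abstract describes a two-stage pipeline: select features (for example with RFE), then fit a regressor on the retained features and report RMSE. The block below is a minimal, hypothetical sketch of that idea in standard-library Python, not the authors' code. It assumes an ordinary least-squares model, ranked by standardized coefficient magnitude, as a stand-in for the RFR, XGBoost, and CatBoost models used in the study; the synthetic data, the function names (`rfe`, `fit_ols`, `predict`, `rmse`, `solve`), and the choice to keep two features are all illustrative assumptions.

```python
import math
import random
import statistics
from typing import List, Sequence

Matrix = List[List[float]]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the units of y (Mg/ha if y is AGB)."""
    sq = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(X: Matrix, y: Sequence[float]) -> List[float]:
    """Ordinary least squares with intercept; returns [b0, b1, ..., bp].
    Assumption: a stdlib-only stand-in for RFR/XGBoost/CatBoost."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(ata, aty)


def predict(coef: Sequence[float], X: Matrix) -> List[float]:
    """Apply an intercept-plus-slopes linear model to each row of X."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def rfe(X: Matrix, y: Sequence[float], n_keep: int) -> List[int]:
    """Recursive feature elimination: refit on the surviving features,
    drop the least important one, repeat until n_keep remain.
    Importance here = |slope| * feature std (a standardized slope)."""
    keep = list(range(len(X[0])))
    while len(keep) > n_keep:
        sub = [[row[j] for j in keep] for row in X]
        slopes = fit_ols(sub, y)[1:]
        stds = [statistics.pstdev(col) for col in zip(*sub)]
        scores = [abs(b) * s for b, s in zip(slopes, stds)]
        keep.pop(scores.index(min(scores)))  # remove the weakest feature
    return keep


# Hypothetical synthetic data: 4 candidate predictors, only 0 and 2 matter.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(50)]
y = [3.0 * r[0] + 2.0 * r[2] + rng.gauss(0.0, 0.1) for r in X]

selected = rfe(X, y, n_keep=2)
X_sel = [[row[j] for j in selected] for row in X]
coef = fit_ols(X_sel, y)
print("selected features:", selected)
print("in-sample RMSE:", round(rmse(y, predict(coef, X_sel)), 3))
```

Because RFE only needs a feature ranking, the ranking model and the final regressor can be chosen independently, which is the kind of decoupling the abstract argues for; in a fuller pipeline the selected columns could be passed to a different regressor instead of OLS. The RMSE here is computed in-sample for brevity and says nothing about the study's own validation scheme.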
Pages: 1-22
Number of pages: 21