Machine Learning and Feature Selection for soil spectroscopy. An evaluation of Random Forest wrappers to predict soil organic matter, clay, and carbonates

Cited by: 4
Authors
Canero, Francisco M. [1]
Rodriguez-Galiano, Victor [1]
Aragones, David [2]
Affiliations
[1] Univ Seville, Dept Phys Geog & Reg Geog Anal, Seville 41004, Spain
[2] CSIC, Remote Sensing & Geog Informat Syst Lab LAST EBD, Donana Biol Stn, Seville 41092, Spain
Keywords
Random forest; Sequential flotant selection; Sequential flotant forward selection; Partial least squares regression; Wrapper methods; Sierra de las nieves; PARTIAL LEAST-SQUARES; DIFFUSE-REFLECTANCE SPECTROSCOPY; INFRARED SPECTROSCOPY; HYPERSPECTRAL IMAGES; PRINCIPAL COMPONENT; TOTAL NITROGEN; REGRESSION; AIRBORNE; CLASSIFICATION; PLSR;
DOI
10.1016/j.heliyon.2024.e30228
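Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes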
07 ; 0710 ; 09 ;
Abstract
Soil spectroscopy estimates soil properties using the absorption features in soil spectra. However, modelling soil properties with soil spectroscopy is challenging due to the high dimensionality of spectral data. Feature selection wrapper methods are promising approaches to reduce the dimensionality but are rarely used in soil spectroscopy. The aim of this study is to evaluate the performance of two feature selection wrapper methods, Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS), built using the Random Forest (RF) algorithm, for dimensionality reduction of spectral data and predictive modelling of soil organic matter (SOM), clay and carbonates. The reflectance of 100 soil samples, acquired from Sierra de las Nieves (Spain), was measured under laboratory conditions using an ASD FieldSpec Pro JR. Four datasets were built from the measurements: raw spectra, spectra after Continuum Removal (CR), spectra after Multiplicative Scatter Correction (MSC), and a so-called "Global" dataset composed of raw, CR and MSC features. The performance of RF models built with feature selection methods was compared to that of Partial Least Squares Regression (PLSR) and RF alone. RF models built with SFS and SFFS outperformed the PLSR and RF-alone models: the best RF models with feature selection had a ratio of performance to interquartile distance (RPIQ) of 1.93, 0.38 and 2.56 for SOM, carbonates, and clay, respectively. PLSR models scored 1.41, 0.29 and 1.81 for the same properties, and RF alone scored 1.29, 0.29 and 1.81. The application of feature selection wrapper methods reduced the number of features to less than 1 % of the starting features. Features were selected across the whole spectral range for SOM and clay, and around 900 nm, 1900 nm, and 2350 nm for carbonates. However, feature selection highlighted features around 1100 nm in SOM modelling, as well as other features around 2200 nm, which is considered a main absorption feature of clay. The application of feature selection with Random Forest proved important for improving modelling accuracy, reducing redundant features and avoiding the curse of dimensionality, or Hughes effect. Thus, this research presents an alternative to the dimensionality reduction approaches applied to date to model soil properties with spectroscopy, and it paves the way for further investigation based on feature selection methods and machine learning.
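As an illustration of two ideas named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It shows the ratio of performance to interquartile distance (computed here as the interquartile range of the observed values divided by the RMSE) and a greedy sequential forward selection loop. The function names `rpiq`, `sequential_forward_selection` and `toy_score` are hypothetical, the quartile convention ("inclusive") is an assumption, and the toy scorer merely stands in for what would, in a wrapper setting like the one described, be a cross-validated Random Forest score on the chosen spectral bands.

```python
import math
import statistics
from typing import Callable, List, Sequence


def rpiq(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Interquartile range of the observed values divided by the RMSE."""
    # Assumption: "inclusive" quartiles; other conventions give slightly different values.
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    rmse = math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )
    return (q3 - q1) / rmse


def sequential_forward_selection(
    n_features: int,
    score_subset: Callable[[List[int]], float],
    max_features: int,
) -> List[int]:
    """Greedy SFS: at each step add the single feature that most improves the score."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        # Score every one-feature extension of the current subset.
        score, feature = max((score_subset(selected + [j]), j) for j in candidates)
        if score <= best_score:
            break  # no candidate improves the score, so stop early
        selected.append(feature)
        best_score = score
    return selected


def toy_score(subset: List[int]) -> float:
    """Hypothetical scorer: bands 3 and 7 are informative, every band has a small cost."""
    gain = {3: 1.0, 7: 0.5}
    return sum(gain.get(j, 0.0) for j in subset) - 0.1 * len(subset)


if __name__ == "__main__":
    print(rpiq([1, 2, 3, 4, 5], [1.5, 2.5, 3.5, 4.5, 5.5]))
    print(sequential_forward_selection(10, toy_score, max_features=5))
```

The floating variant (SFFS) differs by adding a conditional backward step after each inclusion, in which previously selected features are dropped again if doing so improves the score; that step is omitted here for brevity.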
Pages: 19
Related papers
50 in total
  • [41] Enhanced VNIR and MIR proximal sensing of soil organic matter and PLFA-derived soil microbial properties through machine learning ensembles and external parameter orthogonalization
    Hutengs, Christopher
    Eisenhauer, Nico
    Schaedler, Martin
    Cesarz, Simone
    Lochner, Alfred
    Seidel, Michael
    Vohland, Michael
    GEODERMA, 2024, 450
  • [42] Machine Learning Models Based on Random Forest Feature Selection and Bayesian Optimization for Predicting Daily Global Solar Radiation
    Chaibi, Mohamed
    Benghoulam, El Mahjoub
    Tarik, Lhoussaine
    Berrada, Mohamed
    El Hmaidi, Abdellah
    INTERNATIONAL JOURNAL OF RENEWABLE ENERGY DEVELOPMENT-IJRED, 2022, 11 (01): 309-323
  • [43] Prediction of the spatial distribution of soil organic matter based on two-point machine learning method
    Wang Y.
    Yang K.
    Gao B.
    Feng A.
    Tian J.
    Jiang C.
    Yang J.
    Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, 2022, 38 (12): 65-73
  • [44] Improving spatial prediction of soil organic matter in central Vietnam using Bayesian-enhanced machine learning and environmental covariates
    Ngu, Nguyen Huu
    Trung, Nguyen H.
    Shinjo, Hitoshi
    Chotpantarat, Srilert
    Thanh, Nguyen Ngoc
    ARCHIVES OF AGRONOMY AND SOIL SCIENCE, 2025, 71 (01): 1-17
  • [45] Digital soil mapping using machine learning-based methods to predict soil organic carbon in two different districts in the Czech Republic
    Nozari, Shahin
    Pahlavan-Rad, Mohammad Reza
    Brungard, Colby
    Heung, Brandon
    Boruvka, Lubos
    SOIL AND WATER RESEARCH, 2024, 19 (01): 32-49
  • [46] Development of a Soil Organic Matter Content Prediction Model Based on Supervised Learning Using Vis-NIR/SWIR Spectroscopy
    Kim, Min-Jee
    Lee, Hye-In
    Choi, Jae-Hyun
    Lim, Kyoung Jae
    Mo, Changyeun
    SENSORS, 2022, 22 (14)
  • [47] Integrating laser-induced breakdown spectroscopy and non-linear random forest-based algorithms to predict soil unconfined compressive strength
    Wudil, Yakubu Sani
    Al-Najjar, O. A.
    Al-Osta, Mohammed A.
    Al-Amoudi, Omar S. Baghabra
    Gondal, M. A.
    Kunwar, S.
    Almohammedi, Abdullah
    ENVIRONMENTAL EARTH SCIENCES, 2024, 83 (05)
  • [48] A Comprehensive Evaluation of Machine Learning Algorithms for Digital Soil Organic Carbon Mapping on a National Scale
    Radocaj, Dorijan
    Jug, Danijel
    Jug, Irena
    Jurisic, Mladen
    APPLIED SCIENCES-BASEL, 2024, 14 (21)
  • [49] Invertion of cultivated soil organic matter content combining multi-spectral remote sensing and random forest algorithm
    Liu H.
    Zhang M.
    Yang H.
    Zhang X.
    Meng X.
    Li H.
    Tang H.
    Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, 2020, 36 (10): 134-140
  • [50] Combining Multitemporal Sentinel-2A Spectral Imaging and Random Forest to Improve the Accuracy of Soil Organic Matter Estimates in the Plough Layer for Cultivated Land
    Wang, Li
    Zhou, Yong
    AGRICULTURE-BASEL, 2023, 13 (01)