Machine Learning and Feature Selection for soil spectroscopy. An evaluation of Random Forest wrappers to predict soil organic matter, clay, and carbonates

Cited by: 4
Authors
Canero, Francisco M. [1]
Rodriguez-Galiano, Victor [1]
Aragones, David [2]
Affiliations
[1] Univ Seville, Dept Phys Geog & Reg Geog Anal, Seville 41004, Spain
[2] CSIC, Remote Sensing & Geog Informat Syst Lab LAST EBD, Donana Biol Stn, Seville 41092, Spain
Keywords
Random forest; Sequential floating selection; Sequential floating forward selection; Partial least squares regression; Wrapper methods; Sierra de las Nieves; PARTIAL LEAST-SQUARES; DIFFUSE-REFLECTANCE SPECTROSCOPY; INFRARED SPECTROSCOPY; HYPERSPECTRAL IMAGES; PRINCIPAL COMPONENT; TOTAL NITROGEN; REGRESSION; AIRBORNE; CLASSIFICATION; PLSR
DOI
10.1016/j.heliyon.2024.e30228
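Chinese Library Classification (CLC) numbers
O [Mathematical and physical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences (general)]
Subject classification codes
07; 0710; 09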
Abstract
Soil spectroscopy estimates soil properties from the absorption features in soil spectra. However, modelling soil properties with soil spectroscopy is challenging because of the high dimensionality of spectral data. Feature selection wrapper methods are promising approaches for reducing dimensionality but are rarely used in soil spectroscopy. The aim of this study is to evaluate the performance of two feature selection wrapper methods built with the Random Forest (RF) algorithm, Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS), for reducing the dimensionality of spectral data and for predictive modelling of soil organic matter (SOM), clay, and carbonates. The reflectance of 100 soil samples collected in the Sierra de las Nieves (Spain) was measured under laboratory conditions with an ASD FieldSpec Pro JR spectroradiometer. Applying two spectral preprocessing methods to the raw spectra yielded four datasets: the raw spectra, Continuum Removal (CR), Multiplicative Scatter Correction (MSC), and a so-called "Global" dataset combining the raw, CR, and MSC features. The performance of RF models built with feature selection was compared with that of Partial Least Squares Regression (PLSR) and RF alone. RF models built with SFS and SFFS outperformed both PLSR and RF alone: the best RF models with feature selection had a ratio of performance to interquartile distance (RPIQ) of 1.93, 0.38, and 2.56 for SOM, carbonates, and clay, respectively, whereas PLSR models reached 1.41, 0.29, and 1.81, and RF alone reached 1.29, 0.29, and 1.81. Feature selection wrappers reduced the number of features to less than 1% of the original set. For SOM and clay, the selected features were spread across the whole spectrum; for carbonates, they clustered around 900 nm, 1900 nm, and 2350 nm.
However, feature selection also highlighted features around 1100 nm in SOM modelling, as well as features around 2200 nm, a region considered a main absorption feature of clay. Combining feature selection with Random Forest was key to improving modelling accuracy, removing redundant features, and avoiding the curse of dimensionality (Hughes effect). This research thus offers an alternative to the dimensionality reduction approaches applied to date in modelling soil properties with spectroscopy, and paves the way for further research on feature selection methods and machine learning.
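Continuum removal, one of the preprocessing steps named above, normalizes each spectrum by its continuum so that absorption features are expressed relative to a common baseline. The sketch below assumes the common definition of the continuum as the upper convex hull of the reflectance curve, with linear interpolation between hull vertices; the function name `continuum_removal` is hypothetical and is not taken from the study's code.

```python
import numpy as np

def continuum_removal(wavelengths, reflectance):
    """Return reflectance divided by its upper convex hull (the continuum)."""
    w = np.asarray(wavelengths, dtype=float)  # assumed strictly increasing
    r = np.asarray(reflectance, dtype=float)  # assumed strictly positive
    hull = []  # indices of upper-hull vertices, built left to right
    for i in range(len(w)):
        # Pop the last vertex while it lies on or below the chord from hull[-2] to point i.
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = (w[i1] - w[i0]) * (r[i] - r[i0]) - (r[i1] - r[i0]) * (w[i] - w[i0])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    # Continuum = linear interpolation between the hull vertices.
    continuum = np.interp(w, w[hull], r[hull])
    return r / continuum
```

For a samples-by-bands matrix, the function can be applied row by row, e.g. `np.vstack([continuum_removal(wl, s) for s in spectra])`.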
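Multiplicative scatter correction regresses each spectrum on a reference spectrum (assumed here to be the mean spectrum of the dataset, the usual choice) and removes the fitted offset and slope. The "Global" dataset described in the abstract is then the column-wise concatenation of the raw, CR, and MSC features. The helper names `msc` and `build_global`, and the `raw_`/`CR_`/`MSC_` feature-name prefixes, are illustrative assumptions; keeping wavelength-tagged names makes it possible to map selected features back to regions such as 900 nm or 2200 nm.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction of a (samples x bands) matrix."""
    X = np.asarray(spectra, dtype=float)
    # Reference spectrum: the mean spectrum unless one is supplied (e.g. from training data).
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for k, x in enumerate(X):
        # Least-squares fit x ~= a + b * ref; polyfit returns [slope, intercept].
        b, a = np.polyfit(ref, x, deg=1)
        corrected[k] = (x - a) / b
    return corrected, ref

def build_global(wavelengths, raw, cr, msc_corrected):
    """Concatenate raw, CR and MSC features column-wise, with wavelength-tagged names."""
    X = np.hstack([raw, cr, msc_corrected])
    names = [f"{p}_{int(w)}" for p in ("raw", "CR", "MSC") for w in wavelengths]
    return X, names
```

Returning the reference spectrum lets the same correction be applied to held-out samples, so that validation spectra are not used to build the reference.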
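The wrapper methods evaluated in the study can be sketched as a greedy search that scores each candidate feature subset with a model. SFS only adds features; SFFS additionally tries, after each addition, to drop a feature whenever that yields a better subset of the smaller size than any recorded so far. The sketch below is a generic implementation of that textbook scheme, not the authors' code: `score` is any callable where higher is better, and `make_rf_scorer` is a hypothetical helper that returns the negated cross-validated RMSE of a scikit-learn Random Forest, with `n_estimators=200` and 5-fold CV chosen arbitrarily.

```python
from typing import Callable, Dict, List, Tuple

Score = Callable[[List[int]], float]

def sequential_floating_forward_selection(
    n_features: int, score: Score, k_max: int, floating: bool = True
) -> Tuple[List[int], float]:
    """Greedy SFS (floating=False) or SFFS (floating=True); higher score is better."""
    if n_features < 1 or k_max < 1:
        raise ValueError("need at least one feature and k_max >= 1")
    selected: List[int] = []
    best_by_size: Dict[int, Tuple[float, List[int]]] = {}
    while len(selected) < min(k_max, n_features):
        # Inclusion step: add the single feature that maximizes the score.
        candidates = [f for f in range(n_features) if f not in selected]
        best_f = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [best_f]
        s = score(selected)
        k = len(selected)
        if k not in best_by_size or s > best_by_size[k][0]:
            best_by_size[k] = (s, list(selected))
        # Conditional exclusion (SFFS only): drop the least useful feature while
        # the reduced subset beats the best one previously recorded for its size.
        while floating and len(selected) > 2:
            worst_f = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst_f]
            s_red = score(reduced)
            if s_red > best_by_size[len(reduced)][0]:
                selected = reduced
                best_by_size[len(reduced)] = (s_red, list(reduced))
            else:
                break
    best_score, best_subset = max(best_by_size.values(), key=lambda t: t[0])
    return best_subset, best_score

def make_rf_scorer(X, y, cv: int = 5, random_state: int = 0) -> Score:
    """Hypothetical RF wrapper score: mean negated CV RMSE (higher is better)."""
    # Imported lazily so the selector above runs without scikit-learn installed.
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    def score(subset: List[int]) -> float:
        model = RandomForestRegressor(n_estimators=200, random_state=random_state)
        cv_scores = cross_val_score(
            model, X[:, subset], y, cv=cv, scoring="neg_root_mean_squared_error"
        )
        return float(cv_scores.mean())

    return score
```

Each inclusion step calls `score` once per remaining feature, so with a full spectrum and an RF scorer the search is expensive; capping `k_max` is one simple way to bound it, which fits results where fewer than 1% of the features are kept.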
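The ratio of performance to interquartile distance (RPIQ) used to compare the models is commonly defined as the interquartile range of the observed values divided by the root mean squared error of prediction, RPIQ = (Q3 - Q1) / RMSE. The sketch below assumes that definition and NumPy's default linear-interpolation percentiles; other quartile conventions give slightly different values. As a reading aid, an RPIQ of 1.93 means the prediction error is roughly half the interquartile range of the measured property.

```python
import numpy as np

def rpiq(observed, predicted):
    """Ratio of performance to interquartile distance: (Q3 - Q1) / RMSE."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    q1, q3 = np.percentile(obs, [25, 75])  # quartiles of the observed values
    rmse = np.sqrt(np.mean((obs - pred) ** 2))
    return (q3 - q1) / rmse
```

Because the numerator depends only on the observed values, RPIQ values are comparable across models for the same property, but not directly across properties with different distributions.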
Pages: 19