Machine Learning and Feature Selection for soil spectroscopy. An evaluation of Random Forest wrappers to predict soil organic matter, clay, and carbonates

Cited by: 4
Authors
Canero, Francisco M. [1]
Rodriguez-Galiano, Victor [1]
Aragones, David [2]
Affiliations
[1] Univ Seville, Dept Phys Geog & Reg Geog Anal, Seville 41004, Spain
[2] CSIC, Remote Sensing & Geog Informat Syst Lab LAST EBD, Donana Biol Stn, Seville 41092, Spain
Keywords
Random forest; Sequential floating selection; Sequential floating forward selection; Partial least squares regression; Wrapper methods; Sierra de las Nieves; PARTIAL LEAST-SQUARES; DIFFUSE-REFLECTANCE SPECTROSCOPY; INFRARED SPECTROSCOPY; HYPERSPECTRAL IMAGES; PRINCIPAL COMPONENT; TOTAL NITROGEN; REGRESSION; AIRBORNE; CLASSIFICATION; PLSR
DOI
10.1016/j.heliyon.2024.e30228
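Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract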
Soil spectroscopy estimates soil properties from the absorption features in soil spectra. However, modelling soil properties with soil spectroscopy is challenging because of the high dimensionality of spectral data. Feature selection wrapper methods are promising approaches to reduce dimensionality but have rarely been used in soil spectroscopy. The aim of this study is to evaluate the performance of two feature selection wrapper methods built with the Random Forest (RF) algorithm, Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS), for dimensionality reduction of spectral data and predictive modelling of soil organic matter (SOM), clay and carbonates. The reflectance of 100 soil samples collected in Sierra de las Nieves (Spain) was measured under laboratory conditions with an ASD FieldSpec Pro JR spectroradiometer. Four datasets were used: the raw spectra, the spectra after each of two preprocessing methods, Continuum Removal (CR) and Multiplicative Scatter Correction (MSC), and a so-called "Global" dataset combining the raw, CR and MSC features. The performance of RF models built with feature selection was compared with that of Partial Least Squares Regression (PLSR) and RF alone. RF models built with SFS and SFFS outperformed both PLSR and RF alone: the best RF models with feature selection achieved a ratio of performance to interquartile distance (RPIQ) of 1.93, 0.38 and 2.56 for SOM, carbonates and clay, respectively, compared with 1.41, 0.29 and 1.81 for PLSR and 1.29, 0.29 and 1.81 for RF alone. Feature selection reduced the number of features to less than 1% of the original features. Selected features were spread across the whole spectrum for SOM and clay, and concentrated around 900 nm, 1900 nm and 2350 nm for carbonates. Nevertheless, feature selection highlighted features around 1100 nm in SOM modelling, as well as features around 2200 nm, which is considered a main absorption feature of clay. Feature selection with Random Forest played a key role in improving modelling accuracy, removing redundant features and avoiding the curse of dimensionality (Hughes effect). This research therefore offers an alternative to the dimensionality reduction approaches applied to date to model soil properties with spectroscopy, and paves the way for further research on feature selection methods and machine learning.
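The abstract names two spectral preprocessing methods and a combined "Global" dataset. The following is a minimal pure-Python sketch of how these transforms are commonly defined, not the authors' code: MSC fits each spectrum against the mean spectrum and removes the fitted offset and slope, and continuum removal divides each spectrum by its upper convex hull. The function names (`msc`, `continuum_removed`, `global_features`) are hypothetical, and spectra are assumed to be lists of reflectance values sampled at the same strictly increasing wavelengths.

```python
from statistics import fmean


def msc(spectra):
    """Multiplicative Scatter Correction against the mean spectrum.

    Each spectrum x is fitted as x ~ a + b * ref by least squares and
    corrected to (x - a) / b.
    """
    n_bands = len(spectra[0])
    ref = [fmean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = fmean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        b = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        a = s_mean - b * ref_mean
        corrected.append([(v - a) / b for v in s])
    return corrected


def continuum_removed(wavelengths, reflectance):
    """Divide a spectrum (at least 2 bands) by its upper convex hull.

    Wavelengths must be strictly increasing. Values equal 1.0 on the hull
    and drop below 1.0 inside absorption features.
    """
    points = list(zip(wavelengths, reflectance))
    hull = []  # upper hull built left to right (monotone chain)
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Non-negative cross product: hull[-1] lies on or below the chord
            # from hull[-2] to p, so it is not part of the upper hull.
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)

    removed, k = [], 0
    for x, y in points:
        while hull[k + 1][0] < x:  # advance to the hull segment containing x
            k += 1
        (x1, y1), (x2, y2) = hull[k], hull[k + 1]
        continuum = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
        removed.append(y / continuum)
    return removed


def global_features(wavelengths, spectra):
    """Concatenate raw, CR and MSC features per sample (a "Global" dataset)."""
    cr = [continuum_removed(wavelengths, s) for s in spectra]
    return [list(raw) + c + m for raw, c, m in zip(spectra, cr, msc(spectra))]
```

In a modelling workflow the MSC reference spectrum would normally be computed on the training samples only and reused for new samples; the sketch computes it from whatever spectra it is given.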
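SFS and SFFS are wrapper methods: they search over subsets of features and score each candidate subset by training and validating a model on it, a Random Forest in this study. Below is a generic, minimal sketch of both searches, not the authors' implementation. The scoring function `score` is a hypothetical placeholder for a cross-validated RF metric, and stopping at a fixed target size `k` is an assumption.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Tuple[int, ...]], float],
    k: int,
    floating: bool = False,
) -> Tuple[int, ...]:
    """Greedy wrapper selection of k feature indices that maximise `score`.

    floating=False gives SFS; floating=True adds the conditional exclusion
    step of Sequential Floating Forward Selection (SFFS).
    """
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and n_features")

    cache: Dict[FrozenSet[int], float] = {}

    def J(subset: Sequence[int]) -> float:
        # Each evaluation may mean training a model, so cache by feature set.
        key = frozenset(subset)
        if key not in cache:
            cache[key] = score(tuple(sorted(key)))
        return cache[key]

    selected: Tuple[int, ...] = ()
    # Best subset (and its score) seen so far for each subset size.
    best: Dict[int, Tuple[Tuple[int, ...], float]] = {}

    def record(subset: Tuple[int, ...]) -> None:
        s = J(subset)
        if len(subset) not in best or s > best[len(subset)][1]:
            best[len(subset)] = (subset, s)

    while len(selected) < k:
        # Inclusion: add the feature whose addition scores best.
        remaining = [f for f in range(n_features) if f not in selected]
        f_add = max(remaining, key=lambda f: J(selected + (f,)))
        selected = selected + (f_add,)
        record(selected)

        # Conditional exclusion (SFFS only): drop the least useful feature
        # while the reduced subset beats the best subset of that size so far.
        while floating and len(selected) > 2:
            f_drop = max(selected, key=lambda f: J(tuple(x for x in selected if x != f)))
            reduced = tuple(x for x in selected if x != f_drop)
            if J(reduced) > best[len(reduced)][1]:
                selected = reduced
                record(selected)
            else:
                break

    return best[k][0]
```

The floating step lets SFFS discard a feature that looked best early on but became redundant once other features were added, which plain SFS cannot do. With scikit-learn installed, a Random Forest score could be plugged in as `lambda s: cross_val_score(RandomForestRegressor(n_estimators=500, random_state=0), X[:, list(s)], y, cv=5).mean()`, where `X` is a NumPy array of spectral features and `y` the soil property; the forest settings and 5-fold cross-validation are illustrative assumptions. Because every inclusion step evaluates every remaining band, wrappers on full spectra are computationally expensive, which is why the sketch caches scores by feature set.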
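Model accuracy in the abstract is reported as the ratio of performance to interquartile distance (RPIQ): the interquartile range of the observed values divided by the root mean square error of prediction, RPIQ = (Q3 - Q1) / RMSE, so higher is better. A minimal sketch follows; the quartile convention (`method="inclusive"`, linear interpolation between order statistics) is an assumption, and other conventions give slightly different values on small samples.

```python
from math import sqrt
from statistics import quantiles


def rpiq(observed, predicted):
    """Ratio of performance to interquartile distance: (Q3 - Q1) / RMSE."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Quartiles of the observed (reference) values.
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
    return (q3 - q1) / rmse


# Observed values 1..5 have Q1 = 2 and Q3 = 4; every prediction is off by 0.5.
print(rpiq([1, 2, 3, 4, 5], [1.5, 2.5, 2.5, 4.5, 5.5]))  # 4.0
```

An RPIQ below 1, as reported for carbonates, means the prediction error is larger than the interquartile spread of the observed values.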
Pages: 19