Evaluating the extrapolation potential of random forest digital soil mapping

Cited by: 9
Authors
Hateffard, Fatemeh [1]
Steinbuch, Luc [2]
Heuvelink, Gerard B. M. [2, 3]
Affiliations
[1] Univ Debrecen, Dept Landscape Protect & Environm Geog, Egyet Ter 1, H-4032 Debrecen, Hungary
[2] Wageningen Univ & Res, Soil Geog & Landscape Grp, Wageningen, Netherlands
[3] ISRIC World Soil Informat, Wageningen, Netherlands
Keywords
Spatial soil information; Extrapolation effects; Prediction accuracy; Similarities; REGRESSION; UNCERTAINTY; INFORMATION; PREDICTION; SUPPORT
DOI
10.1016/j.geoderma.2023.116740
Chinese Library Classification (CLC) number
S15 [Soil science]
Discipline classification code
0903; 090301
Abstract
Spatial soil information is essential for informed decision-making in a wide range of fields. Digital soil mapping (DSM) using machine learning algorithms has become a popular approach for generating soil maps. DSM capitalises on the relation between environmental variables (i.e., features) and a soil property of interest. It typically needs a training dataset that covers the feature space well. Mapping in areas where there are no training data is challenging, because extrapolation in geographic space often induces extrapolation in feature space and can seriously deteriorate prediction accuracy. The objective of this study was to analyse the extrapolation effects of random forest DSM models by predicting topsoil properties (OC, clay, and pH) in four African countries using soil data from the ISRIC Africa Soil Profiles database. The study was conducted in eight experiments whereby soil data from one or three countries were used to predict in the other countries. We calculated similarities between donor and recipient areas using four measures, including soil type similarity, homosoil, dissimilarity index by area of applicability (AOA), and quantile regression forest (QRF) prediction interval width. The aim was to determine the level of agreement between these four measures and identify the method that had the strongest agreement with common validation metrics. The results indicated a positive correlation between soil type similarity, homosoil and dissimilarity index by AOA. Surprisingly, we observed a negative correlation between dissimilarity index by AOA and QRF prediction interval width. Although the cross-validation results for the trained models were acceptable, the extrapolation results were unsatisfactory, highlighting the risk of extrapolation. Using soil data from three countries instead of one increased the similarities for all measures, but it had a limited effect on improving extrapolation. Also, none of the measures had a strong correlation with the validation metrics. This was particularly disappointing for AOA and QRF, which we had expected to be strong indicators of extrapolation prediction performance. Results showed that homosoil and soil type methods had the strongest correlation with validation metrics. The results for this case study revealed limitations of using AOA and QRF as measures of extrapolation effects, highlighting the importance of not relying on these methods blindly. Further research and more case studies are needed to address the effects of extrapolation of DSM models.
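The donor/recipient setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not use their data. Everything here is an assumption made for illustration: the synthetic "countries", the helper names (`make_country`, `dissimilarity_index`, `tree_interval_width`), the unweighted dissimilarity index (nearest-neighbour distance to the training data in standardised feature space, divided by the mean pairwise training distance, a simplification of the AOA idea), and the use of the spread of per-tree predictions as a rough stand-in for a QRF prediction interval.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_country(n, shift):
    """Hypothetical synthetic 'country': 3 features centred on `shift`."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 3))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=n)
    return X, y


def dissimilarity_index(X_train, X_new):
    """Simplified, unweighted DI (assumption, not the full AOA method):
    distance to the nearest training point in standardised feature space,
    divided by the mean pairwise distance among training points."""
    scaler = StandardScaler().fit(X_train)
    Z_train = scaler.transform(X_train)
    Z_new = scaler.transform(X_new)
    nn = NearestNeighbors(n_neighbors=1).fit(Z_train)
    d_new, _ = nn.kneighbors(Z_new)
    return d_new[:, 0] / pdist(Z_train).mean()


def tree_interval_width(rf, X, lower=5, upper=95):
    """Width of the 5-95 % range of per-tree predictions.
    A crude proxy for a QRF interval, not a true quantile regression forest."""
    per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
    q_lo, q_hi = np.percentile(per_tree, [lower, upper], axis=0)
    return q_hi - q_lo


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Donor area (training), an independent test set from the same area,
# and a recipient area whose features are shifted away from the donor's.
X_donor, y_donor = make_country(300, shift=0.0)
X_test, y_test = make_country(300, shift=0.0)
X_recip, y_recip = make_country(300, shift=3.0)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_donor, y_donor)

rmse_test = rmse(y_test, rf.predict(X_test))      # interpolation
rmse_recip = rmse(y_recip, rf.predict(X_recip))   # extrapolation

di_test = dissimilarity_index(X_donor, X_test)
di_recip = dissimilarity_index(X_donor, X_recip)

width_test = tree_interval_width(rf, X_test)
width_recip = tree_interval_width(rf, X_recip)

print(f"RMSE   within donor: {rmse_test:.3f}   recipient: {rmse_recip:.3f}")
print(f"mean DI within donor: {di_test.mean():.3f}   recipient: {di_recip.mean():.3f}")
print(f"mean interval width within donor: {width_test.mean():.3f}   "
      f"recipient: {width_recip.mean():.3f}")
```

Comparing the three printed quantities for the donor-area test set and the recipient area shows how a dissimilarity measure and an interval width could be set against a validation metric. Whether they agree with that metric on real soil data is the question the study examines.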
Pages: 12
Related papers
  • [1] Multivariate random forest for digital soil mapping
    van der Westhuizen, Stephan
    Heuvelink, Gerard B. M.
    Hofmeyr, David P.
    GEODERMA, 2023, 431
  • [2] Digital mapping of soil texture classes using Random Forest classification algorithm
    Dharumarajan, Subramanian
    Hegde, Rajendra
    SOIL USE AND MANAGEMENT, 2022, 38 (01): 135-149
  • [3] Comparing the prediction performance, uncertainty quantification and extrapolation potential of regression kriging and random forest while accounting for soil measurement errors
    Takoutsing, Bertin
    Heuvelink, Gerard B. M.
    GEODERMA, 2022, 428
  • [4] Efficiency Comparison of Conventional and Digital Soil Mapping for Updating Soil Maps
    Kempen, Bas
    Brus, Dick J.
    Stoorvogel, Jetse J.
    Heuvelink, Gerard B. M.
    de Vries, Folkert
    SOIL SCIENCE SOCIETY OF AMERICA JOURNAL, 2012, 76 (06): 2097-2115
  • [5] Accounting for analytical and proximal soil sensing errors in digital soil mapping
    Takoutsing, Bertin
    Heuvelink, Gerard B. M.
    Stoorvogel, Jetse J.
    Shepherd, Keith D.
    Aynekulu, Ermias
    EUROPEAN JOURNAL OF SOIL SCIENCE, 2022, 73 (02)
  • [6] Probability mapping of soil thickness by random survival forest at a national scale
    Chen, Songchao
    Mulder, Vera Leatitia
    Martin, Manuel P.
    Walter, Christian
    Lacoste, Marine
    Richer-de-Forges, Anne C.
    Saby, Nicolas P. A.
    Loiseau, Thomas
    Hu, Bifeng
    Arrouays, Dominique
    GEODERMA, 2019, 344: 184-194
  • [7] The effect of covariates on Soil Organic Matter and pH variability: a digital soil mapping approach using random forest model
    Bouslihim, Yassine
    John, Kingsley
    Miftah, Abdelhalim
    Azmi, Rida
    Aboutayeb, Rachid
    Bouasria, Abdelkrim
    Razouk, Rachid
    Hssaini, Lahcen
    ANNALS OF GIS, 2024, 30 (02): 215-232
  • [8] Spatial Scaling for Digital Soil Mapping
    Malone, Brendan P.
    McBratney, Alex B.
    Minasny, Budiman
    SOIL SCIENCE SOCIETY OF AMERICA JOURNAL, 2013, 77 (03): 890-902
  • [9] Digital mapping of sand, clay, and soil carbon by Random Forest models under different spatial resolutions
    Bhering, Silvio Barge
    Chagas, Cesar da Silva
    de Carvalho Junior, Waldir
    Pereira, Nilson Rendeiro
    Calderano Filho, Braz
    Koenow Pinheiro, Helena Saraiva
    PESQUISA AGROPECUARIA BRASILEIRA, 2016, 51 (09): 1359-1370
  • [10] Assessing Soil Prediction Distributions for Forest Management Using Digital Soil Mapping
    Gavilan-Acuna, Gonzalo
    Coops, Nicholas C.
    Olmedo, Guillermo F.
    Tompalski, Piotr
    Roeser, Dominik
    Varhola, Andres
    SOIL SYSTEMS, 2024, 8 (02)