Evaluating validation strategies on the performance of soil property prediction from regional to continental spectral data

Times cited: 51
Authors
Chen, Songchao [1, 2]
Xu, Hanyi [1]
Xu, Dongyun [1]
Ji, Wenjun [3]
Li, Shuo [4]
Yang, Meihua [5]
Hu, Bifeng [6]
Zhou, Yin [1, 7]
Wang, Nan [1]
Arrouays, Dominique [2]
Shi, Zhou [1, 8]
Affiliations
[1] Zhejiang Univ, Coll Environm & Resource Sci, Inst Appl Remote Sensing & Informat Technol, Hangzhou 310058, Peoples R China
[2] INRAE, Unite InfoSol, F-45075 Orleans, France
[3] China Agr Univ, Coll Land Sci & Technol, Beijing 100085, Peoples R China
[4] Cent China Normal Univ, Key Lab Geog Proc Anal & Simulat, Wuhan 430079, Peoples R China
[5] Yuzhang Normal Univ, Dept Environm Engn, Nanchang 330103, Jiangxi, Peoples R China
[6] Jiangxi Univ Finance & Econ, Sch Tourism & Urban Management, Dept Land Resource Management, Nanchang 330013, Jiangxi, Peoples R China
[7] Zhejiang Univ, Sch Publ Affairs, Inst Land Sci & Property Management, Hangzhou 310058, Peoples R China
[8] Minist Agr & Rural Affairs, Key Lab Spect Sensing, Hangzhou 310058, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Proximal soil sensing; Vis-NIR spectra; Model robustness; Soil organic carbon; Clay; Calibration sampling; NEAR-INFRARED SPECTROSCOPY; ORGANIC-CARBON CONTENT; NIR SPECTROSCOPY; MIDINFRARED SPECTROSCOPY; REFLECTANCE SPECTRA; SAMPLE SELECTION; CALIBRATION SET; NEURAL-NETWORK; MODEL; MOISTURE
DOI
10.1016/j.geoderma.2021.115159
Chinese Library Classification (CLC) number
S15 [Soil science]
Subject classification codes
0903; 090301
Abstract
Visible-near infrared (vis-NIR) spectroscopy has been widely used to characterize soil information at scales ranging from the field to the globe. Before a calibrated spectral predictive model is applied to acquire soil information, its performance is evaluated by either independent validation or k-fold cross validation. However, there is no consensus on which validation strategy is more suitable and robust for evaluating model performance in studies at different scales. The objective of this study is to evaluate and compare model performance under the two validation strategies, combined with different calibration sizes (calibration-to-validation ratios of 2:1, 4:1 and 9:1) and calibration sampling strategies (random sampling (RS), rank, Kennard-Stone (KS), rank-Kennard-Stone (RKS) and conditioned Latin hypercube sampling (cLHS)), across scales. A total of 17,272 vis-NIR spectra of mineral soils from the LUCAS dataset (continental scale), together with their soil organic carbon (SOC) and clay contents, were used in this study; the dataset was further split into a national dataset (2761 samples in France) and five regional datasets (110 to 248 samples from five French administrative regions). To eliminate the effect of a changing validation set on model performance, a consistent test set (20% of the total samples at each scale) was held out to evaluate all combinations involved in the two validation strategies. At the national and continental scales, where the calibration size was large (>1400), Lin's concordance correlation coefficient (CCC) of the Cubist model was stable for both SOC and clay across calibration sizes, calibration sampling strategies and validation strategies. At the regional scale, a larger calibration size can potentially improve model performance for a small dataset (<300), and a wider calibration range would lead to better model performance. No calibration sampling strategy was consistently best ("no silver bullet") at the regional scale.
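The calibration sampling step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: Kennard-Stone is written in its common max-min form on Euclidean distances between (already preprocessed) spectra; "rank" sampling is assumed to mean sorting samples by the measured soil property and taking evenly spaced ranks; and the fixed 20% test set is drawn at random here, since the abstract does not say how it was split. The helper names (`split_pool`, `kennard_stone`, `rank_sampling`) are hypothetical, the data are synthetic, and RKS and cLHS are not shown.

```python
import numpy as np


def split_pool(n_samples: int, test_fraction: float = 0.2, seed: int = 0):
    """Hold out a fixed test set (random here, an assumption); return (pool, test)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return idx[n_test:], idx[:n_test]


def kennard_stone(X: np.ndarray, n_select: int) -> list[int]:
    """Max-min Kennard-Stone selection on the rows of X (e.g. preprocessed spectra)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances via |a|^2 + |b|^2 - 2ab (an n x n matrix).
    sq = (X ** 2).sum(axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    # Start from the two samples that are farthest apart.
    rows, cols = np.triu_indices(n, k=1)
    m = int(np.argmax(dist[rows, cols]))
    selected = [int(rows[m]), int(cols[m])]
    # Distance from every sample to its nearest already-selected sample.
    nearest = np.minimum(dist[selected[0]], dist[selected[1]])
    nearest[selected] = -np.inf
    while len(selected) < n_select:
        k = int(np.argmax(nearest))  # the sample least covered so far
        selected.append(k)
        nearest = np.minimum(nearest, dist[k])
        nearest[k] = -np.inf
    return selected


def rank_sampling(y: np.ndarray, n_select: int) -> list[int]:
    """Assumed 'rank' selection: sort by the soil property, take evenly spaced ranks."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    order = np.argsort(y, kind="stable")
    # Integer arithmetic gives strictly increasing rank positions from 0 to n - 1.
    positions = np.arange(n_select) * (n - 1) // (n_select - 1)
    return [int(k) for k in order[positions]]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    spectra = rng.normal(size=(150, 20))  # stand-in for preprocessed vis-NIR spectra
    soc = rng.gamma(2.0, 10.0, size=150)  # stand-in for measured SOC
    pool, test = split_pool(150)          # 120 pool samples, 30 fixed test samples
    n_cal = len(pool) * 4 // 5            # 4:1 calibration-to-validation ratio
    ks_cal = pool[kennard_stone(spectra[pool], n_cal)]
    rank_cal = pool[rank_sampling(soc[pool], n_cal)]
    print(len(ks_cal), len(rank_cal), len(test))  # prints: 96 96 30
```

KS selects on the spectra alone and so spreads the calibration set over spectral space, while the assumed rank scheme selects on the measured property and guarantees that its minimum and maximum enter calibration, which is one way to obtain the wider calibration range mentioned above.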
For the five French regions (small datasets), we found high variation (the 95th percentile minus the 5th percentile) in CCC among the models built from 50 repeated RS (0.10-0.44 for SOC, 0.16-0.52 for clay) and cLHS (0.08-0.40 for SOC, 0.12-0.36 for clay) calibration sets. This indicates that evaluating a model on a calibration set selected by a single run of RS or cLHS carries high uncertainty for a small dataset, so this practice should be used with caution. We therefore suggest the following: (1) for a large dataset (thousands of samples), either one-time random sampling for independent validation or k-fold cross validation is appropriate; (2) for a small dataset (dozens to hundreds of samples), k-fold cross validation and/or repeated random sampling for independent validation is more robust for evaluating spectral predictive models.
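The variation measure above (the 95th minus the 5th percentile of CCC over 50 repeated random calibration draws, all scored on one fixed test set) can be illustrated with a short sketch. It is a hedged illustration, not the study's pipeline: Lin's CCC is computed with population (1/n) moments, ordinary least squares stands in for the Cubist model, synthetic data replace the LUCAS spectra, and the k-fold estimate is computed from pooled out-of-fold predictions, which is one common convention assumed here. All function names are hypothetical.

```python
import numpy as np


def lins_ccc(y_obs, y_pred) -> float:
    """Lin's concordance correlation coefficient, using population (1/n) moments."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    m_o, m_p = y_obs.mean(), y_pred.mean()
    cov = ((y_obs - m_o) * (y_pred - m_p)).mean()
    return float(2.0 * cov / (y_obs.var() + y_pred.var() + (m_o - m_p) ** 2))


def fit_predict_ols(X_cal, y_cal, X_new):
    """Least squares with an intercept: a simple stand-in for the Cubist model."""
    A = np.column_stack([np.ones(len(X_cal)), X_cal])
    coef, *_ = np.linalg.lstsq(A, y_cal, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def repeated_rs_spread(X, y, test_idx, n_cal, n_repeats=50, seed=0):
    """CCC on a fixed test set for repeated random calibration draws, plus the
    95th-minus-5th percentile range of those CCC values."""
    rng = np.random.default_rng(seed)
    pool = np.setdiff1d(np.arange(len(y)), test_idx)
    scores = np.empty(n_repeats)
    for r in range(n_repeats):
        cal = rng.choice(pool, size=n_cal, replace=False)
        scores[r] = lins_ccc(y[test_idx], fit_predict_ols(X[cal], y[cal], X[test_idx]))
    p5, p95 = np.percentile(scores, [5, 95])
    return scores, float(p95 - p5)


def kfold_ccc(X, y, pool, k=5, seed=0):
    """k-fold cross validation within the pool; CCC of pooled out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    pred = np.empty(len(y))
    for fold in np.array_split(rng.permutation(pool), k):
        cal = np.setdiff1d(pool, fold)
        pred[fold] = fit_predict_ols(X[cal], y[cal], X[fold])
    return lins_ccc(y[pool], pred[pool])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))  # synthetic predictors, not real spectra
    y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.8]) + rng.normal(size=200)
    test_idx = rng.choice(200, size=40, replace=False)  # fixed 20% test set
    pool = np.setdiff1d(np.arange(200), test_idx)
    n_cal = len(pool) * 2 // 3  # 2:1 calibration-to-validation ratio
    scores, spread = repeated_rs_spread(X, y, test_idx, n_cal)
    print(f"RS: median CCC {np.median(scores):.2f}, 95th-5th spread {spread:.2f}")
    print(f"5-fold CV CCC on the pool: {kfold_ccc(X, y, pool):.2f}")
```

A single random draw corresponds to one entry of `scores`; the spread shows how far that one number could plausibly land from another draw, which is the uncertainty the recommendation targets. Replacing `rng.choice` with a cLHS routine, or the least-squares stand-in with a Cubist implementation, would follow the same loop; neither is shown here.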
Pages: 10