Improving the predictions of soil properties from VNIR-SWIR spectra in an unlabeled region using semi-supervised and active learning

Cited by: 12
Authors
Tsakiridis, Nikolaos L. [1]
Theocharis, John B. [1]
Symeonidis, Andreas L. [1]
Zalidis, George C. [2]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Elect & Comp Engn, Thessaloniki 54124, Greece
[2] Aristotle Univ Thessaloniki, Fac Agr, Thessaloniki 54124, Greece
Funding
EU Horizon 2020;
Keywords
Soil spectroscopy; Spiking; Active learning; Semi-supervised learning; vis-NIR; NEAR-INFRARED SPECTROSCOPY; ORGANIC-CARBON; LOCAL SCALE; NIR; CALIBRATIONS; SPIKING; MODEL; INFORMATION; CHEMISTRY; LIBRARIES;
DOI
10.1016/j.geoderma.2020.114830
Chinese Library Classification
S15 [Soil science];
Discipline classification code
0903 ; 090301 ;
Abstract
Monitoring the status of the soil ecosystem, in order to identify the spatio-temporal extent of the pressures exerted on it and to mitigate the effects of climate change and land degradation, requires reliable and cost-effective solutions. To address this need, soil spectroscopy in the visible, near- and shortwave-infrared (VNIR-SWIR) has emerged as a viable alternative to traditional analytical approaches, and large-scale soil spectral libraries coupled with advanced machine learning tools have been developed to infer soil properties from hyperspectral signatures. However, models developed in one region may perform worse when applied to a new region unseen by the model, owing to the large inherent variability of soils (e.g. pedogenetic differences, diverse soil types). Given an existing spectral library with labeled data and a new unlabeled region (i.e. one where no soil samples have been analytically measured), the question becomes how best to develop a model that predicts the soil properties of the unlabeled region more accurately. This paper proposes a machine learning technique that combines semi-supervised learning, which exploits the distribution of the predictors in the unlabeled dataset, with active learning, which selects a small set of samples from the unlabeled dataset as a spiking subset, in order to develop a more robust model. The semi-supervised component is Laplacian Support Vector Regression, which follows the manifold regularization framework. The active learning component uses the pool-based approach, as it best matches this use case: it iteratively selects a subset of samples from the unlabeled region to spike the calibration set. A novel machine learning-based query strategy is proposed to identify the spiking subset at each iteration.
The experimental analysis used data from the Land Use and Coverage Area Frame Survey of 2009, which covered most of the then member states of the European Union, focusing on the mineral cropland soil samples from 5 different countries. The statistical analysis confirmed the efficacy of the proposed approach compared with the current state of the art in soil spectroscopy.
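The workflow described in the abstract (a manifold-regularized model fitted on labeled plus unlabeled spectra, with pool-based selection of a spiking subset) can be illustrated with a minimal sketch. It is not the authors' method: it swaps in Laplacian Regularized Least Squares, the closed-form sibling of Laplacian SVR in the manifold regularization framework, and a simple farthest-first rule in place of the paper's novel query strategy. All function names, parameters and default values below are hypothetical. Because the stand-in query rule does not depend on the model, the model is fitted once after the last iteration.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d2, 0.0)

def knn_laplacian(X, k):
    """Graph Laplacian L = D - W of a symmetrized binary k-NN graph."""
    d2 = sq_dists(X, X)
    np.fill_diagonal(d2, np.inf)              # exclude self-neighbours
    nbrs = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((len(X), len(X)))
    W[np.repeat(np.arange(len(X)), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(1)) - W

def fit_laprls(X_lab, y_lab, X_unl, gamma=0.1, lam_a=1e-2, lam_i=1e-2, k=5):
    """Laplacian RLS (stand-in for LapSVR; assumption, not the paper's model).

    Solves (J K + lam_a*l*I + lam_i*l/n^2 * L K) alpha = Y, where J selects
    the l labeled rows among the n labeled + unlabeled samples.
    """
    X = np.vstack([X_lab, X_unl])
    l, n = len(X_lab), len(X)
    K = np.exp(-gamma * sq_dists(X, X))       # RBF kernel matrix
    L = knn_laplacian(X, k)
    J = np.zeros((n, n))
    J[:l, :l] = np.eye(l)
    Y = np.concatenate([y_lab, np.zeros(n - l)])
    A = J @ K + lam_a * l * np.eye(n) + (lam_i * l / n**2) * (L @ K)
    return X, np.linalg.solve(A, Y), gamma

def predict(model, X_new):
    X, alpha, gamma = model
    return np.exp(-gamma * sq_dists(X_new, X)) @ alpha

def spike_actively(X_lib, y_lib, X_pool, oracle, n_iter=3, batch=2, **kw):
    """Pool-based loop: pick `batch` pool samples per iteration, label them
    via `oracle` (e.g. laboratory analysis), add them to the calibration set."""
    X_lab, y_lab = X_lib.copy(), y_lib.copy()
    remaining = list(range(len(X_pool)))
    chosen = []
    for _ in range(n_iter):
        ref, picks = X_lab, []
        for _ in range(batch):
            # Farthest-first: pool sample farthest from everything selected so far.
            d = sq_dists(X_pool[remaining], ref).min(axis=1)
            j = remaining.pop(int(np.argmax(d)))
            picks.append(j)
            ref = np.vstack([ref, X_pool[j:j + 1]])
        X_lab = ref
        y_lab = np.concatenate([y_lab, oracle(picks)])
        chosen.extend(picks)
    # Unselected pool samples stay unlabeled and shape the manifold term.
    model = fit_laprls(X_lab, y_lab, X_pool[remaining], **kw)
    return model, chosen
```

In this sketch, `oracle` takes a list of pool indices and returns their measured values, so the number of costly reference analyses is `n_iter * batch`. The remaining pool spectra still contribute through the graph Laplacian even though they are never labeled.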
Pages: 17