Geostatistical semi-supervised learning for spatial prediction

Cited by: 4
Authors
Fouedjio, Francky [1]
Talebi, Hassan [2]
Affiliations
[1] Rio Tinto, Data & Analyt, 152-158 St Georges Terrace, Perth, WA 6000, Australia
[2] Rio Tinto, Dev & Technol, 152-158 St Georges Terrace, Perth, WA 6000, Australia
Source
ARTIFICIAL INTELLIGENCE IN GEOSCIENCES | 2022 / Vol. 3
Keywords
Labeled spatial data; Unlabeled spatial data; Spatial autocorrelation; Pseudo labeling; Spatial prediction; REMOTE-SENSING DATA; RANDOM FOREST; CLASSIFICATION; INTERPOLATION; ALGORITHMS; REGION;
DOI
10.1016/j.aiig.2022.12.002
Chinese Library Classification (CLC) number
P [Astronomy, Earth Sciences];
Discipline classification code
07;
Abstract
Geoscientists are increasingly tasked with spatially predicting a target variable in the presence of auxiliary information using supervised machine learning algorithms. Typically, the target variable is observed at a few sampling locations due to the relatively time-consuming and costly process of obtaining measurements. In contrast, auxiliary variables are often exhaustively observed within the region under study through the increasing development of remote sensing platforms and sensor networks. Supervised machine learning methods do not fully leverage this large amount of auxiliary spatial data. Indeed, in these methods, the training dataset includes only labeled data locations (where both target and auxiliary variables were measured). At the same time, unlabeled data locations (where auxiliary variables were measured but not the target variable) are not considered during the model training phase. Consequently, only a limited amount of auxiliary spatial data is utilized during the model training stage. As an alternative to supervised learning, semi-supervised learning, which learns from labeled as well as unlabeled data, can be used to address this problem. However, conventional semi-supervised learning techniques do not account for the specificities of spatial data. This paper introduces a spatial semi-supervised learning framework where geostatistics and machine learning are combined to harness a large amount of unlabeled spatial data in combination with typically a smaller set of labeled spatial data. The main idea consists of leveraging the target variable's spatial autocorrelation to generate pseudo labels at unlabeled data points that are geographically close to labeled data points. This is achieved through geostatistical conditional simulation, where an ensemble of pseudo labels is generated to account for the uncertainty in the pseudo labeling process. 
The observed labels are augmented by this ensemble of pseudo labels to create an ensemble of pseudo training datasets. A supervised machine learning model is then trained on each pseudo training dataset, and the trained models are aggregated. The proposed geostatistical semi-supervised learning method is applied to synthetic and real-world spatial datasets, and its predictive performance is compared with that of classical supervised and semi-supervised machine learning methods. The results indicate that the method can effectively leverage a large amount of unlabeled spatial data to improve the spatial prediction of the target variable.
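The workflow described in the abstract (conditional simulation of pseudo labels near labeled points, one model per pseudo training dataset, then aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Gaussian random field with an exponential covariance whose sill and range are fixed rather than fitted, a constant mean estimated by the labeled sample mean, a linear least-squares regressor as a stand-in for the supervised machine learning model, and aggregation by simple averaging of predictions. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def exp_cov(a, b, sill=1.0, corr_range=0.3):
    """Exponential covariance between two coordinate sets (assumed model, not fitted)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / corr_range)

def simulate_pseudo_labels(xy_lab, y_lab, xy_unl, n_sim, rng, nugget=1e-8):
    """Draw n_sim realizations of the target at xy_unl, conditioned on the labels.

    Under the Gaussian assumption this samples from the simple-kriging
    conditional distribution; the constant mean is the labeled sample mean.
    """
    mean = y_lab.mean()
    k_ll = exp_cov(xy_lab, xy_lab) + nugget * np.eye(len(xy_lab))
    k_ul = exp_cov(xy_unl, xy_lab)
    k_uu = exp_cov(xy_unl, xy_unl)
    weights = np.linalg.solve(k_ll, k_ul.T)            # kriging weights, (n_lab, n_unl)
    cond_mean = mean + weights.T @ (y_lab - mean)      # kriging mean, (n_unl,)
    cond_cov = k_uu - k_ul @ weights                   # conditional covariance
    # Symmetric square root via eigendecomposition; tiny negative eigenvalues are clipped.
    vals, vecs = np.linalg.eigh((cond_cov + cond_cov.T) / 2)
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))
    z = rng.standard_normal((len(xy_unl), n_sim))
    return (cond_mean[:, None] + root @ z).T           # (n_sim, n_unl)

def fit_linear(X, y):
    """Least-squares fit with intercept (stand-in for any supervised learner)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def geostat_ssl_predict(xy_lab, X_lab, y_lab, xy_unl, X_unl, X_new,
                        n_sim=20, radius=0.1, seed=0):
    """Train one model per pseudo training dataset and average the predictions."""
    rng = np.random.default_rng(seed)
    # Keep only unlabeled points that are geographically close to a labeled point.
    d = np.linalg.norm(xy_unl[:, None, :] - xy_lab[None, :, :], axis=-1)
    near = d.min(axis=1) <= radius
    if not near.any():
        # No nearby unlabeled points: fall back to the purely supervised fit.
        return predict_linear(fit_linear(X_lab, y_lab), X_new)
    pseudo = simulate_pseudo_labels(xy_lab, y_lab, xy_unl[near], n_sim, rng)
    X_aug = np.vstack([X_lab, X_unl[near]])
    preds = []
    for k in range(n_sim):
        y_aug = np.concatenate([y_lab, pseudo[k]])     # observed + simulated labels
        preds.append(predict_linear(fit_linear(X_aug, y_aug), X_new))
    return np.mean(preds, axis=0)

# Synthetic illustration: 30 labeled and 200 unlabeled locations in the unit square.
rng = np.random.default_rng(0)
n_lab, n_unl = 30, 200
xy = rng.random((n_lab + n_unl, 2))
X = np.column_stack([xy[:, 0] + 0.05 * rng.standard_normal(len(xy)), xy[:, 1]])
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(len(xy))
pred = geostat_ssl_predict(xy[:n_lab], X[:n_lab], y[:n_lab],
                           xy[n_lab:], X[n_lab:], X[n_lab:])
```

In practice the covariance model would be inferred from the labeled data (for example by variogram fitting), and the linear regressor could be swapped for any supervised learner. The `radius` parameter controls how far from labeled points pseudo labels are generated, reflecting the idea that spatial autocorrelation is only informative at short distances.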
Pages: 162-178
Number of pages: 17