Traditional kriging versus modern Gaussian processes for large-scale mining data

Cited by: 7
Authors
Christianson, Ryan B. [1]
Pollyea, Ryan M. [2]
Gramacy, Robert B. [3]
Affiliations
[1] Univ Chicago, Dept Stat & Data Sci, NORC, 55 E Monroe St,30th Floor, Chicago, IL 60603 USA
[2] Virginia Tech, Dept Geosci, Blacksburg, VA USA
[3] Virginia Tech, Dept Stat, Blacksburg, VA USA
Source
STATISTICAL ANALYSIS AND DATA MINING-AN ASA DATA SCIENCE JOURNAL | 2023 / Vol. 16 / Issue 05
Funding
National Science Foundation (USA);
Keywords
Gaussian process regression; multiple imputation; ordinary kriging; surrogate modeling; variogram; Vecchia approximation; PREDICTION;
DOI
10.1002/sam.11635
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The canonical technique for nonlinear modeling of spatial/point-referenced data is known as kriging in geostatistics, and as Gaussian process (GP) regression in surrogate modeling and statistical learning. This article reviews the many similarities shared between kriging and GPs, but also highlights some important differences. One is that GPs impose a process that can be used to automate kernel/variogram inference, thus removing the human from the loop. The GP framework also suggests a probabilistically valid means of scaling to handle a large corpus of training data, that is, an alternative to ordinary kriging. Finally, recent GP implementations are tailored to make the most of modern computing architectures, such as multi-core workstations and multi-node supercomputers. We argue that such distinctions are important even in classically geostatistical settings. To back that up, we present out-of-sample validation exercises using two real, large-scale borehole data sets acquired in the mining of gold and other minerals. We compare classic kriging with several variations of modern GPs and conclude that the latter are more economical (requiring fewer human and compute resources), more accurate, and offer better uncertainty quantification. We go on to show how the fully generative modeling apparatus provided by GPs can gracefully accommodate left-censoring of small measurements, as commonly occurs in mining data and other borehole assays.
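As a rough illustration of what "automating kernel/variogram inference" can mean, the sketch below fits a zero-mean GP to toy one-dimensional data and picks the kernel lengthscale by maximizing the Gaussian likelihood over a small grid, instead of fitting a variogram by eye. It is not the authors' method or code. The squared-exponential kernel, the unit process variance, the fixed nugget, the candidate grid, and all function names are assumptions made for this example, and the zero-mean model on centered data is closer to simple kriging than to ordinary kriging.

```python
import math

def kernel(a, b, lengthscale):
    """Squared-exponential correlation between 1-D locations (assumed kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def backward_solve(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(x, y, lengthscale, nugget=1e-4):
    """Factor the covariance matrix and record the negative log-likelihood."""
    n = len(x)
    K = [[kernel(x[i], x[j], lengthscale) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_solve(L, forward_solve(L, y))  # alpha = K^{-1} y
    nll = (0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
           + sum(math.log(L[i][i]) for i in range(n))  # equals 0.5 * log det K
           + 0.5 * n * math.log(2.0 * math.pi))
    return {"x": x, "L": L, "alpha": alpha,
            "lengthscale": lengthscale, "nugget": nugget, "nll": nll}

def predict(model, x_new):
    """Predictive mean and variance at one new location (unit process variance)."""
    k = [kernel(x_new, xi, model["lengthscale"]) for xi in model["x"]]
    mean = sum(ki * ai for ki, ai in zip(k, model["alpha"]))
    v = forward_solve(model["L"], k)
    var = max(0.0, 1.0 + model["nugget"] - sum(vi * vi for vi in v))
    return mean, var

# Toy data (hypothetical): noiseless sine values, centered to suit a zero-mean GP.
x_train = [0.5 * i for i in range(11)]
y_raw = [math.sin(v) for v in x_train]
y_mean = sum(y_raw) / len(y_raw)
y_train = [v - y_mean for v in y_raw]

# "Automated" inference: choose the lengthscale with the smallest negative log-likelihood.
candidates = [0.25, 0.5, 1.0, 2.0]
best = min((fit_gp(x_train, y_train, ls) for ls in candidates),
           key=lambda m: m["nll"])

mean, var = predict(best, 2.25)
mean += y_mean  # undo the centering
print(best["lengthscale"], mean, var)
```

In practice the grid search would be replaced by a numerical optimizer over all hyperparameters, and the dense Cholesky factorization, which costs on the order of n cubed operations, is the bottleneck that large-data approximations such as the Vecchia approximation named in the keywords are meant to avoid.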
Pages: 488-506
Page count: 19