A Case Study Competition Among Methods for Analyzing Large Spatial Data

Cited by: 249
Authors
Heaton, Matthew J. [1]
Datta, Abhirup [1]
Finley, Andrew O. [1]
Furrer, Reinhard [1]
Guinness, Joseph [1]
Guhaniyogi, Rajarshi [1]
Gerber, Florian [1]
Gramacy, Robert B. [1]
Hammerling, Dorit [1]
Katzfuss, Matthias [1]
Lindgren, Finn [1]
Nychka, Douglas W. [1]
Sun, Furong [1]
Zammit-Mangion, Andrew [1]
Affiliations
[1] Brigham Young Univ, Provo, UT 84602 USA
Funding
National Aeronautics and Space Administration (NASA); National Science Foundation (NSF); Australian Research Council
Keywords
Big data; Gaussian process; Parallel computing; Low-rank approximation; Gaussian process models; Stochastic approximation; Asymptotic properties; Parameter estimation; Stationary process; Likelihood; Prediction; Clusters; Fields
DOI
10.1007/s13253-018-00348-w
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
The Gaussian process is an indispensable tool for spatial data analysts. The onset of the "big data" era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote its own implementation of its method to produce predictions at the given locations, and each implementation was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online.
Pages: 398-425
Number of pages: 28