Projected sequential Gaussian processes: A C++ tool for interpolation of large datasets with heterogeneous noise

Cited by: 5
Authors
Barillec, Remi [1]
Ingram, Ben [2, 3]
Cornford, Dan [1]
Csato, Lehel [4]
Affiliations
[1] Aston Univ, Nonlinear & Complex Res Grp, Birmingham B4 7ET, W Midlands, England
[2] Univ Talca, Fac Ingn, Curico, Chile
[3] Univ Talca, Ctr Geomat, Curico, Chile
[4] Univ Babes Bolyai, Fac Math & Informat, RO-400084 Cluj Napoca, Romania
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
Low-rank approximations; Sensor fusion; Heterogeneous data; ONLINE;
DOI
10.1016/j.cageo.2010.05.008
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of the dataset, and require computationally expensive Monte Carlo based inference. Recently, within the machine learning and spatial statistics communities, many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation, which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time, which avoids the need for the high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced; it implements projected, sequential estimation and adds several novel features. In particular, the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed. (C) 2010 Elsevier Ltd. All rights reserved.
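To make the "one observation at a time" idea in the abstract concrete, the following is a minimal sketch of sequential Gaussian conditioning with a separate noise variance per observation. It is not the gptk interface and it leaves out the reduced rank projection step entirely: the posterior is simply tracked on a fixed set of 1-D locations. All names (`GridPosterior`, `sqExpCov`, `makePrior`, `assimilate`) and the squared-exponential covariance are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; none of these names come from gptk.
struct GridPosterior {
    std::vector<double> mean;              // posterior mean at each location
    std::vector<std::vector<double>> cov;  // posterior covariance between locations
};

// Squared-exponential covariance, chosen here purely as an example.
double sqExpCov(double x1, double x2, double variance, double lengthScale) {
    const double d = (x1 - x2) / lengthScale;
    return variance * std::exp(-0.5 * d * d);
}

// Zero-mean Gaussian process prior evaluated on a fixed set of locations.
GridPosterior makePrior(const std::vector<double>& x,
                        double variance, double lengthScale) {
    const std::size_t n = x.size();
    GridPosterior gp{std::vector<double>(n, 0.0),
                     std::vector<std::vector<double>>(n, std::vector<double>(n, 0.0))};
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            gp.cov[i][k] = sqExpCov(x[i], x[k], variance, lengthScale);
        }
    }
    return gp;
}

// Condition the current posterior on a single observation y taken at
// location index j, with Gaussian noise variance specific to that observation.
void assimilate(GridPosterior& gp, std::size_t j, double y, double noiseVar) {
    const std::size_t n = gp.mean.size();
    assert(j < n && noiseVar >= 0.0);

    // Predictive variance of the noisy observation under the current posterior.
    const double s = gp.cov[j][j] + noiseVar;
    assert(s > 0.0);

    // Copy column j before the covariance is modified in place.
    std::vector<double> colJ(n);
    for (std::size_t i = 0; i < n; ++i) {
        colJ[i] = gp.cov[i][j];
    }

    // Rank-one update of mean and covariance (the covariance is symmetric).
    const double innovation = y - gp.mean[j];
    for (std::size_t i = 0; i < n; ++i) {
        gp.mean[i] += colJ[i] * innovation / s;
        for (std::size_t k = 0; k < n; ++k) {
            gp.cov[i][k] -= colJ[i] * colJ[k] / s;
        }
    }
}
```

Calling `assimilate` once per observation, each with its own `noiseVar`, gives the same Gaussian posterior as conditioning on all observations in one batch, while only ever handling a scalar predictive variance. The storage still grows with the square of the number of locations, which is the cost the projected (reduced rank) representation described in the abstract is meant to avoid.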
Pages: 295-309
Number of pages: 15