Efficient Gaussian process regression for large datasets

Cited by: 99
Authors
Banerjee, Anjishnu [1]
Dunson, David B. [1]
Tokdar, Surya T. [1]
Affiliations
[1] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA
Keywords
Bayesian regression; Compressive sensing; Dimensionality reduction; Gaussian process; Random projection; APPROXIMATION; ALGORITHMS; MATRIX
DOI
10.1093/biomet/ass068
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Gaussian processes are widely used in nonparametric regression, classification and spatiotemporal modelling, facilitated in part by a rich literature on their theoretical properties. However, one of their practical limitations is expensive computation, typically on the order of n^3 where n is the number of data points, in performing the necessary matrix inversions. For large datasets, storage and processing also lead to computational bottlenecks, and numerical stability of the estimates and predicted values degrades with increasing n. Various methods have been proposed to address these problems, including predictive processes in spatial data analysis and the subset-of-regressors technique in machine learning. The idea underlying these approaches is to use a subset of the data, but this raises questions concerning sensitivity to the choice of subset and limitations in estimating fine-scale structure in regions that are not well covered by the subset. Motivated by the literature on compressive sensing, we propose an alternative approach that involves linear projection of all the data points onto a lower-dimensional subspace. We demonstrate the superiority of this approach from a theoretical perspective and through simulated and real data examples.
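The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of replacing the n observations with m < n random linear combinations before solving the regression system. It is not the authors' algorithm. The squared-exponential kernel, the Gaussian projection matrix, the noise variance, and the names `rbf_kernel` and `compressed_gp_mean` are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def compressed_gp_mean(x, y, x_new, phi, noise_var=0.01):
    """Posterior mean at x_new given only the compressed data phi @ y.

    phi is an (m, n) projection matrix. Under a zero-mean GP prior with
    i.i.d. Gaussian noise, phi @ y ~ N(0, phi (K + noise_var I) phi^T),
    so the linear system solved here is m x m rather than n x n.
    """
    n = x.shape[0]
    k = rbf_kernel(x, x)            # (n, n) prior covariance at the inputs
    k_star = rbf_kernel(x_new, x)   # (n_new, n) cross-covariance
    a = phi @ (k + noise_var * np.eye(n)) @ phi.T   # (m, m)
    alpha = np.linalg.solve(a, phi @ y)
    return k_star @ phi.T @ alpha


# Illustrative usage on synthetic data (all values hypothetical).
rng = np.random.default_rng(0)
n, m = 200, 40
x = np.linspace(0.0, 1.0, n)
y = np.sin(6.0 * x) + 0.1 * rng.standard_normal(n)
x_new = np.linspace(0.0, 1.0, 25)

# Assumed projection: i.i.d. Gaussian entries scaled by 1/sqrt(m).
phi = rng.standard_normal((m, n)) / np.sqrt(m)

mean_compressed = compressed_gp_mean(x, y, x_new, phi)
mean_full = compressed_gp_mean(x, y, x_new, np.eye(n))  # identity = no compression
print("max abs difference:", np.max(np.abs(mean_compressed - mean_full)))
```

In this sketch every data point contributes to the compressed vector `phi @ y`, in contrast to subset-based methods that discard points. Forming the full kernel matrix still costs O(n^2) memory here, so the sketch illustrates only the reduction in the size of the linear solve.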
Pages: 75-89
Page count: 15