GoGP: scalable geometric-based Gaussian process for online regression

Cited by: 0
Authors
Trung Le
Khanh Nguyen
Vu Nguyen
Tu Dinh Nguyen
Dinh Phung
Affiliations
[1] Monash University, Faculty of Information Technology
[2] Deakin University, Centre for Pattern Recognition and Data Analytics, School of Information Technology
Source
Knowledge and Information Systems | 2019, Vol. 60
Keywords
Gaussian process; Online learning; Kernel methods; Random feature; Regression
DOI
Not available
Abstract
One of the most challenging problems in Gaussian process regression is coping with large-scale datasets and with online settings where data instances arrive irregularly and continuously. In this paper, we introduce a novel online Gaussian process model that scales efficiently to large datasets. Our proposed model is built on the geometric and optimization views of Gaussian process regression, and is hence termed geometric-based online GP (GoGP). We develop theory guaranteeing that our algorithm converges at a good rate and always yields a sparse solution that approximates the true optimum to any level of precision specified a priori. Moreover, to further speed up GoGP when it is paired with a positive semi-definite, shift-invariant kernel such as the well-known Gaussian kernel, and to address the curse of kernelization, wherein the model size grows linearly with the data accumulated over time in online learning, we propose approximating the original kernel with a Fourier random-feature kernel. The resulting model, GoGP with Fourier random features (GoGP-RF), can be stored directly in a finite-dimensional random-feature space, thereby avoiding the curse of kernelization and scaling efficiently and effectively to large datasets. We extensively evaluate our methods against state-of-the-art baselines on several large-scale datasets for the online regression task. The experimental results show that our GoGP variants deliver comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup over their rivals in the online setting. More importantly, their convergence behavior, which our theoretical analysis guarantees, is rapid and stable while achieving lower errors.
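To make the random-feature idea in the abstract concrete, the sketch below approximates a Gaussian kernel with random Fourier features and runs a simple online least-squares update in the resulting fixed-dimensional feature space, so the model size stays constant as instances stream in. It is a minimal illustration in Python/NumPy, not the authors' GoGP-RF algorithm; the class name, the plain stochastic-gradient update, and the hyperparameter values are assumptions made for the example.

```python
import numpy as np

class RFFOnlineRegressor:
    """Minimal sketch: online least-squares regression in a random
    Fourier feature (RFF) space that approximates a Gaussian kernel.
    Illustrative only -- not the authors' GoGP-RF algorithm."""

    def __init__(self, dim, n_features=300, sigma=1.0, lr=0.2, seed=0):
        rng = np.random.default_rng(seed)
        # Spectral sample for the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)):
        # omega ~ N(0, sigma^{-2} I), b ~ Uniform[0, 2*pi].
        self.omega = rng.normal(0.0, 1.0 / sigma, size=(dim, n_features))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.w = np.zeros(n_features)  # model lives in R^D, fixed size
        self.n_features = n_features
        self.lr = lr

    def _phi(self, x):
        # phi(x)^T phi(y) ~= k(x, y); the dimension D is fixed, so the
        # model does not grow with the number of observed instances.
        return np.sqrt(2.0 / self.n_features) * np.cos(x @ self.omega + self.b)

    def predict(self, x):
        return self._phi(x) @ self.w

    def update(self, x, y):
        # One stochastic gradient step on the squared loss.
        phi = self._phi(x)
        self.w += self.lr * (y - phi @ self.w) * phi


# Usage: stream noisy samples of a nonlinear target one at a time.
model = RFFOnlineRegressor(dim=1, n_features=300, sigma=0.5, lr=0.2, seed=0)
rng = np.random.default_rng(1)
for _ in range(2000):
    x = rng.uniform(-3.0, 3.0, size=1)
    y = np.sin(2.0 * x[0]) + 0.1 * rng.normal()
    model.update(x, y)
print(model.predict(np.array([1.0])))  # should be close to sin(2.0)
```

Because the feature map has a fixed dimension D, each prediction and update costs O(D) regardless of how many instances have been seen, which is the constant-model-size property the abstract attributes to GoGP-RF.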
Pages: 197-226
Number of pages: 29