Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling

Cited by: 0
Authors
Babichev, Dmitry [1 ]
Bach, Francis [1 ]
Affiliation
[1] PSL Res Univ, INRIA ENS, Paris, France
Source
UNCERTAINTY IN ARTIFICIAL INTELLIGENCE | 2018
Funding
European Research Council; EU Horizon 2020;
Keywords
APPROXIMATION;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Stochastic gradient methods enable learning probabilistic models from large amounts of data. While large step-sizes (learning rates) have been shown to be best for least-squares (e.g., Gaussian noise) once combined with parameter averaging, they do not lead to convergent algorithms in general. In this paper, we consider generalized linear models, that is, conditional models based on exponential families. We propose averaging moment parameters instead of natural parameters for constant-step-size stochastic gradient descent. For finite-dimensional models, we show that this can sometimes (and surprisingly) lead to better predictions than the best linear model. For infinite-dimensional models, we show that it always converges to optimal predictions, while averaging natural parameters never does. We illustrate our findings with simulations on synthetic data and classical benchmarks with many observations.
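The abstract contrasts two ways of averaging constant-step-size SGD iterates for conditional exponential family models: averaging natural parameters (the iterates themselves) versus averaging moment parameters (the predicted means). Below is a minimal sketch of that distinction for logistic regression; the synthetic data, the step size gamma = 0.1, and all variable names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation): constant-step-size SGD for
# logistic regression, comparing an average of natural parameters (the iterates
# theta_t) with an average of moment parameters (the predicted means sigmoid(x^T theta_t)).
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.normal(size=(n, d))                  # synthetic features (assumption)
theta_true = rng.normal(size=d)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

gamma = 0.1                                  # constant step size (assumed value)
theta = np.zeros(d)
theta_avg = np.zeros(d)                      # running average of natural parameters
X_test = rng.normal(size=(1000, d))          # points where predictions are averaged
mu_avg = np.zeros(len(X_test))               # running average of moment parameters

for t in range(n):
    x_t, y_t = X[t], y[t]
    grad = (sigmoid(x_t @ theta) - y_t) * x_t    # stochastic gradient of the logistic loss
    theta = theta - gamma * grad
    theta_avg += (theta - theta_avg) / (t + 1)                # average natural parameters
    mu_avg += (sigmoid(X_test @ theta) - mu_avg) / (t + 1)    # average moment parameters

pred_natural = sigmoid(X_test @ theta_avg)   # plug-in prediction from averaged iterates
pred_moment = mu_avg                         # prediction from averaged predicted means
```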
Pages: 219-228
Number of pages: 10