Laplacian smoothing gradient descent

Cited by: 0
Authors
Stanley Osher
Bao Wang
Penghang Yin
Xiyang Luo
Farzin Barekat
Minh Pham
Alex Lin
Affiliations
[1] UCLA, Department of Mathematics
[2] University of Utah, Department of Mathematics, Scientific Computing and Imaging Institute
[3] SUNY Albany, Department of Mathematics
Source
Research in the Mathematical Sciences | 2022, Vol. 9
Keywords
Laplacian smoothing; Machine learning; Optimization;
DOI
Not available
Abstract
We propose a class of very simple modifications of gradient descent and stochastic gradient descent leveraging Laplacian smoothing. We show that when applied to a large variety of machine learning problems, ranging from logistic regression to deep neural nets, the proposed surrogates can dramatically reduce the variance, allow larger step sizes, and improve the generalization accuracy. The methods only involve multiplying the usual (stochastic) gradient by the inverse of a positive definite matrix (which can be computed efficiently by FFT) with a low condition number coming from a one-dimensional discrete Laplacian or its high-order generalizations. Given any vector, e.g., a gradient vector, Laplacian smoothing preserves the mean, increases the smallest component, and decreases the largest component. Moreover, we show that optimization algorithms with these surrogates converge uniformly in the discrete Sobolev $H_\sigma^p$ sense and reduce the optimality gap for convex optimization problems. The code is available at: https://github.com/BaoWangMath/LaplacianSmoothing-GradientDescent.
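The smoothing step admits a short sketch. Assuming the smoothed direction solves (I + sigma*L) d = grad, with L the one-dimensional discrete Laplacian under periodic boundary conditions, the resulting matrix is circulant, so the solve reduces to a pair of FFTs. The function and parameter names below (laplacian_smooth, ls_gd_step, sigma) are illustrative and are not taken from the released code; this is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def laplacian_smooth(grad, sigma=1.0):
    """Return (I + sigma*L)^{-1} grad for the periodic 1-D discrete Laplacian L.

    The matrix is circulant, so it is diagonalized by the DFT and the solve
    becomes an element-wise division in Fourier space. Assumes the flattened
    gradient has at least three entries.
    """
    g = np.ravel(grad)
    n = g.size
    # First column of the circulant matrix I + sigma*L:
    # 1 + 2*sigma on the diagonal, -sigma on the two neighboring bands.
    c = np.zeros(n)
    c[0] = 1.0 + 2.0 * sigma
    c[1] = -sigma
    c[-1] = -sigma
    # The eigenvalues are the DFT of the first column,
    # 1 + 4*sigma*sin^2(pi*k/n) > 0, so the division is always well defined.
    smoothed = np.fft.ifft(np.fft.fft(g) / np.fft.fft(c)).real
    return smoothed.reshape(np.shape(grad))

def ls_gd_step(x, grad_fn, lr=0.1, sigma=1.0):
    """One gradient-descent step using the Laplacian-smoothed gradient."""
    return x - lr * laplacian_smooth(grad_fn(x), sigma)
```

Applying a one-dimensional Laplacian to the flattened parameter vector follows the description in the abstract; high-order generalizations would replace the three-point stencil with a wider one, still solvable by FFT.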