Preconditioned Stochastic Gradient Descent

Cited by: 58
Authors
Li, Xi-Lin [1 ,2 ,3 ]
Affiliations
[1] Univ Maryland Baltimore Cty, Machine Learning Signal Proc Lab, Baltimore, MD 21228 USA
[2] Fortemedia Inc, Santa Clara, CA USA
[3] Cisco Syst Inc, San Jose, CA USA
Keywords
Neural network; Newton method; nonconvex optimization; preconditioner; stochastic gradient descent (SGD);
DOI
10.1109/TNNLS.2017.2672978
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Stochastic gradient descent (SGD) is still the workhorse for many practical problems. However, it converges slowly and can be difficult to tune. It is possible to precondition SGD to accelerate its convergence remarkably, but many attempts in this direction either aim at solving specialized problems or result in methods significantly more complicated than SGD. This paper proposes a new method to adaptively estimate a preconditioner, such that the amplitudes of perturbations of the preconditioned stochastic gradient match those of the perturbations of the parameters to be optimized, in a way comparable to the Newton method for deterministic optimization. Unlike preconditioners based on secant-equation fitting, as used in deterministic quasi-Newton methods, which assume a positive-definite Hessian and approximate its inverse, the new preconditioner works equally well for both convex and nonconvex optimization with exact or noisy gradients. When stochastic gradients are used, it naturally damps the gradient noise to stabilize SGD. Efficient preconditioner estimation methods are developed, and with reasonable simplifications, they are applicable to large-scale problems. Experimental results demonstrate that, equipped with the new preconditioner and without any tuning effort, preconditioned SGD can efficiently solve many challenging problems, such as the training of a deep neural network or a recurrent neural network requiring extremely long-term memories.
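As a rough illustration of the amplitude-matching idea described in the abstract, the sketch below is a minimal toy, not the paper's actual estimator: it assumes a purely diagonal preconditioner, a small quadratic loss, and additive gradient noise, and it uses the closed-form diagonal choice p_i ≈ |δθ_i| / |δg_i| so that the preconditioned gradient perturbation p_i·δg_i has roughly the same amplitude as the parameter perturbation δθ_i. All names and constants here are illustrative assumptions; the paper develops estimators for full, non-diagonal preconditioners.

    # Minimal illustrative sketch (an assumption-laden toy, not the paper's algorithm):
    # diagonal preconditioning of SGD on an ill-conditioned quadratic.
    import numpy as np

    rng = np.random.default_rng(0)
    H = np.diag([100.0, 1.0])              # toy loss f(theta) = 0.5 * theta^T H theta

    def stochastic_grad(theta, noise):
        # Exact gradient plus additive noise standing in for mini-batch noise.
        return H @ theta + noise

    theta = np.array([1.0, 1.0])
    p = None                               # diagonal preconditioner estimate
    lr, beta, eps = 0.1, 0.95, 1e-8

    for step in range(200):
        noise = 0.05 * rng.standard_normal(theta.shape)   # shared per-step noise
        g = stochastic_grad(theta, noise)

        # Probe the curvature with a small random parameter perturbation;
        # reusing the same noise lets dg reflect curvature, not sampling noise.
        dtheta = 1e-4 * rng.standard_normal(theta.shape)
        dg = stochastic_grad(theta + dtheta, noise) - g

        # Amplitude matching for a diagonal preconditioner:
        # choose p_i so that p_i * |dg_i| is roughly |dtheta_i|.
        p_hat = np.abs(dtheta) / (np.abs(dg) + eps)
        p = p_hat if p is None else beta * p + (1.0 - beta) * p_hat

        theta = theta - lr * p * g         # preconditioned SGD step

    print("final parameters:", theta)      # approaches the origin, up to the noise level

On this toy problem the estimate p settles near the inverse of the Hessian diagonal, so the preconditioned steps behave like damped Newton steps, and the gradient noise is scaled down along the stiff direction, which is the damping effect the abstract refers to.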
Pages: 1454-1466
Number of pages: 13