Novel Convergence Results of Adaptive Stochastic Gradient Descents

Times cited: 13
Authors
Sun, Tao [1]
Qiao, Linbo [1]
Liao, Qing [2]
Li, Dongsheng [1]
Affiliations
[1] National University of Defense Technology, College of Computer, Changsha 410073, People's Republic of China
[2] Harbin Institute of Technology, Department of Computer Science and Technology, Shenzhen 518055, People's Republic of China
Funding
U.S. National Science Foundation; National Key Research and Development Program;
Keywords
Convergence; Training; Optimization; Task analysis; Stochastic processes; Adaptive systems; Sun; Adaptive stochastic gradient descent; nonconvexity; acceleration; momentum; nonergodic convergence; NONCONVEX; OPTIMIZATION; MINIMIZATION;
DOI
10.1109/TIP.2020.3038535
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Adaptive stochastic gradient descent, which uses unbiased samples of the gradient with stepsizes chosen from historical information, has been widely used to train neural networks for computer vision and pattern recognition tasks. This paper revisits the theoretical aspects of two classes of adaptive stochastic gradient descent methods, which include several existing state-of-the-art schemes. We focus on presenting novel findings. In the general smooth case, nonergodic convergence results are given; that is, the expectation of the gradient norm at the iterates themselves, rather than the minimum over past iterates, is proved to converge. We also study the performance of these methods when the objective function satisfies the Polyak-Łojasiewicz property. In this case, nonergodic convergence rates are given for the expectation of the function values. Our findings show that more substantial restrictions on the stepsizes are needed to guarantee nonergodic convergence (and convergence rates) of the function values.
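To make the ergodic/nonergodic distinction concrete, the following is a minimal sketch under several assumptions that are not taken from the paper. It uses AdaGrad-Norm as one representative adaptive stochastic gradient method (the abstract does not specify the paper's two classes), a hypothetical two-dimensional quadratic f(x) = 0.5 (x1^2 + 4 x2^2), and an assumed Gaussian noise model for the unbiased gradient samples. The quadratic is strongly convex, so it satisfies the Polyak-Łojasiewicz inequality f(x) - f* <= ||∇f(x)||^2 / (2μ) with f* = 0 and μ = 1. The helper names (`adagrad_norm_sgd`, `grad_f`, `CURVATURE`) and all parameter values are illustrative only. A single run shows one sample path, not the expectations the paper analyzes.

```python
import math
import random

# Hypothetical test objective (an assumption, not from the paper):
# f(x) = 0.5 * (x1^2 + 4 * x2^2). Strongly convex, hence PL with f* = 0, mu = 1.
CURVATURE = (1.0, 4.0)


def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(CURVATURE, x))


def grad_f(x):
    return [a * xi for a, xi in zip(CURVATURE, x)]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def adagrad_norm_sgd(grad, x0, eta=0.5, b0=1.0, noise=0.1, steps=2000, seed=0):
    """AdaGrad-Norm sketch: b_{k+1}^2 = b_k^2 + ||g_k||^2, x_{k+1} = x_k - (eta / b_{k+1}) g_k.

    Returns the last iterate and the exact gradient norm at every iterate.
    """
    rng = random.Random(seed)
    x = list(x0)
    b_sq = b0 * b0
    grad_norms = []
    for _ in range(steps):
        # Unbiased stochastic gradient: exact gradient plus zero-mean Gaussian noise (assumed model).
        g = [gi + rng.gauss(0.0, noise) for gi in grad(x)]
        # The stepsize is built from the accumulated (historical) squared gradient norms.
        b_sq += sum(gi * gi for gi in g)
        step = eta / math.sqrt(b_sq)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        grad_norms.append(norm(grad(x)))
    return x, grad_norms


if __name__ == "__main__":
    x_last, norms = adagrad_norm_sgd(grad_f, [3.0, -2.0])
    # Ergodic-style quantity: the best gradient norm over all past iterates.
    print("min_k ||grad f(x_k)|| =", min(norms))
    # Nonergodic quantity: the gradient norm at the last iterate itself.
    print("||grad f(x_K)||       =", norms[-1])
    # Under PL, function-value suboptimality at the last iterate is the nonergodic target.
    print("f(x_K) - f*           =", f(x_last))
```

The minimum over past iterates is always at most the last-iterate value, which is why a nonergodic guarantee on the last iterate is the stronger statement.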
Pages: 1044-1056
Number of pages: 13