Novel Convergence Results of Adaptive Stochastic Gradient Descents

Cited by: 13

Authors:

Sun, Tao [1]

Qiao, Linbo [1]

Liao, Qing [2]

Li, Dongsheng [1]

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China

[2] Harbin Inst Technol, Dept Comp Sci & Technol, Shenzhen 518055, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Funding:

National Science Foundation (USA); National Key Research and Development Program;

Keywords:

Convergence; Training; Optimization; Task analysis; Stochastic processes; Adaptive systems; Sun; Adaptive stochastic gradient descent; nonconvexity; acceleration; momentum; nonergodic convergence; NONCONVEX; OPTIMIZATION; MINIMIZATION;

DOI:

10.1109/TIP.2020.3038535

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Adaptive stochastic gradient descent, which uses unbiased samples of the gradient with stepsizes chosen from historical information, has been widely used to train neural networks for computer vision and pattern recognition tasks. This paper revisits the theoretical aspects of two classes of adaptive stochastic gradient descent methods, which contain several existing state-of-the-art schemes. We focus on presenting novel findings. In the general smooth case, nonergodic convergence results are given; that is, the expectation of the gradients' norm, rather than the minimum over past iterates, is proved to converge. We also study the performance of these methods under the Polyak-Łojasiewicz property of the objective function. In this case, nonergodic convergence rates are given for the expectation of the function values. Our findings show that more substantial restrictions on the stepsizes are needed to guarantee nonergodic convergence (rates) of the function values.
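
For readers unfamiliar with the terms used in the abstract, the following is a sketch of the standard textbook definitions, not the paper's own statements. The symbols f, f^*, \mu, x^k, g^k and \eta_k are notation assumed here for illustration and may differ from the notation and exact assumptions used in the paper.

```latex
% Generic adaptive SGD step (assumed notation): g^k is an unbiased sample of
% the gradient, and the stepsize \eta_k is computed from the history g^1, ..., g^k.
x^{k+1} = x^k - \eta_k \, g^k,
\qquad
\mathbb{E}\left[ g^k \mid x^k \right] = \nabla f(x^k)

% Guarantee on the minimum over past iterates (left) versus a nonergodic
% guarantee on the current iterate itself (right).
\min_{1 \le i \le k} \mathbb{E}\,\|\nabla f(x^i)\| \to 0
\qquad \text{versus} \qquad
\mathbb{E}\,\|\nabla f(x^k)\| \to 0

% Polyak-Lojasiewicz (PL) property with constant \mu > 0, where f^* = \min_x f(x).
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu \left( f(x) - f^* \right)
\quad \text{for all } x
```

Under the PL property every stationary point is a global minimizer, which is why convergence rates can be stated for the function values rather than only for the gradient norm.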

Pages: 1044-1056

Page count: 13
