Novel Convergence Results of Adaptive Stochastic Gradient Descents

Cited by: 13

Authors:

Sun, Tao ^{[1]}

Qiao, Linbo ^{[1]}

Liao, Qing ^{[2]}

Li, Dongsheng ^{[1]}

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China

[2] Harbin Inst Technol, Dept Comp Sci & Technol, Shenzhen 518055, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Funding:

National Science Foundation (USA); National Key Research and Development Program;

Keywords:

Convergence; Training; Optimization; Task analysis; Stochastic processes; Adaptive systems; Sun; Adaptive stochastic gradient descent; nonconvexity; acceleration; momentum; nonergodic convergence; NONCONVEX; OPTIMIZATION; MINIMIZATION;

DOI:

10.1109/TIP.2020.3038535

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Adaptive stochastic gradient descent, which uses unbiased samples of the gradient with stepsizes chosen from historical information, has been widely used to train neural networks for computer vision and pattern recognition tasks. This paper revisits the theoretical aspects of two classes of adaptive stochastic gradient descent methods, which contain several existing state-of-the-art schemes. We focus on presenting novel findings. In the general smooth case, nonergodic convergence results are given; that is, the expectation of the gradient norm, rather than the minimum over past iterates, is proved to converge. We also study the performance of these methods under the Polyak-Lojasiewicz property of the objective function. In this case, nonergodic convergence rates are given for the expectation of the function values. Our findings show that more substantial restrictions on the stepsizes are needed to guarantee nonergodic convergence (rates) of the function values.
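
To illustrate what "stepsizes chosen from historical information" can mean, here is a minimal sketch of one well-known adaptive scheme, a scalar (norm) variant of AdaGrad. It is an assumption made for illustration and is not necessarily one of the two classes analyzed in the paper. The function names, the test objective f(x) = 0.5 * ||x||^2 (which satisfies the Polyak-Lojasiewicz property), and all parameter values are hypothetical. The sketch reports the gradient norm at the last iterate, which is the kind of quantity a nonergodic result concerns.

```python
import math
import random


def adagrad_norm(grad_fn, x0, steps=2000, eta=1.0, b0=1e-2, seed=0):
    """Scalar AdaGrad variant: the stepsize is eta / sqrt(sum of past squared gradient norms).

    Illustrative only; parameter names and defaults are assumptions.
    """
    rng = random.Random(seed)
    x = list(x0)
    b_sq = b0 ** 2  # running sum of squared stochastic-gradient norms
    for _ in range(steps):
        g = grad_fn(x, rng)
        b_sq += sum(gi * gi for gi in g)  # accumulate gradient history
        step = eta / math.sqrt(b_sq)      # stepsize depends only on that history
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def noisy_quadratic_grad(x, rng, sigma=0.1):
    """Unbiased stochastic gradient of f(x) = 0.5 * ||x||^2 (true gradient is x)."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


x_last = adagrad_norm(noisy_quadratic_grad, [5.0, -3.0])
# For this objective the true gradient at x is x itself, so this is ||grad f(x_last)||.
grad_norm_last = math.hypot(x_last[0], x_last[1])
print(grad_norm_last)
```

An ergodic guarantee would instead bound the smallest gradient norm seen over all past iterates, which requires tracking the best iterate; the last-iterate quantity above needs no such bookkeeping.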


Pages: 1044-1056

Page count: 13
