Provable Super-Convergence With a Large Cyclical Learning Rate

Cited by: 6
Authors
Oymak, Samet [1 ]
Affiliations
[1] Univ Calif Riverside, Dept Elect Engn, Riverside, CA 92507 USA
Keywords
Eigenvalues and eigenfunctions; Convergence; Jacobian matrices; Standards; Deep learning; Signal processing algorithms; Schedules; Convergence of numerical methods; Iterative algorithms; Gradient methods
DOI
10.1109/LSP.2021.3101131
Chinese Library Classification
TM [Electrical Technology]; TN [Electronics and Communication Technology];
Subject Classification Code
0808 ; 0809 ;
Abstract
Conventional wisdom dictates that the learning rate should be in the stable regime so that gradient-based algorithms don't blow up. This letter introduces a simple scenario where an unstably large learning rate scheme leads to super-fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem. Our scheme uses a Cyclical Learning Rate (CLR) where we periodically take one large unstable step and several small stable steps to compensate for the instability. These findings also help explain the empirical observations of [Smith and Topin, 2019], who show that CLR with a large maximum learning rate can dramatically accelerate learning and lead to so-called "super-convergence". We prove that our scheme excels in problems where the Hessian exhibits a bimodal spectrum and the eigenvalues can be grouped into two clusters (small and large). The unstably large step is the key to enabling fast convergence over the small eigen-spectrum.
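The scheme described above can be illustrated on a toy quadratic. The sketch below is not the paper's experiment; the eigenvalue clusters, step sizes, and cycle length `K` are illustrative assumptions chosen so that the large step roughly inverts the small cluster while the small stable steps damp the resulting blow-up on the large cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bimodal Hessian spectrum: a "large" cluster near 1 and a
# "small" cluster near 0.01 (condition number > 100).  These values are
# illustrative, not taken from the paper.
eigs = np.concatenate([rng.uniform(0.90, 1.00, 10),
                       rng.uniform(0.009, 0.010, 10)])

def run(schedule, steps, x0):
    """Gradient descent on f(x) = 0.5 * sum_i eigs[i] * x[i]**2."""
    x = x0.copy()
    for t in range(steps):
        x = x - schedule(t) * eigs * x  # gradient of the quadratic is eigs * x
    return np.linalg.norm(x)

x0 = np.ones_like(eigs)

# Baseline: a constant stable rate (eta < 2 / max eigenvalue).  The small
# eigen-directions contract only by a factor ~(1 - 0.01) per step.
stable = run(lambda t: 1.0, 200, x0)

# Cyclical scheme: every K + 1 steps, one unstably large step
# (eta_big ~ 1 / small-eigenvalue scale) nearly annihilates the small
# cluster, then K small stable steps damp the blow-up on the large cluster.
K, eta_big, eta_small = 8, 100.0, 1.0
cyclical = run(lambda t: eta_big if t % (K + 1) == 0 else eta_small, 200, x0)

print(f"stable schedule:   residual {stable:.3e}")
print(f"cyclical schedule: residual {cyclical:.3e}")
```

In this toy setup the cyclical schedule drives the residual far below the constant stable rate within the same step budget: the big step contracts the small eigen-directions by |1 - eta_big * lambda| ~ 0.1 per cycle, while the following small steps shrink the amplified large directions by (1 - lambda)^K, more than offsetting the instability.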
Pages: 1645 - 1649
Page count: 5
Related Papers
25 in total
  • [11] Neural Spectrum Alignment: Empirical Study
    Kopitkov, Dmitry
    Indelman, Vadim
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 168 - 179
  • [12] ImageNet Classification with Deep Convolutional Neural Networks
    Krizhevsky, Alex
    Sutskever, Ilya
    Hinton, Geoffrey E.
    [J]. COMMUNICATIONS OF THE ACM, 2017, 60 (06) : 84 - 90
  • [13] Leclerc G., 2020, arXiv:2002.10376
  • [14] Li M, 2020, PR MACH LEARN RES, V108, P4313
  • [15] Li X., 2020, P INT C MACH LEARN
  • [16] Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
    Li, Xinyan
    Gu, Qilong
    Zhou, Yingxue
    Chen, Tiancong
    Banerjee, Arindam
    [J]. PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM), 2020, : 190 - 198
  • [17] Loshchilov I., 2017, arXiv:1608.03983, DOI 10.48550/ARXIV.1608.03983
  • [18] Oymak Samet, 2019, arXiv:1906.05392
  • [19] Paul D, 2007, STAT SINICA, V17, P1617
  • [20] Poggio T., 2017, Theory of Deep Learning III: explaining the non-overfitting puzzle