Provable Super-Convergence With a Large Cyclical Learning Rate

Cited by: 6
Authors
Oymak, Samet [1 ]
Affiliations
[1] Univ Calif Riverside, Dept Elect Engn, Riverside, CA 92507 USA
Keywords
Eigenvalues and eigenfunctions; Convergence; Jacobian matrices; Standards; Deep learning; Signal processing algorithms; Schedules; Convergence of numerical methods; Iterative algorithms; Gradient methods
DOI
10.1109/LSP.2021.3101131
Chinese Library Classification
TM [Electrical Technology]; TN [Electronics and Communication Technology];
Subject Classification Code
0808 ; 0809 ;
Abstract
Conventional wisdom dictates that the learning rate should be in the stable regime so that gradient-based algorithms don't blow up. This letter introduces a simple scenario where an unstably large learning rate scheme leads to super-fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem. Our scheme uses a Cyclical Learning Rate (CLR) where we periodically take one large unstable step and several small stable steps to compensate for the instability. These findings also help explain the empirical observations of [Smith and Topin, 2019], who show that CLR with a large maximum learning rate can dramatically accelerate learning and lead to so-called "super-convergence". We prove that our scheme excels in problems where the Hessian exhibits a bimodal spectrum and the eigenvalues can be grouped into two clusters (small and large). The unstably large step is the key to enabling fast convergence over the small eigen-spectrum.
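The scheme described above can be illustrated on a toy quadratic. The sketch below is not the paper's experiment; the eigenvalue clusters, step sizes, and cycle length `K` are illustrative assumptions chosen so that the large step roughly inverts the small cluster while the small stable steps damp the resulting blow-up on the large cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bimodal Hessian spectrum: a "large" cluster near 1 and a
# "small" cluster near 0.01 (condition number > 100).  These values are
# illustrative, not taken from the paper.
eigs = np.concatenate([rng.uniform(0.90, 1.00, 10),
                       rng.uniform(0.009, 0.010, 10)])

def run(schedule, steps, x0):
    """Gradient descent on f(x) = 0.5 * sum_i eigs[i] * x[i]**2."""
    x = x0.copy()
    for t in range(steps):
        x = x - schedule(t) * eigs * x  # gradient of the quadratic is eigs * x
    return np.linalg.norm(x)

x0 = np.ones_like(eigs)

# Baseline: a constant stable rate (eta < 2 / max eigenvalue).  The small
# eigen-directions contract only by a factor ~(1 - 0.01) per step.
stable = run(lambda t: 1.0, 200, x0)

# Cyclical scheme: every K + 1 steps, one unstably large step
# (eta_big ~ 1 / small-eigenvalue scale) nearly annihilates the small
# cluster, then K small stable steps damp the blow-up on the large cluster.
K, eta_big, eta_small = 8, 100.0, 1.0
cyclical = run(lambda t: eta_big if t % (K + 1) == 0 else eta_small, 200, x0)

print(f"stable schedule:   residual {stable:.3e}")
print(f"cyclical schedule: residual {cyclical:.3e}")
```

In this toy setup the cyclical schedule drives the residual far below the constant stable rate within the same step budget: the big step contracts the small eigen-directions by |1 - eta_big * lambda| ~ 0.1 per cycle, while the following small steps shrink the amplified large directions by (1 - lambda)^K, more than offsetting the instability.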
Pages: 1645 - 1649
Page count: 5
Related Papers
25 in total
  • [11] Neural Spectrum Alignment: Empirical Study
    Kopitkov, Dmitry
    Indelman, Vadim
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 168 - 179
  • [12] ImageNet Classification with Deep Convolutional Neural Networks
    Krizhevsky, Alex
    Sutskever, Ilya
    Hinton, Geoffrey E.
    [J]. COMMUNICATIONS OF THE ACM, 2017, 60 (06) : 84 - 90
  • [13] Leclerc G., 2020, arXiv:2002.10376
  • [14] Li M, 2020, PR MACH LEARN RES, V108, P4313
  • [15] Li X., 2020, P INT C MACH LEARN
  • [16] Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
    Li, Xinyan
    Gu, Qilong
    Zhou, Yingxue
    Chen, Tiancong
    Banerjee, Arindam
    [J]. PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM), 2020, : 190 - 198
  • [17] Loshchilov I., 2017, arXiv:1608.03983, DOI 10.48550/ARXIV.1608.03983
  • [18] Oymak Samet, 2019, arXiv:1906.05392
  • [19] Paul D, 2007, STAT SINICA, V17, P1617
  • [20] Poggio T., 2017, Theory of Deep Learning III: explaining the non-overfitting puzzle