Provable Super-Convergence With a Large Cyclical Learning Rate

Cited by: 6
Authors
Oymak, Samet [1]
Affiliation
[1] Univ Calif Riverside, Dept Elect Engn, Riverside, CA 92507 USA
Keywords
Eigenvalues and eigenfunctions; Convergence; Jacobian matrices; Standards; Deep learning; Signal processing algorithms; Schedules; Convergence of numerical methods; Iterative algorithms; Gradient methods
DOI
10.1109/LSP.2021.3101131
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Conventional wisdom dictates that the learning rate should stay in the stable regime so that gradient-based algorithms do not blow up. This letter introduces a simple scenario where an unstably large learning rate scheme leads to super-fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem. Our scheme uses a Cyclical Learning Rate (CLR), in which we periodically take one large unstable step and several small stable steps to compensate for the instability. These findings also help explain the empirical observations of [Smith and Topin, 2019], who show that CLR with a large maximum learning rate can dramatically accelerate learning and lead to so-called "super-convergence". We prove that our scheme excels in problems where the Hessian exhibits a bimodal spectrum, so that the eigenvalues can be grouped into two clusters (small and large). The unstably large step is the key to enabling fast convergence over the small eigen-spectrum.
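The idea in the abstract can be illustrated with a small numerical sketch. The code below is not taken from the letter: the quadratic objective, the eigenvalue ranges ([1, 2] and [500, 1000]), the step sizes, and the cycle length of one large step plus seven small steps are all assumptions chosen so that each cycle shrinks every eigen-direction by roughly a factor of three. It compares that cyclical schedule against a constant stable step size on a Hessian with a bimodal spectrum.

```python
import numpy as np

def run_gd(eigs, x0, step_sizes):
    """Gradient descent on f(x) = 0.5 * sum(eigs * x**2); returns final ||x||."""
    x = x0.copy()
    for eta in step_sizes:
        x = x - eta * eigs * x  # the gradient of the quadratic is eigs * x
    return np.linalg.norm(x)

# Assumed bimodal spectrum: a small cluster in [1, 2], a large one in [500, 1000].
rng = np.random.default_rng(0)
small = rng.uniform(1.0, 2.0, size=20)
large = rng.uniform(500.0, 1000.0, size=20)
eigs = np.concatenate([small, large])
x0 = np.ones_like(eigs)

# Large step tuned to the small cluster: contracts it by at least 1/3,
# but is unstable on the large cluster (growth factor up to about 666).
eta_big = 2.0 / (1.0 + 2.0)
# Small step tuned to the large cluster: contracts it by at least 1/3 per step.
eta_small = 2.0 / (500.0 + 1000.0)

# Seven small steps undo the blow-up, since 666 / 3**7 is below 1/3.
cycle = [eta_big] + [eta_small] * 7
n_cycles = 10
clr_steps = cycle * n_cycles

# Baseline: the best constant stable step for the whole spectrum [1, 1000].
const_steps = [2.0 / (1.0 + 1000.0)] * len(clr_steps)

err_clr = run_gd(eigs, x0, clr_steps) / np.linalg.norm(x0)
err_const = run_gd(eigs, x0, const_steps) / np.linalg.norm(x0)

print(f"relative error, cyclical schedule: {err_clr:.2e}")
print(f"relative error, constant step:     {err_const:.2e}")
```

In this toy setting the number of small steps per cycle grows only with the logarithm of the ratio between the two clusters, which is the intuition behind the logarithmic dependence on the condition number; the letter's actual schedule and constants may differ.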
Pages: 1645-1649
Number of pages: 5