Provable Super-Convergence With a Large Cyclical Learning Rate

Cited by: 6

Authors:

Oymak, Samet [1]

Affiliations:

[1] Univ Calif Riverside, Dept Elect Engn, Riverside, CA 92507 USA

Source:

IEEE SIGNAL PROCESSING LETTERS | 2021 / Volume 28 / Issue 28

Keywords:

Eigenvalues and eigenfunctions; Convergence; Jacobian matrices; Standards; Deep learning; Signal processing algorithms; Schedules; Convergence of numerical methods; Iterative algorithms; Gradient methods;

DOI:

10.1109/LSP.2021.3101131

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Conventional wisdom dictates that the learning rate should be in the stable regime so that gradient-based algorithms don't blow up. This letter introduces a simple scenario where an unstably large learning rate scheme leads to super-fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem. Our scheme uses a Cyclical Learning Rate (CLR) where we periodically take one large unstable step and several small stable steps to compensate for the instability. These findings also help explain the empirical observations of [Smith and Topin, 2019], who show that CLR with a large maximum learning rate can dramatically accelerate learning and lead to so-called "super-convergence". We prove that our scheme excels in problems where the Hessian exhibits a bimodal spectrum and the eigenvalues can be grouped into two clusters (small and large). The unstably large step is the key to enabling fast convergence over the small eigen-spectrum.
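
The schedule described in the abstract can be illustrated with a minimal sketch. This is not the letter's own algorithm or constants: the diagonal quadratic, the two eigenvalue clusters ([1, 2] and [100, 200]), the step sizes, and the cycle length of one large step followed by eight small steps are all assumptions chosen for illustration. The sketch compares the cyclical schedule against a constant stable step size over the same number of iterations.

```python
import numpy as np

def run_gd(eigs, x0, step_sizes):
    """Gradient descent on f(x) = 0.5 * sum(eigs * x**2) with a given step-size sequence."""
    x = x0.copy()
    for eta in step_sizes:
        x = x - eta * (eigs * x)  # gradient of the diagonal quadratic is eigs * x
    return x

# Hypothetical bimodal Hessian spectrum: a small cluster in [1, 2], a large one in [100, 200].
eigs = np.concatenate([np.linspace(1.0, 2.0, 5), np.linspace(100.0, 200.0, 5)])
x0 = np.ones_like(eigs)

eta_small = 1.0 / eigs.max()  # stable for every eigenvalue
eta_large = 1.0 / 2.0         # 1 / (top of the small cluster); unstable on the large cluster
small_steps_per_cycle = 8     # assumed: enough small steps to undo the blow-up of one large step
num_cycles = 20

# One cycle = one large unstable step followed by several small stable steps.
cycle = [eta_large] + [eta_small] * small_steps_per_cycle
clr_schedule = cycle * num_cycles
constant_schedule = [eta_small] * len(clr_schedule)  # same iteration budget

# Relative distance to the minimizer x* = 0 after the full schedule.
clr_err = np.linalg.norm(run_gd(eigs, x0, clr_schedule)) / np.linalg.norm(x0)
gd_err = np.linalg.norm(run_gd(eigs, x0, constant_schedule)) / np.linalg.norm(x0)
print(f"cyclical: {clr_err:.2e}, constant: {gd_err:.2e}")
```

In this toy setup the large step multiplies each large-cluster coordinate by up to 99 in magnitude, and the eight small steps that follow shrink those coordinates by more than that, so every cycle is a net contraction on both clusters. The constant schedule, by contrast, barely moves the small-cluster coordinates at each step.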


Pages: 1645-1649

Page count: 5

