Implicit Bias of Deep Learning in the Large Learning Rate Phase: A Data Separability Perspective

Cited by: 3
Authors
Liu, Chunrui [1]
Huang, Wei [2]
Xu, Richard Yi Da [3]
Affiliations
[1] Univ Technol Sydney, Fac Engn & IT, Sch Comp Sci, Ultimo, NSW 2007, Australia
[2] RIKEN Ctr Adv Intelligence Project AIP, 1-4-1 Nihonbashi, Chuo Ku, Tokyo 1030027, Japan
[3] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / No. 6
Keywords
data separability; data complexity; deep learning theory; catapult phase; neural tangent kernel
DOI
10.3390/app13063961
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline code
0703
Abstract
Previous literature on deep learning theory has focused on implicit bias under small learning rates. In this work, we explore the impact of data separability on the implicit bias of deep learning algorithms trained with a large learning rate. Using deep linear networks for binary classification with the logistic loss in the large learning rate regime, we characterize how data separability shapes the implicit bias in the training dynamics. From a data analytics perspective, we claim that, depending on the separation conditions of the data, the gradient descent iterates converge to a flatter minimum in the large learning rate phase, which results in improved generalization. Our theory is rigorously proven under the assumption of degenerate data by overcoming the difficulty posed by the non-constant Hessian of the logistic loss, and it is confirmed by experiments on both experimental and non-degenerate datasets. Our results highlight the importance of data separability in training dynamics and the benefits of learning rate annealing schemes that start with a large learning rate.
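To make the abstract's setting concrete, here is a minimal toy sketch. It is not the authors' construction: it assumes a width-one, two-layer linear model f(x) = u·v·x on 1-D inputs, the mean logistic loss, plain full-batch gradient descent with a constant step size, and the top eigenvalue of the Hessian in (u, v) ("sharpness") as the flatness measure. The margins m_i = y_i·x_i, the initialization, the two learning rates, and the helper names `loss_derivs`, `sharpness`, and `gradient_descent` are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_derivs(s: float, margins: Sequence[float]) -> Tuple[float, float, float]:
    """Return g(s), g'(s), g''(s) for g(s) = mean_i log(1 + exp(-m_i * s)).

    Hypothetical helper: m_i = y_i * x_i, and s = u * v is the end-to-end weight.
    """
    n = len(margins)
    g = g1 = g2 = 0.0
    for m in margins:
        z = m * s
        g += max(-z, 0.0) + math.log1p(math.exp(-abs(z)))  # stable log(1 + e^{-z})
        g1 += -m * sigmoid(-z)
        g2 += m * m * sigmoid(z) * sigmoid(-z)
    return g / n, g1 / n, g2 / n


def sharpness(u: float, v: float, margins: Sequence[float]) -> float:
    """Largest eigenvalue of the 2x2 Hessian of L(u, v) = g(u * v)."""
    _, g1, g2 = loss_derivs(u * v, margins)
    a = v * v * g2          # d^2 L / du^2
    c = u * u * g2          # d^2 L / dv^2
    b = g1 + u * v * g2     # d^2 L / du dv
    return (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)


def gradient_descent(u: float, v: float, margins: Sequence[float],
                     lr: float, steps: int) -> Tuple[float, float]:
    """Full-batch GD with a constant step size; u and v are updated simultaneously."""
    for _ in range(steps):
        _, g1, _ = loss_derivs(u * v, margins)
        u, v = u - lr * v * g1, v - lr * u * g1
    return u, v


if __name__ == "__main__":
    margins = [1.0, 1.0, -1.0]  # one negative margin: not separable through the origin
    for lr in (0.1, 1.5):
        u, v = gradient_descent(3.0, 0.1, margins, lr, steps=5000)  # unbalanced init
        print(f"lr={lr}: u*v={u * v:.4f}  "
              f"sharpness={sharpness(u, v, margins):.3f}  2/lr={2 / lr:.3f}")
```

For these hypothetical margins every minimizer satisfies u·v = ln 2, and at such a point the Hessian's nonzero eigenvalue is g''(ln 2)(u² + v²) = (2/9)(u² + v²). Balanced factors are therefore flatter. A minimum is linearly stable for step size lr only if its sharpness is at most 2/lr, so in this toy setting the large-lr run cannot settle at the sharp, unbalanced minimum that the small-lr run reaches. With separable margins (all y_i·x_i > 0), g'(s) < 0 for every s and there is no finite minimizer at all. This is one simple way the separation condition of the data changes the picture.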
Pages: 24