Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs

Citations: 0
Authors
Boursier, Etienne [1]
Pillaud-Vivien, Loucas [1]
Flammarion, Nicolas [1]
Affiliations
[1] Ecole Polytech Fed Lausanne, TML, Lausanne, Switzerland
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Keywords
NEURAL-NETWORKS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The training of neural networks by gradient descent methods is a cornerstone of the deep learning revolution. Yet, despite some recent progress, a complete theory explaining its success is still missing. This article presents, for orthogonal input vectors, a precise description of the gradient flow dynamics of training one-hidden-layer ReLU neural networks for the mean squared error at small initialisation. In this setting, despite non-convexity, we show that the gradient flow converges to zero loss and characterise its implicit bias towards minimum variation norm. Furthermore, some interesting phenomena are highlighted: a quantitative description of the initial alignment phenomenon and a proof that the process follows a specific saddle-to-saddle dynamics.
Pages: 14
Related papers
51 in total
  • [1] Abbe E, 2022, PR MACH LEARN RES, P33
  • [2] Allen-Zhu Z, 2019, ADV NEUR IN, V32
  • [3] [Anonymous], 2019, ADV NEUR IN
  • [4] Arora S, 2019, PR MACH LEARN RES, V97
  • [5] Bach Francis, 2017, J MACH LEARN RES, V18
  • [6] BERTOIN D, 2021, ADV NEUR IN, V34, pNI619
  • [7] Bietti A, 2019, ADV NEUR IN, V32
  • [8] THE LOJASIEWICZ INEQUALITY FOR NONSMOOTH SUBANALYTIC FUNCTIONS WITH APPLICATIONS TO SUBGRADIENT DYNAMICAL SYSTEMS
    Bolte, Jerome
    Daniilidis, Aris
    Lewis, Adrian
    [J]. SIAM JOURNAL ON OPTIMIZATION, 2007, 17 (04) : 1205 - 1223
  • [9] CHARACTERIZATIONS OF LOJASIEWICZ INEQUALITIES: SUBGRADIENT FLOWS, TALWEG, CONVEXITY
    Bolte, Jerome
    Daniilidis, Aris
    Ley, Olivier
    Mazet, Laurent
    [J]. TRANSACTIONS OF THE AMERICAN MATHEMATICAL SOCIETY, 2010, 362 (06) : 3319 - 3363
  • [10] Chatterjee Sourav, 2022, ARXIV220316462