Large deviations of one-hidden-layer neural networks

Citations: 0
Authors
Hirsch, Christian [1]
Willhalm, Daniel [2, 3]
Affiliations
[1] Aarhus Univ, Dept Math, Ny Munkegade 118, DK-8000 Aarhus C, Denmark
[2] Univ Groningen, Bernoulli Inst, Nijenborgh 9, NL-9747 AG Groningen, Netherlands
[3] Toronto Metropolitan Univ, Dept Math, 350 Victoria St, Toronto, ON M5B 2K3, Canada
Keywords
Artificial neural networks; large deviations; stochastic gradient descent; interacting particle systems; weak convergence
DOI
10.1142/S0219493725500029
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We study large deviations in the context of stochastic gradient descent for one-hidden-layer neural networks with quadratic loss. We derive a quenched large deviation principle, where we condition on an initial weight measure, and an annealed large deviation principle for the empirical weight evolution during training, as the number of neurons and the number of training iterations tend to infinity simultaneously. The weight evolution is treated as an interacting dynamic particle system. The distinctive aspect, compared to prior work on interacting particle systems, is that the particles are updated in discrete steps while the number of particles grows at the same time.
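As a rough illustration of the setting in the abstract, the sketch below treats each hidden neuron's weights as a particle and runs online stochastic gradient descent with quadratic loss. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `sgd_particles` and `integrate`, the `tanh` activation, the `sin` target, the Gaussian initial weights, and the mean-field scaling (a 1/N prefactor in the network and about N * horizon update steps, so that neurons and iterations grow together).

```python
import math
import random


def sgd_particles(n_neurons, horizon=1.0, step_size=0.5, seed=0):
    """Online SGD for f(x) = (1/N) * sum_i c_i * tanh(w_i * x), quadratic loss.

    Returns the list of weight particles [(c_i, w_i)] after training.
    The empirical weight measure puts mass 1/N on each particle.
    """
    rng = random.Random(seed)
    # Assumed initial weight measure: i.i.d. standard Gaussian particles.
    particles = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_neurons)]
    # Assumed scaling: the number of iterations grows linearly with N.
    n_steps = int(n_neurons * horizon)
    for _ in range(n_steps):
        x = rng.uniform(-1.0, 1.0)   # one fresh training sample per step
        y = math.sin(x)              # hypothetical target function
        pred = sum(c * math.tanh(w * x) for c, w in particles) / n_neurons
        err = y - pred               # residual; the loss is 0.5 * err ** 2
        lr = step_size / n_neurons   # the 1/N comes from the network prefactor
        # Every particle is updated through the shared residual `err`,
        # which is what couples the particles to one another.
        particles = [
            (c + lr * err * math.tanh(w * x),
             w + lr * err * c * x * (1.0 - math.tanh(w * x) ** 2))
            for c, w in particles
        ]
    return particles


def integrate(particles, g):
    """Integrate a test function g(c, w) against the empirical weight measure."""
    return sum(g(c, w) for c, w in particles) / len(particles)
```

For example, `integrate(sgd_particles(200), lambda c, w: c * w)` evaluates one statistic of the empirical weight measure at the end of training; a large deviation principle of the kind described above concerns how unlikely it is for such an empirical measure path to stray from its typical limit as N grows.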
Pages: 53