Law of Large Numbers and Central Limit Theorem for Wide Two-layer Neural Networks: The Mini-Batch and Noisy Case

Cited by: 0

Authors:

Descours, Arnaud ^{[1]}

Guillin, Arnaud ^{[1,2]}

Michel, Manon ^{[3]}

Nectoux, Boris ^{[1]}

Affiliations:

[1] Univ Clermont Auvergne, Lab Math Blaise Pascal UMR 6620, Aubiere, France

[2] Univ Clermont Auvergne, Inst Univ France, Aubiere, France

[3] Univ Clermont Auvergne Aubiere, Polit Sci, Lab Math Blaise Pascal UMR 6620, Paris, France

Source:

JOURNAL OF MACHINE LEARNING RESEARCH | 2024 / Volume 25

Keywords:

Machine learning; Neural networks; Law of large numbers; Central limit theorem; Empirical measures; Particle systems; Mean field; Fluctuations; Propagation

DOI:

Not available

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In this work, we consider a wide two-layer neural network and study the behavior of its empirical weights under the dynamics of stochastic gradient descent on the quadratic loss with mini-batches and noise. Our goal is to prove a trajectorial law of large numbers (LLN) as well as a central limit theorem (CLT) for their evolution. When the noise scales as $1/N^{\beta}$ with $1/2 < \beta \le \infty$, we rigorously derive and generalize the LLN obtained, for example, in [CRBVE20, MMM19, SS20b]. When $3/4 < \beta \le \infty$, we also generalize the CLT (see also [SS20a]) and further exhibit the effect of mini-batching on the asymptotic variance that drives the fluctuations. The case $\beta = 3/4$ is trickier, and we give an example showing that the variance diverges with time, thus establishing the instability of the predictions of the neural network in this case. This is illustrated by simple numerical examples.
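To make the setting in the abstract concrete, the following is a minimal sketch of noisy mini-batch SGD on a wide two-layer network. It is not taken from the paper: the function name `noisy_minibatch_sgd`, the `tanh` activation, the mean-field output $(1/N)\sum_i \sigma(W^i \cdot x)$, the step size `alpha / N`, and the Gaussian noise are all illustrative assumptions. Only the quadratic loss, the mini-batching, and the $1/N^{\beta}$ noise scaling come from the abstract.

```python
import numpy as np

def noisy_minibatch_sgd(X, Y, N=200, beta=1.0, alpha=0.5,
                        batch_size=8, steps=500, seed=0):
    """Illustrative sketch (assumed form, not the paper's exact algorithm).

    X: (n, d) inputs, Y: (n,) targets. Returns the (N, d) neuron weights.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # One weight vector per neuron; Gaussian initialization is an assumption.
    W = rng.normal(size=(N, d))
    for _ in range(steps):
        # Draw a mini-batch without replacement.
        idx = rng.choice(len(X), size=batch_size, replace=False)
        xb, yb = X[idx], Y[idx]
        act = np.tanh(xb @ W.T)          # (batch, N) neuron activations
        pred = act.mean(axis=1)          # assumed output: (1/N) * sum_i tanh(W_i . x)
        resid = yb - pred                # residual of the quadratic loss
        # Batch-averaged descent direction for each neuron:
        # (y - pred) * tanh'(W_i . x) * x, averaged over the mini-batch.
        grad = ((resid[:, None] * (1.0 - act**2)).T @ xb) / batch_size
        # Additive noise scaled as 1 / N**beta, as in the abstract.
        noise = rng.normal(size=(N, d)) / N**beta
        W += (alpha / N) * grad + noise
    return W
```

Under these assumptions, one would track the empirical measure of the rows of `W` over roughly `N * T` steps; passing `beta=float("inf")` switches the noise off entirely.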

References

Pages: 76

35 entries in total

[1] Adams R A., 2003, Sobolev Spaces

[2] Billingsley Patrick, 1999, CONVERGE PROBAB MEAS, DOI 10.1002/9780470316962

[3] Chen Zhengdao, 2020, ADV NEUR IN, V33

[4] Chizat L, 2018, ADV NEUR IN, V31

[5] Chizat L, 2020, PR MACH LEARN RES, V125

[6] De Bortoli V, 2020, ADV NEUR IN, V33

[7] From the master equation to mean field game limit theory: a central limit theorem [J]. Delarue, Francois; Lacker, Daniel; Ramanan, Kavita. ELECTRONIC JOURNAL OF PROBABILITY, 2019, 24

[8] Ethier SN., 2009, Markov Processes, Characterization and Convergence, vol. 282

[9] A Hilbertian approach for fluctuations on the McKean-Vlasov model [J]. Fernandez, B; Meleard, S. STOCHASTIC PROCESSES AND THEIR APPLICATIONS, 1997, 71 (01): 33-53

[10] Gower RM, 2019, PR MACH LEARN RES, V97
