Deep limits of residual neural networks

Cited: 8
Authors
Thorpe, Matthew [1,2]
van Gennip, Yves [3]
Affiliations
[1] Univ Manchester, Dept Math, Manchester M13 9PL, England
[2] Alan Turing Inst, London NW1 2DB, England
[3] Delft Univ Technol, Delft Inst Appl Math, NL-2628 CD Delft, Netherlands
Funding
European Research Council
Keywords
Deep neural networks; Ordinary differential equations; Deep layer limits; Variational convergence; Gamma-convergence; Regularity
DOI
10.1007/s40687-022-00370-y
Chinese Library Classification
O1 [Mathematics]
Discipline codes
0701; 070101
Abstract
Neural networks have been very successful in many applications; however, we often lack a theoretical understanding of what a neural network is actually learning. This problem emerges when trying to generalise to new data sets. The contribution of this paper is to show that, for the residual neural network model, the deep layer limit coincides with a parameter estimation problem for a nonlinear ordinary differential equation. In particular, whilst it is known that the residual neural network model is a discretisation of an ordinary differential equation, we show convergence in a variational sense, which implies that optimal parameters converge in the deep layer limit. This is a stronger statement than saying that, for a fixed parameter, the residual neural network model converges (the latter does not in general imply the former). Our variational analysis provides a discrete-to-continuum Γ-convergence result taking the objective function of the residual neural network training step to a variational problem constrained by a system of ordinary differential equations; this rigorously connects the discrete setting to a continuum problem.
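The ODE correspondence the abstract refers to can be sketched in a few lines. This is not the authors' code: the tanh residual block, the parameter shapes, and the function names are illustrative assumptions; the only point carried over from the abstract is that a depth-L residual network with step size 1/L is an explicit Euler discretisation of dx/dt = f(x, θ(t)).

```python
import numpy as np

def f(x, theta):
    # One residual block's nonlinearity (illustrative choice: tanh of a
    # linear map; the paper's analysis covers more general f).
    return np.tanh(theta @ x)

def resnet_forward(x0, thetas):
    # Residual update x_{k+1} = x_k + h * f(x_k, theta_k) with h = 1/depth,
    # i.e. the explicit Euler scheme for dx/dt = f(x, theta(t)) on [0, 1].
    x = x0
    h = 1.0 / len(thetas)
    for theta in thetas:
        x = x + h * f(x, theta)
    return x

rng = np.random.default_rng(0)
d, depth = 4, 100
thetas = [0.1 * rng.standard_normal((d, d)) for _ in range(depth)]
x0 = rng.standard_normal(d)
out = resnet_forward(x0, thetas)
```

As the depth grows (with step size 1/depth), the forward pass approaches the solution map of the ODE at time 1; the paper's contribution is that the *training* problems themselves Γ-converge, not merely the forward maps for fixed parameters.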
Pages: 44