Best k-Layer Neural Network Approximations

Cited by: 1
Authors
Lim, Lek-Heng [1 ]
Michalek, Mateusz [2 ,3 ]
Qi, Yang [4 ]
Affiliations
[1] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
[2] Max Planck Inst Math Sci, D-04103 Leipzig, Germany
[3] Univ Konstanz, D-78457 Constance, Germany
[4] Ecole Polytech, INRIA Saclay Ile-de-France, CMAP, IP Paris, CNRS, F-91128 Palaiseau, France
Keywords
Neural network; Best approximation; Join loci; Secant loci
DOI
10.1007/s00365-021-09545-2
CLC Classification
O1 [Mathematics]
Subject Classification
0701; 070101
Abstract
We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set $s_1, \dots, s_n \in \mathbb{R}^p$ with corresponding responses $t_1, \dots, t_n \in \mathbb{R}^q$, fitting a $k$-layer neural network $v_\theta : \mathbb{R}^p \to \mathbb{R}^q$ involves estimating the weights $\theta \in \mathbb{R}^m$ via the ERM $\inf_{\theta \in \mathbb{R}^m} \sum_{i=1}^{n} \lVert t_i - v_\theta(s_i) \rVert_2^2$. We show that even for $k = 2$, this infimum is not attainable in general for common activations such as ReLU, hyperbolic tangent, and sigmoid. In addition, we deduce that any attempt to minimize such a loss function when its infimum is not attainable necessarily results in values of $\theta$ diverging to $\pm\infty$. We show that for the smooth activations $\sigma(x) = 1/(1 + \exp(-x))$ and $\sigma(x) = \tanh(x)$, this failure to attain an infimum can occur on a positive-measure subset of responses. For the ReLU activation $\sigma(x) = \max(0, x)$, we completely classify the cases where the ERM for a best two-layer neural network approximation attains its infimum. In recent applications of neural networks, where overfitting is commonplace, the failure to attain an infimum is avoided by ensuring that the system of equations $t_i = v_\theta(s_i)$, $i = 1, \dots, n$, has a solution. For a two-layer ReLU-activated network, we show when such a system of equations has a solution generically, i.e., when such a neural network can be fitted perfectly with probability one.
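The ERM objective in the abstract is easy to make concrete in code. The following NumPy sketch, not taken from the paper, fits a two-layer tanh network $v_\theta(s) = W_2 \tanh(W_1 s + b_1) + b_2$ to synthetic data by gradient descent and tracks $\lVert \theta \rVert$, the quantity the abstract says must diverge to $\pm\infty$ whenever the infimum is not attained. The data, dimensions, learning rate, and analytic-gradient loop are all illustrative assumptions.

# Minimal sketch (illustrative, not from the paper): gradient descent on the
# ERM objective  sum_i ||t_i - v_theta(s_i)||_2^2  for a two-layer tanh
# network, tracking ||theta|| as a divergence diagnostic. All sizes, data,
# and the learning rate are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
p, q, h, n = 3, 2, 4, 20            # input dim, output dim, hidden width, samples
S = rng.standard_normal((n, p))     # training set s_1, ..., s_n (rows)
T = rng.standard_normal((n, q))     # responses t_1, ..., t_n (rows)

W1 = 0.1 * rng.standard_normal((h, p)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal((q, h)); b2 = np.zeros(q)

lr = 1e-2
for step in range(2001):
    H = np.tanh(S @ W1.T + b1)      # hidden layer, sigma = tanh, shape (n, h)
    R = (H @ W2.T + b2) - T         # residuals v_theta(s_i) - t_i, shape (n, q)
    # analytic gradients of the squared-error loss
    gW2 = 2 * R.T @ H
    gb2 = 2 * R.sum(axis=0)
    G = (2 * R @ W2) * (1 - H * H)  # backprop through tanh
    gW1 = G.T @ S
    gb1 = G.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    if step % 500 == 0:
        loss = np.sum(R * R)
        theta = np.sqrt(sum(np.sum(a * a) for a in (W1, b1, W2, b2)))
        print(f"step {step:5d}  loss {loss:.4f}  ||theta|| {theta:.2f}")

Whether $\lVert \theta \rVert$ actually grows without bound here depends on whether these particular responses admit a best approximation; the paper's classification for the ReLU case makes that distinction precise.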
Pages: 583 - 604
Number of pages: 22
Related Papers (50 in total)
  • [31] Propagating interfaces in a two-layer bistable neural network
    Kazantsev, V. B.
    Nekorkin, V. I.
    Morfu, S.
    Bilbault, J. M.
    Marquié, P.
    INTERNATIONAL JOURNAL OF BIFURCATION AND CHAOS, 2006, 16 (03): 589 - 600
  • [32] Computational Complexity of Neural Network Linear Layer Inference Optimizations
    Pendl, Klaus
    Rudic, Branislav
    4TH INTERDISCIPLINARY CONFERENCE ON ELECTRICS AND COMPUTER, INTCEC 2024, 2024
  • [33] Image Zooming Using a Multi-layer Neural Network
    Hassanpour, H.
    Nowrozian, N.
    AlyanNezhadi, M. M.
    Samadiani, N.
    COMPUTER JOURNAL, 2018, 61 (11): 1737 - 1748
  • [34] Experimental Demonstration of Optical 3-Layer Neural Network
    Kasama, N.
    Hayasaki, Y.
    Yatagai, T.
    Mori, M.
    Ishihara, S.
    JAPANESE JOURNAL OF APPLIED PHYSICS PART 2-LETTERS & EXPRESS LETTERS, 1990, 29 (08): L1565 - L1568
  • [35] Application of four-layer neural network on information extraction
    Han, M
    Cheng, L
    Meng, H
    NEURAL NETWORKS, 2003, 16 (5-6) : 547 - 553
  • [36] The scale-invariant space for attention layer in neural network
    Wang, Y.
    Liu, Y.
    Ma, Z.-M.
    NEUROCOMPUTING, 2020, 392: 1 - 10
  • [37] A Study on Single and Multi-layer Perceptron Neural Network
    Singh, Jaswinder
    Banerjee, Rajdeep
    PROCEEDINGS OF THE 2019 3RD INTERNATIONAL CONFERENCE ON COMPUTING METHODOLOGIES AND COMMUNICATION (ICCMC 2019), 2019, : 35 - 40
  • [38] Calculation of the reflectance and transmittance of a disperse layer by the neural network method
    Berdnik, V. V.
    Gallyamova, G. I.
    OPTICS AND SPECTROSCOPY, 2012, 112: 618 - 623
  • [39] Multi-Layer Fusion Neural Network for Deepfake Detection
    Zhao, Zheng
    Wang, Penghui
    Lu, Wei
    INTERNATIONAL JOURNAL OF DIGITAL CRIME AND FORENSICS, 2021, 13 (04) : 26 - 39
  • [40] Multi-layer neural network with deep belief network for gearbox fault diagnosis
    Chen, Zhiqiang
    Li, Chuan
    Sanchez, Rene-Vinicio
    JOURNAL OF VIBROENGINEERING, 2015, 17 (05) : 2379 - 2392