On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems

Times Cited: 2
Authors
Christof, Constantin [1 ]
Kowalczyk, Julia [1 ]
Affiliations
[1] Tech Univ Munich, Chair Optimal Control, Ctr Math Sci, Boltzmannstr 3, D-85748 Garching, Germany
Keywords
Deep artificial neural network; Spurious local minimum; Training problem; Loss landscape; Hadamard well-posedness; Best approximation; Stability analysis; Local affine linearity; Approximation; Landscape
DOI
10.1007/s00365-023-09658-w
Chinese Library Classification
O1 [Mathematics]
Discipline Codes
0701; 070101
Abstract
We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output whose activation functions contain an affine segment and whose hidden layers have width at least two. It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine. In contrast to previous works, our analysis covers all sampling and parameterization regimes, general differentiable loss functions, arbitrary continuous nonpolynomial activation functions, and both the finite- and infinite-dimensional settings. It is further shown that the appearance of the spurious local minima in the considered training problems is a direct consequence of the universal approximation theorem and that the underlying mechanisms also cause, e.g., L^p-best approximation problems to be ill-posed in the sense of Hadamard for all networks that do not have a dense image. The latter result also holds without the assumption of local affine linearity and without any conditions on the hidden layers.
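The mechanism summarized in the abstract, that an activation function with an affine segment makes the network locally affine for suitable parameters, so the best affine fit to a non-affine target becomes a spurious local minimum, can be sketched numerically. The following is a hypothetical illustration, not the paper's construction: the 1-2-1 ReLU architecture, the target x^2, and all parameter values are assumptions chosen only for the demo.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)
y = x ** 2  # a non-affine target function

# 1-2-1 ReLU network: f(x) = v @ relu(w * x + b) + c.
# Large positive biases keep both units in the affine (active) segment on
# [0, 1], so the network is affine in a neighbourhood of these parameters.
A = np.vstack([x, np.ones_like(x)]).T
a, d = np.linalg.lstsq(A, y, rcond=None)[0]  # best affine fit a*x + d
w = np.array([1.0, 1.0])
b = np.array([5.0, 5.0])
v = np.array([a / 2.0, a / 2.0])  # v @ w == a, so the slope is realised
c = d - v @ b                     # cancels the bias contribution

def loss(params):
    w_, b_, v_, c_ = params[:2], params[2:4], params[4:6], params[6]
    pred = v_ @ np.maximum(w_[:, None] * x + b_[:, None], 0.0) + c_
    return float(np.mean((pred - y) ** 2))

theta = np.concatenate([w, b, v, [c]])

# Central finite differences: the gradient vanishes, so theta is a critical
# point (indeed a local minimum: near theta the loss equals the convex
# least-squares loss of the induced affine map, minimised at (a, d)).
eps = 1e-6
grad = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                 for e in np.eye(theta.size)])

# A network with a kink inside [0, 1] fits x^2 strictly better, so the
# critical point above is not globally optimal, i.e. it is spurious.
A2 = np.vstack([x, np.maximum(x - 0.5, 0.0), np.ones_like(x)]).T
coef2 = np.linalg.lstsq(A2, y, rcond=None)[0]
better_loss = float(np.mean((A2 @ coef2 - y) ** 2))

print(f"loss at affine critical point: {loss(theta):.5f}")
print(f"max |finite-difference gradient|: {np.abs(grad).max():.2e}")
print(f"loss with a kink at 0.5: {better_loss:.5f}")
```

The sketch also hints at why the minima form a continuum: any rescaling of w, b, v that preserves v @ w and v @ b + c yields the same affine map on the data and hence another spurious local minimizer.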
Pages: 197-224
Page Count: 28