On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems

Times Cited: 2
Authors
Christof, Constantin [1 ]
Kowalczyk, Julia [1 ]
Affiliations
[1] Tech Univ Munich, Chair Optimal Control, Ctr Math Sci, Boltzmannstr 3, D-85748 Garching, Germany
Keywords
Deep artificial neural network; Spurious local minimum; Training problem; Loss landscape; Hadamard well-posedness; Best approximation; Stability analysis; Local affine linearity; Approximation; Landscape
DOI
10.1007/s00365-023-09658-w
Chinese Library Classification
O1 [Mathematics]
Discipline Codes
0701; 070101
Abstract
We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output whose activation functions contain an affine segment and whose hidden layers have width at least two. It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine. In contrast to previous works, our analysis covers all sampling and parameterization regimes, general differentiable loss functions, arbitrary continuous nonpolynomial activation functions, and both the finite- and infinite-dimensional settings. It is further shown that the appearance of the spurious local minima in the considered training problems is a direct consequence of the universal approximation theorem and that the underlying mechanisms also cause, e.g., L^p-best approximation problems to be ill-posed in the sense of Hadamard for all networks that do not have a dense image. The latter result also holds without the assumption of local affine linearity and without any conditions on the hidden layers.
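The mechanism summarized in the abstract, that an activation function with an affine segment makes the network locally affine for suitable parameters, so the best affine fit to a non-affine target becomes a spurious local minimum, can be sketched numerically. The following is a hypothetical illustration, not the paper's construction: the 1-2-1 ReLU architecture, the target x^2, and all parameter values are assumptions chosen only for the demo.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)
y = x ** 2  # a non-affine target function

# 1-2-1 ReLU network: f(x) = v @ relu(w * x + b) + c.
# Large positive biases keep both units in the affine (active) segment on
# [0, 1], so the network is affine in a neighbourhood of these parameters.
A = np.vstack([x, np.ones_like(x)]).T
a, d = np.linalg.lstsq(A, y, rcond=None)[0]  # best affine fit a*x + d
w = np.array([1.0, 1.0])
b = np.array([5.0, 5.0])
v = np.array([a / 2.0, a / 2.0])  # v @ w == a, so the slope is realised
c = d - v @ b                     # cancels the bias contribution

def loss(params):
    w_, b_, v_, c_ = params[:2], params[2:4], params[4:6], params[6]
    pred = v_ @ np.maximum(w_[:, None] * x + b_[:, None], 0.0) + c_
    return float(np.mean((pred - y) ** 2))

theta = np.concatenate([w, b, v, [c]])

# Central finite differences: the gradient vanishes, so theta is a critical
# point (indeed a local minimum: near theta the loss equals the convex
# least-squares loss of the induced affine map, minimised at (a, d)).
eps = 1e-6
grad = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                 for e in np.eye(theta.size)])

# A network with a kink inside [0, 1] fits x^2 strictly better, so the
# critical point above is not globally optimal, i.e. it is spurious.
A2 = np.vstack([x, np.maximum(x - 0.5, 0.0), np.ones_like(x)]).T
coef2 = np.linalg.lstsq(A2, y, rcond=None)[0]
better_loss = float(np.mean((A2 @ coef2 - y) ** 2))

print(f"loss at affine critical point: {loss(theta):.5f}")
print(f"max |finite-difference gradient|: {np.abs(grad).max():.2e}")
print(f"loss with a kink at 0.5: {better_loss:.5f}")
```

The sketch also hints at why the minima form a continuum: any rescaling of w, b, v that preserves v @ w and v @ b + c yields the same affine map on the data and hence another spurious local minimizer.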
Pages: 197-224
Page Count: 28