Spurious Local Minima Are Common for Deep Neural Networks With Piecewise Linear Activations

Cited by: 1
Authors
Liu, Bo [1 ]
Affiliation
[1] Beijing Univ Technol, Coll Comp Sci, Fac Informat Technol, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Neural networks; Biological neural networks; Neurons; Training; Minimization; Matrix decomposition; Convolutional neural networks (CNNs); deep learning theory; deep neural networks; local minima; loss landscape; MULTISTABILITY;
DOI
10.1109/TNNLS.2022.3204319
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article shows theoretically that spurious local minima are common for deep fully connected networks and average-pooling convolutional neural networks (CNNs) with piecewise linear activations, trained on datasets that cannot be fit by linear models. Motivating examples explain why spurious local minima exist: each output neuron of deep fully connected networks and CNNs with piecewise linear activations produces a continuous piecewise linear (CPWL) function, and different pieces of the CPWL output can optimally fit disjoint groups of data samples when minimizing the empirical risk. Fitting the data samples with different CPWL functions usually yields different levels of empirical risk, which leads to the prevalence of spurious local minima. The results are proved in general settings, with arbitrary continuous loss functions and general piecewise linear activations. The main proof technique is to represent a CPWL function as a maximization over minimizations of linear pieces; deep networks with piecewise linear activations are then constructed to produce these linear pieces and to implement the maximization-over-minimization operation.
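To make the max-min representation mentioned in the abstract concrete, the following is a minimal numerical sketch. It uses a hypothetical one-dimensional CPWL function with three linear pieces, checks that it equals a maximization over a minimization of those pieces, and verifies that the max and min operations can themselves be realized with ReLU units, so a small piecewise linear network reproduces the function. The example function and the tiny network are illustrative assumptions, not the constructions used in the paper's proofs.

import numpy as np

# Hypothetical 1-D CPWL function with three linear pieces (slopes 1, -1, 1).
def f_piecewise(x):
    return np.where(x < 0, x, np.where(x < 1, -x, x - 2))

# The same function written as a maximization over minimization of its
# linear pieces l1(x) = x, l2(x) = -x, l3(x) = x - 2:
#   f(x) = max( min(l1, l2), l3 )
def f_maxmin(x):
    return np.maximum(np.minimum(x, -x), x - 2)

# max and min are themselves expressible with ReLU,
#   max(a, b) = b + relu(a - b),   min(a, b) = b - relu(b - a),
# so a small ReLU network can implement the max-min combination of pieces.
def relu(z):
    return np.maximum(z, 0.0)

def f_relu_net(x):
    l1, l2, l3 = x, -x, x - 2          # first layer: the linear pieces
    m = l2 - relu(l2 - l1)             # hidden layer: min(l1, l2)
    return l3 + relu(m - l3)           # output layer: max(m, l3)

xs = np.linspace(-2.0, 3.0, 1001)
assert np.allclose(f_piecewise(xs), f_maxmin(xs))
assert np.allclose(f_piecewise(xs), f_relu_net(xs))
print("max-min and ReLU-network forms match the piecewise definition")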
Pages: 5382 - 5394
Page count: 13