Spurious Local Minima Are Common for Deep Neural Networks With Piecewise Linear Activations

Cited by: 1
Authors
Liu, Bo [1 ]
Affiliation
[1] Beijing Univ Technol, Coll Comp Sci, Fac Informat Technol, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Neural networks; Biological neural networks; Neurons; Training; Minimization; Matrix decomposition; Convolutional neural networks (CNNs); deep learning theory; deep neural networks; local minima; loss landscape; MULTISTABILITY;
DOI
10.1109/TNNLS.2022.3204319
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this article, it is shown theoretically that spurious local minima are common for deep fully connected networks and average-pooling convolutional neural networks (CNNs) with piecewise linear activations on datasets that cannot be fit by linear models. Motivating examples are given to explain why spurious local minima exist: each output neuron of a deep fully connected network or CNN with piecewise linear activations produces a continuous piecewise linear (CPWL) function, and different pieces of the CPWL output can optimally fit disjoint groups of data samples when the empirical risk is minimized. Fitting the data samples with different CPWL functions usually yields different levels of empirical risk, which leads to the prevalence of spurious local minima. The results are proved in general settings with arbitrary continuous loss functions and general piecewise linear activations. The main proof technique is to represent a CPWL function as a maximization over minimizations of linear pieces. Deep networks with piecewise linear activations are then constructed to produce these linear pieces and to implement the max-over-min operation.
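As an illustrative sketch (not taken from the paper), the following Python snippet shows the max-over-min representation of a toy one-dimensional CPWL function and how the max/min gates can be built from single ReLU units, using the identities max(a, b) = a + relu(b - a) and min(a, b) = a - relu(a - b). All names (relu, max2, min2, pieces, cpwl) and the choice of linear pieces are assumptions made here for illustration.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def max2(a, b):
        # max of two values implemented with one ReLU unit
        return a + relu(b - a)

    def min2(a, b):
        # min of two values implemented with one ReLU unit
        return a - relu(a - b)

    # Three linear pieces l_i(x) = a_i * x + b_i of a toy CPWL target.
    pieces = [(1.0, 0.0), (-1.0, 2.0), (0.0, 0.5)]

    def cpwl(x):
        # max-min representation: max( min(l_0, l_1), l_2 )
        l = [a * x + b for (a, b) in pieces]
        return max2(min2(l[0], l[1]), l[2])

    xs = np.linspace(-2.0, 4.0, 7)
    print([round(float(cpwl(x)), 2) for x in xs])

The resulting function is a tent shape clipped below at 0.5, showing how a small ReLU network can realize a prescribed arrangement of linear pieces; the paper's construction generalizes this idea to arbitrary CPWL functions in general settings.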
Pages: 5382-5394
Page count: 13
Related Papers
50 records in total
  • [31] Scaling Deep Spiking Neural Networks with Binary Stochastic Activations
    Roy, Deboleena
    Chakraborty, Indranil
    Roy, Kaushik
    2019 IEEE INTERNATIONAL CONFERENCE ON COGNITIVE COMPUTING (IEEE ICCC 2019), 2019, : 50 - 58
  • [32] Deep Neural Networks With Trainable Activations and Controlled Lipschitz Constant
    Aziznejad, Shayan
    Gupta, Harshit
    Campos, Joaquim
    Unser, Michael
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2020, 68 : 4688 - 4699
  • [33] Gradient Descent Finds Global Minima of Deep Neural Networks
    Du, Simon S.
    Lee, Jason D.
    Li, Haochuan
    Wang, Liwei
    Zhai, Xiyu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [34] Coexistence and local stability of multiple equilibria in neural networks with piecewise linear nondecreasing activation functions
    Wang Lili
    Lu Wenlian
    Chen Tianping
    NEURAL NETWORKS, 2010, 23 (02) : 189 - 200
  • [35] Avoiding local minima in feedforward neural networks by simultaneous learning
    Atakulreka, Akarachai
    Sutivong, Daricha
    AI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4830 : 100 - +
  • [36] Activation Control of Multiple Piecewise Linear Neural Networks
    Hou, Chen
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 4895 - 4907
  • [37] Configuration of continuous piecewise-linear neural networks
    Wang, Shuning
    Huang, Xiaolin
    Junaid, Khan M.
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2008, 19 (08): : 1431 - 1445
  • [38] Memory Capacity of Neural Networks with Threshold and Rectified Linear Unit Activations
    Vershynin, Roman
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2020, 2 (04): : 1004 - 1033
  • [39] NEURO-INSPIRED DEEP NEURAL NETWORKS WITH SPARSE, STRONG ACTIVATIONS
    Cekic, Metehan
    Bakiskan, Can
    Madhow, Upamanyu
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3843 - 3847
  • [40] Geometric Regularization of Local Activations for Knowledge Transfer in Convolutional Neural Networks
    Theodorakopoulos, Ilias
    Fotopoulou, Foteini
    Economou, George
    INFORMATION, 2021, 12 (08)