Spurious Local Minima Are Common for Deep Neural Networks With Piecewise Linear Activations

Cited by: 1
Authors
Liu, Bo [1 ]
Affiliation
[1] Beijing Univ Technol, Coll Comp Sci, Fac Informat Technol, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Neural networks; Biological neural networks; Neurons; Training; Minimization; Matrix decomposition; Convolutional neural networks (CNNs); deep learning theory; deep neural networks; local minima; loss landscape; MULTISTABILITY;
DOI
10.1109/TNNLS.2022.3204319
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this article, it is shown theoretically that spurious local minima are common for deep fully connected networks and average-pooling convolutional neural networks (CNNs) with piecewise linear activations on datasets that cannot be fit by linear models. Motivating examples are given to explain why spurious local minima exist: each output neuron of a deep fully connected network or CNN with piecewise linear activations produces a continuous piecewise linear (CPWL) function, and different pieces of the CPWL output can optimally fit disjoint groups of data samples when the empirical risk is minimized. Fitting the data samples with different CPWL functions usually yields different levels of empirical risk, which leads to the prevalence of spurious local minima. The results are proved in general settings with arbitrary continuous loss functions and general piecewise linear activations. The main proof technique is to represent a CPWL function as a maximization over minimizations of linear pieces. Deep networks with piecewise linear activations are then constructed to produce these linear pieces and to implement the max-over-min operation.
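As an illustrative sketch (not taken from the paper), the following Python snippet shows the max-over-min representation of a toy one-dimensional CPWL function and how the max/min gates can be built from single ReLU units, using the identities max(a, b) = a + relu(b - a) and min(a, b) = a - relu(a - b). All names (relu, max2, min2, pieces, cpwl) and the choice of linear pieces are assumptions made here for illustration.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def max2(a, b):
        # max of two values implemented with one ReLU unit
        return a + relu(b - a)

    def min2(a, b):
        # min of two values implemented with one ReLU unit
        return a - relu(a - b)

    # Three linear pieces l_i(x) = a_i * x + b_i of a toy CPWL target.
    pieces = [(1.0, 0.0), (-1.0, 2.0), (0.0, 0.5)]

    def cpwl(x):
        # max-min representation: max( min(l_0, l_1), l_2 )
        l = [a * x + b for (a, b) in pieces]
        return max2(min2(l[0], l[1]), l[2])

    xs = np.linspace(-2.0, 4.0, 7)
    print([round(float(cpwl(x)), 2) for x in xs])

The resulting function is a tent shape clipped below at 0.5, showing how a small ReLU network can realize a prescribed arrangement of linear pieces; the paper's construction generalizes this idea to arbitrary CPWL functions in general settings.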
Pages: 5382-5394
Page count: 13
Related Papers
50 records in total
  • [31] Scaling Deep Spiking Neural Networks with Binary Stochastic Activations
    Roy, Deboleena
    Chakraborty, Indranil
    Roy, Kaushik
    2019 IEEE INTERNATIONAL CONFERENCE ON COGNITIVE COMPUTING (IEEE ICCC 2019), 2019, : 50 - 58
  • [32] Deep Neural Networks With Trainable Activations and Controlled Lipschitz Constant
    Aziznejad, Shayan
    Gupta, Harshit
    Campos, Joaquim
    Unser, Michael
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2020, 68 : 4688 - 4699
  • [33] Gradient Descent Finds Global Minima of Deep Neural Networks
    Du, Simon S.
    Lee, Jason D.
    Li, Haochuan
    Wang, Liwei
    Zhai, Xiyu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [34] Coexistence and local stability of multiple equilibria in neural networks with piecewise linear nondecreasing activation functions
    Wang Lili
    Lu Wenlian
    Chen Tianping
    NEURAL NETWORKS, 2010, 23 (02) : 189 - 200
  • [35] Avoiding local minima in feedforward neural networks by simultaneous learning
    Atakulreka, Akarachai
    Sutivong, Daricha
    AI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4830 : 100 - +
  • [36] Activation Control of Multiple Piecewise Linear Neural Networks
    Hou, Chen
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 4895 - 4907
  • [37] Configuration of continuous piecewise-linear neural networks
    Wang, Shuning
    Huang, Xiaolin
    Junaid, Khan M.
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2008, 19 (08): : 1431 - 1445
  • [38] Memory Capacity of Neural Networks with Threshold and Rectified Linear Unit Activations
    Vershynin, Roman
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2020, 2 (04): : 1004 - 1033
  • [39] NEURO-INSPIRED DEEP NEURAL NETWORKS WITH SPARSE, STRONG ACTIVATIONS
    Cekic, Metehan
    Bakiskan, Can
    Madhow, Upamanyu
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3843 - 3847
  • [40] Geometric Regularization of Local Activations for Knowledge Transfer in Convolutional Neural Networks
    Theodorakopoulos, Ilias
    Fotopoulou, Foteini
    Economou, George
    INFORMATION, 2021, 12 (08)