Spurious Local Minima Are Common for Deep Neural Networks With Piecewise Linear Activations

Cited by: 1
Authors
Liu, Bo [1 ]
Affiliation
[1] Beijing Univ Technol, Coll Comp Sci, Fac Informat Technol, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Neural networks; Biological neural networks; Neurons; Training; Minimization; Matrix decomposition; Convolutional neural networks (CNNs); deep learning theory; deep neural networks; local minima; loss landscape; MULTISTABILITY;
DOI
10.1109/TNNLS.2022.3204319
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article shows theoretically that spurious local minima are common for deep fully connected networks and average-pooling convolutional neural networks (CNNs) with piecewise linear activations, trained on datasets that cannot be fit by linear models. Motivating examples explain why spurious local minima exist: each output neuron of deep fully connected networks and CNNs with piecewise linear activations produces a continuous piecewise linear (CPWL) function, and different pieces of the CPWL output can optimally fit disjoint groups of data samples when minimizing the empirical risk. Fitting the data samples with different CPWL functions usually yields different levels of empirical risk, which leads to the prevalence of spurious local minima. The results are proved in general settings, with arbitrary continuous loss functions and general piecewise linear activations. The main proof technique is to represent a CPWL function as a maximization over minimizations of linear pieces; deep networks with piecewise linear activations are then constructed to produce these linear pieces and to implement the maximization-over-minimization operation.
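To make the max-min representation mentioned in the abstract concrete, the following is a minimal numerical sketch. It uses a hypothetical one-dimensional CPWL function with three linear pieces, checks that it equals a maximization over a minimization of those pieces, and verifies that the max and min operations can themselves be realized with ReLU units, so a small piecewise linear network reproduces the function. The example function and the tiny network are illustrative assumptions, not the constructions used in the paper's proofs.

import numpy as np

# Hypothetical 1-D CPWL function with three linear pieces (slopes 1, -1, 1).
def f_piecewise(x):
    return np.where(x < 0, x, np.where(x < 1, -x, x - 2))

# The same function written as a maximization over minimization of its
# linear pieces l1(x) = x, l2(x) = -x, l3(x) = x - 2:
#   f(x) = max( min(l1, l2), l3 )
def f_maxmin(x):
    return np.maximum(np.minimum(x, -x), x - 2)

# max and min are themselves expressible with ReLU,
#   max(a, b) = b + relu(a - b),   min(a, b) = b - relu(b - a),
# so a small ReLU network can implement the max-min combination of pieces.
def relu(z):
    return np.maximum(z, 0.0)

def f_relu_net(x):
    l1, l2, l3 = x, -x, x - 2          # first layer: the linear pieces
    m = l2 - relu(l2 - l1)             # hidden layer: min(l1, l2)
    return l3 + relu(m - l3)           # output layer: max(m, l3)

xs = np.linspace(-2.0, 3.0, 1001)
assert np.allclose(f_piecewise(xs), f_maxmin(xs))
assert np.allclose(f_piecewise(xs), f_relu_net(xs))
print("max-min and ReLU-network forms match the piecewise definition")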
Pages: 5382 - 5394
Page count: 13