Theory II: Deep learning and optimization

Cited by: 4
Authors
Poggio, T. [1]
Liao, Q. [1]
Affiliation
[1] MIT, Ctr Brains Minds & Machines, McGovern Inst Brain Res, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA)
Keywords
deep learning; convolutional neural networks; loss surface; optimization
DOI
10.24425/bpas.2018.125925
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
The landscape of the empirical risk of overparametrized deep convolutional neural networks (DCNNs) is characterized through a mix of theory and experiments. In Part A we show the existence of a large number of global minimizers with zero empirical error (modulo inconsistent equations). The argument, which relies on Bezout's theorem, is rigorous when the ReLUs are replaced by a polynomial nonlinearity. We show with simulations that the corresponding polynomial network is indistinguishable from the ReLU network. According to Bezout's theorem, the global minimizers are degenerate, unlike the local minima, which in general should be non-degenerate. We further analyze and visualize experimentally the landscape of the empirical risk of DCNNs on the CIFAR-10 dataset. Based on the above theoretical and experimental observations, we propose a simple model of the landscape of empirical risk. In Part B, we characterize the optimization properties of stochastic gradient descent (SGD) applied to deep networks. The main claim consists of theoretical and experimental evidence for the following property of SGD: like the classical Langevin equation, SGD concentrates in probability on large-volume, "flat" minima, selecting with high probability degenerate minimizers, which are typically global minimizers.
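The Langevin analogy in Part B can be illustrated with a toy calculation. The sketch below is not the authors' model or experiment: it assumes a hypothetical one-dimensional loss with two zero-loss minima, a wide one at x = -2 (curvature `k_flat`) and a narrow one at x = +2 (curvature `k_sharp`), and integrates the Gibbs density exp(-L(x)/T), which is the stationary distribution of overdamped Langevin dynamics at temperature T. The function name, the curvatures, and the temperature are all assumptions chosen for illustration.

```python
import math


def flat_basin_fraction(temperature, k_flat=1.0, k_sharp=100.0,
                        lo=-6.0, hi=6.0, n=120001):
    """Fraction of Gibbs probability mass lying in the wide ("flat") basin.

    Hypothetical toy loss: L(x) = min(0.5*k_flat*(x+2)^2, 0.5*k_sharp*(x-2)^2).
    Both minima have zero loss; only their widths differ.
    """
    step = (hi - lo) / (n - 1)
    flat_mass = 0.0
    sharp_mass = 0.0
    for i in range(n):
        x = lo + i * step
        flat = 0.5 * k_flat * (x + 2.0) ** 2
        sharp = 0.5 * k_sharp * (x - 2.0) ** 2
        # Assign each grid point to the parabola that defines the loss there.
        if flat <= sharp:
            flat_mass += math.exp(-flat / temperature)
        else:
            sharp_mass += math.exp(-sharp / temperature)
    # The grid spacing is uniform, so it cancels in the ratio.
    return flat_mass / (flat_mass + sharp_mass)


if __name__ == "__main__":
    fraction = flat_basin_fraction(temperature=0.1)
    print(f"probability mass in the flat basin: {fraction:.3f}")
```

At low temperature each basin contributes a Gaussian integral proportional to 1/sqrt(curvature), so with these assumed values the wide basin should hold roughly sqrt(k_sharp / k_flat) = 10 times the mass of the narrow one, even though both minima have the same loss. This only shows why volume matters for a Langevin-type process; whether SGD on a deep network behaves the same way is the claim the paper argues for.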
Pages: 775-787
Page count: 13