Theory II: Deep learning and optimization

Cited by: 4
Authors
Poggio, T. [1 ]
Liao, Q. [1 ]
Affiliations
[1] MIT, Ctr Brains Minds & Machines, McGovern Inst Brain Res, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA)
Keywords
deep learning; convolutional neural networks; loss surface; optimization
DOI
10.24425/bpas.2018.125925
Chinese Library Classification (CLC)
T [Industrial Technology]
Subject Classification Code
08
Abstract
The landscape of the empirical risk of overparametrized deep convolutional neural networks (DCNNs) is characterized using a mix of theory and experiments. In Part A we show the existence of a large number of global minimizers with zero empirical error (modulo inconsistent equations). The argument, which relies on Bézout's theorem, is rigorous when the ReLUs are replaced by a polynomial nonlinearity. We show with simulations that the corresponding polynomial network is indistinguishable from the ReLU network. By Bézout's theorem, the global minimizers are degenerate, unlike the local minima, which in general should be non-degenerate. Further, we experimentally analyze and visualize the landscape of the empirical risk of DCNNs on the CIFAR-10 dataset. Based on the above theoretical and experimental observations, we propose a simple model of the landscape of the empirical risk. In Part B, we characterize the optimization properties of stochastic gradient descent (SGD) applied to deep networks. The main claim here consists of theoretical and experimental evidence for the following property of SGD: like the classical Langevin equation, SGD concentrates in probability on large-volume, "flat" minima, selecting with high probability degenerate minimizers, which are typically global minimizers.
Pages: 775-787
Page count: 13
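
To make the Part A counting argument concrete, here is a worked sketch of the Bézout-style reasoning (our paraphrase; the symbols N, n, and d are generic stand-ins, not the paper's notation). With a polynomial nonlinearity, zero empirical error on n training pairs (x_i, y_i) amounts to a system of n polynomial equations in the N network weights W:

```latex
% Zero empirical error as a polynomial system in the weights W in R^N:
\[
  f(W; x_i) - y_i = 0, \qquad i = 1, \dots, n.
\]
% If each equation has degree d in W, Bezout's theorem bounds the number of
% isolated solutions by the product of the degrees:
\[
  \#\{\text{isolated solutions}\} \;\le\; d^{\,n},
\]
% while in the overparametrized regime N > n every nonempty component of the
% solution set has dimension at least N - n > 0: the zero-error global
% minimizers generically form continuous families and are therefore
% degenerate, as the abstract states.
\[
  \dim \{\, W \in \mathbb{R}^N : f(W; x_i) = y_i \ \text{for all } i \,\}
  \;\ge\; N - n \;>\; 0.
\]
```

The Part A claim that a polynomial network is empirically indistinguishable from a ReLU network, and the Part B claim that SGD drives an overparametrized network to zero-error minima, can both be probed with a toy experiment. Below is a minimal sketch, not the authors' experimental setup; the network sizes, random seed, step size, and iteration count are all illustrative assumptions.

```python
# Minimal illustration (not the authors' setup): single-sample SGD on an
# overparametrized one-hidden-layer network, once with a ReLU and once with
# a quadratic (polynomial) activation. With many more weights than samples,
# both runs should drive the empirical risk toward zero; the exact printed
# values depend on the assumed seed and hyperparameters.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 20, 5, 100                      # 20 samples, 5 inputs, 100 hidden units
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)                # random targets

def sgd_train(act, act_grad, steps=100_000, lr=5e-3):
    """Fit v . act(W^T x) to y by single-sample SGD on the squared loss."""
    W = 0.2 * rng.standard_normal((d, h))
    v = 0.2 * rng.standard_normal(h)
    for _ in range(steps):
        i = rng.integers(n)
        z = X[i] @ W                      # pre-activations, shape (h,)
        a = act(z)
        r = a @ v - y[i]                  # residual on sample i
        grad_v = r * a                    # d(loss)/dv
        grad_W = r * np.outer(X[i], v * act_grad(z))  # d(loss)/dW
        v -= lr * grad_v
        W -= lr * grad_W
    return np.mean((act(X @ W) @ v - y) ** 2)  # final empirical risk

relu_risk = sgd_train(lambda z: np.maximum(z, 0.0),
                      lambda z: (z > 0).astype(float))
quad_risk = sgd_train(lambda z: z ** 2, lambda z: 2.0 * z)
print(f"ReLU      empirical risk: {relu_risk:.2e}")
print(f"quadratic empirical risk: {quad_risk:.2e}")
```

With d*h + h = 600 weights against only 20 samples, the zero-error set in this toy model is, per the counting argument above, a high-dimensional variety rather than a set of isolated points, which is the setting in which the paper's "flat, degenerate global minimizers" picture applies.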