Theory II: Deep learning and optimization

Cited by: 4
Authors
Poggio, T. [1 ]
Liao, Q. [1 ]
Affiliations
[1] MIT, Ctr Brains Minds & Machines, McGovern Inst Brain Res, Cambridge, MA 02139 USA
Funding
U.S. National Science Foundation
Keywords
deep learning; convolutional neural networks; loss surface; optimization;
DOI
10.24425/bpas.2018.125925
Chinese Library Classification
T [Industrial Technology]
Subject classification code
08
Abstract
The landscape of the empirical risk of overparametrized deep convolutional neural networks (DCNNs) is characterized with a mix of theory and experiments. In part A we show the existence of a large number of global minimizers with zero empirical error (modulo inconsistent equations). The argument, which relies on Bezout's theorem, is rigorous when the ReLUs are replaced by a polynomial nonlinearity. We show with simulations that the corresponding polynomial network is indistinguishable from the ReLU network. By Bezout's theorem, the global minimizers are degenerate, unlike the local minima, which in general should be non-degenerate. Further, we experimentally analyze and visualize the landscape of the empirical risk of DCNNs on the CIFAR-10 dataset. Based on the above theoretical and experimental observations, we propose a simple model of the landscape of the empirical risk. In part B, we characterize the optimization properties of stochastic gradient descent (SGD) applied to deep networks. The main claim consists of theoretical and experimental evidence for the following property of SGD: it concentrates in probability, like the classical Langevin equation, on large-volume, "flat" minima, selecting with high probability degenerate minimizers that are typically global minimizers.
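The abstract's part A replaces the ReLU nonlinearity by a polynomial so that Bezout's theorem applies to the resulting system of polynomial equations, and reports that the polynomial network is empirically indistinguishable from the ReLU network. A minimal sketch of that substitution (not the authors' code; the network sizes, weight scales, and polynomial degree here are arbitrary choices for illustration):

```python
import numpy as np

# Fit a degree-10 polynomial to ReLU on [-1, 1] by least squares, then
# compare a small random one-hidden-layer network under the two
# activations. Weights are scaled so preactivations stay inside the
# fitted interval.
x = np.linspace(-1.0, 1.0, 2001)
relu_vals = np.maximum(x, 0.0)
coeffs = np.polyfit(x, relu_vals, deg=10)        # highest power first
max_err = np.max(np.abs(np.polyval(coeffs, x) - relu_vals))
print(f"max |poly - ReLU| on [-1, 1]: {max_err:.4f}")

rng = np.random.default_rng(0)
W1 = 0.15 * rng.standard_normal((16, 4))         # hidden layer, 4 inputs
W2 = 0.5 * rng.standard_normal((1, 16))          # linear output layer
X = rng.uniform(-1.0, 1.0, size=(4, 200))        # 200 random input points

pre = W1 @ X                                     # shared preactivations
out_relu = W2 @ np.maximum(pre, 0.0)             # ReLU network output
out_poly = W2 @ np.polyval(coeffs, pre)          # polynomial network output
gap = np.max(np.abs(out_relu - out_poly))
print(f"max output gap, ReLU net vs. polynomial net: {gap:.4f}")
```

On this toy scale the two networks' outputs differ by far less than typical label noise, which is the intuition behind treating the polynomial network as a faithful proxy in the Bezout-theorem argument.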
Pages: 775-787 (13 pages)
Related papers (50 total)
  • [1] Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization
    Shlezinger, Nir
    Eldar, Yonina C.
    Boyd, Stephen P.
    IEEE ACCESS, 2022, 10 : 115384 - 115398
  • [2] Lighting Spectrum Optimization With Deep Learning for Moss Species Classification
    Ito, Kenichi
    Falt, Pauli
    Hauta-Kasari, Markku
    Nakauchi, Shigeki
    IEEE ACCESS, 2025, 13 : 18749 - 18759
  • [3] Lightweight Deep Learning Model Optimization for Medical Image Analysis
    Al-Milaji, Zahraa
    Yousif, Hayder
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (05)
  • [4] A Survey on Hyperparameters Optimization of Deep Learning for Time Series Classification
    Fristiana, Ayuningtyas Hari
    Alfarozi, Syukron Abu Ishaq
    Permanasari, Adhistya Erna
    Pratama, Mahardhika
    Wibirama, Sunu
    IEEE ACCESS, 2024, 12 : 191162 - 191198
  • [5] Data Optimization in Deep Learning: A Survey
    Wu, Ou
    Yao, Rujing
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37 (05) : 2356 - 2375
  • [6] A Comparison of Optimization Algorithms for Deep Learning
    Soydaner, Derya
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2020, 34 (13)
  • [7] Deep learning for computational structural optimization
    Nguyen, Long C.
    Nguyen-Xuan, H.
    ISA TRANSACTIONS, 2020, 103 : 177 - 191
  • [8] CHAOS THEORY, ADVANCED METAHEURISTIC ALGORITHMS AND THEIR NEWFANGLED DEEP LEARNING ARCHITECTURE OPTIMIZATION APPLICATIONS: A REVIEW
    Akgul, Akif
    Karaca, Yeliz
    Pala, Muhammed Ali
    Cimen, Murat Erhan
    Boz, Ali Fuat
    Yildiz, Mustafa Zahid
    FRACTALS-COMPLEX GEOMETRY PATTERNS AND SCALING IN NATURE AND SOCIETY, 2024, 32 (03)
  • [9] Optimization of Sensors for Structure Damage Detection Using Deep Learning Approach
    Kuo, Chine-Chih
    Lee, Ching-Hung
    IEEE SENSORS JOURNAL, 2023, 23 (21) : 26401 - 26410
  • [10] RESEARCH ON OPTIMIZATION OF VISUAL OBJECT TRACKING ALGORITHM BASED ON DEEP LEARNING
    Liu, Xiaolong
    Rodelas, Nelson C.
    SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2024, 25 (06): 5603 - 5613