Theory II: Deep learning and optimization

Cited by: 4
Authors
Poggio, T. [1 ]
Liao, Q. [1 ]
Affiliations
[1] MIT, Ctr Brains Minds & Machines, McGovern Inst Brain Res, Cambridge, MA 02139 USA
Funding
US National Science Foundation;
Keywords
deep learning; convolutional neural networks; loss surface; optimization;
DOI
10.24425/bpas.2018.125925
Chinese Library Classification
T [Industrial Technology];
Subject Classification Code
08;
Abstract
The landscape of the empirical risk of overparametrized deep convolutional neural networks (DCNNs) is characterized with a mix of theory and experiments. In part A we show the existence of a large number of global minimizers with zero empirical error (modulo inconsistent equations). The argument, which relies on Bézout's theorem, is rigorous when the ReLUs are replaced by a polynomial nonlinearity; we show with simulations that the corresponding polynomial network is indistinguishable from the ReLU network. By Bézout's theorem, the global minimizers are degenerate, unlike the local minima, which in general should be non-degenerate. We further analyze and visualize experimentally the landscape of the empirical risk of DCNNs on the CIFAR-10 dataset. Based on these theoretical and experimental observations, we propose a simple model of the landscape of the empirical risk. In part B, we characterize the optimization properties of stochastic gradient descent (SGD) applied to deep networks. The main claim consists of theoretical and experimental evidence for the following property of SGD: like the classical Langevin equation, SGD concentrates in probability on large-volume, "flat" minima, selecting with high probability degenerate minimizers, which are typically global minimizers.
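The abstract's part A claims that replacing ReLUs with a polynomial nonlinearity leaves the network essentially unchanged. A minimal sketch of why this is plausible, assuming a low-degree least-squares fit (the degree and interval here are illustrative choices, not the authors' construction): a modest-degree polynomial already tracks ReLU closely on a bounded interval.

```python
import numpy as np

# Illustrative assumption: fit a degree-10 polynomial to ReLU on [-1, 1]
# and measure the worst-case pointwise gap. A small gap suggests a network
# built from this polynomial behaves much like the ReLU network on
# activations confined to this range.
x = np.linspace(-1.0, 1.0, 2001)
relu = np.maximum(x, 0.0)

coeffs = np.polyfit(x, relu, deg=10)   # least-squares polynomial fit
poly = np.polyval(coeffs, x)

max_err = np.max(np.abs(poly - relu))
print(f"max |poly - ReLU| on [-1, 1]: {max_err:.4f}")
```

The remaining error is concentrated at the kink at 0, where ReLU is not differentiable; away from it the fit is much tighter.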
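Part B's claim, that noisy gradient descent concentrates on large-volume "flat" minima, can be sketched in one dimension. Everything here is a toy assumption: a hand-built loss with a sharp zero-loss minimum at x = -1 and a flat (wide-basin) zero-loss minimum at x = +1, and arbitrary step-size and noise levels standing in for SGD.

```python
import numpy as np

# Toy loss: sharp well at x = -1 (curvature 100), flat well at x = +1
# (curvature 1); both attain zero loss.
def loss(x):
    return np.minimum(50.0 * (x + 1.0) ** 2, 0.5 * (x - 1.0) ** 2)

def grad(x):
    sharp = 50.0 * (x + 1.0) ** 2 < 0.5 * (x - 1.0) ** 2
    return np.where(sharp, 100.0 * (x + 1.0), x - 1.0)

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=5000)        # many random initializations
for _ in range(2000):
    noise = rng.normal(0.0, 0.1, size=x.shape)  # SGD-style gradient noise
    x = x - 0.01 * (grad(x) + noise)

frac_flat = np.mean(np.abs(x - 1.0) < 0.5)
print(f"fraction ending near the flat minimum: {frac_flat:.2f}")
```

Most runs end in the flat basin simply because it occupies far more volume of initialization space, which is the geometric intuition behind the Langevin-style concentration argument.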
Pages: 775–787 (13 pages)