Theory II: Deep learning and optimization

Cited by: 4
Authors
Poggio, T. [1 ]
Liao, Q. [1 ]
Affiliations
[1] MIT, Ctr Brains Minds & Machines, McGovern Inst Brain Res, Cambridge, MA 02139 USA
Funding
US National Science Foundation;
Keywords
deep learning; convolutional neural networks; loss surface; optimization;
DOI
10.24425/bpas.2018.125925
Chinese Library Classification
T [Industrial Technology];
Subject Classification Code
08;
Abstract
The landscape of the empirical risk of overparametrized deep convolutional neural networks (DCNNs) is characterized with a mix of theory and experiments. In part A we show the existence of a large number of global minimizers with zero empirical error (modulo inconsistent equations). The argument, which relies on Bézout's theorem, is rigorous when the ReLUs are replaced by a polynomial nonlinearity; we show with simulations that the corresponding polynomial network is indistinguishable from the ReLU network. By Bézout's theorem, the global minimizers are degenerate, unlike the local minima, which in general should be non-degenerate. We further analyze and visualize experimentally the landscape of the empirical risk of DCNNs on the CIFAR-10 dataset. Based on these theoretical and experimental observations, we propose a simple model of the landscape of the empirical risk. In part B, we characterize the optimization properties of stochastic gradient descent (SGD) applied to deep networks. The main claim consists of theoretical and experimental evidence for the following property of SGD: like the classical Langevin equation, SGD concentrates in probability on large-volume, "flat" minima, selecting with high probability degenerate minimizers, which are typically global minimizers.
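The abstract's part A claims that replacing ReLUs with a polynomial nonlinearity leaves the network essentially unchanged. A minimal sketch of why this is plausible, assuming a low-degree least-squares fit (the degree and interval here are illustrative choices, not the authors' construction): a modest-degree polynomial already tracks ReLU closely on a bounded interval.

```python
import numpy as np

# Illustrative assumption: fit a degree-10 polynomial to ReLU on [-1, 1]
# and measure the worst-case pointwise gap. A small gap suggests a network
# built from this polynomial behaves much like the ReLU network on
# activations confined to this range.
x = np.linspace(-1.0, 1.0, 2001)
relu = np.maximum(x, 0.0)

coeffs = np.polyfit(x, relu, deg=10)   # least-squares polynomial fit
poly = np.polyval(coeffs, x)

max_err = np.max(np.abs(poly - relu))
print(f"max |poly - ReLU| on [-1, 1]: {max_err:.4f}")
```

The remaining error is concentrated at the kink at 0, where ReLU is not differentiable; away from it the fit is much tighter.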
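Part B's claim, that noisy gradient descent concentrates on large-volume "flat" minima, can be sketched in one dimension. Everything here is a toy assumption: a hand-built loss with a sharp zero-loss minimum at x = -1 and a flat (wide-basin) zero-loss minimum at x = +1, and arbitrary step-size and noise levels standing in for SGD.

```python
import numpy as np

# Toy loss: sharp well at x = -1 (curvature 100), flat well at x = +1
# (curvature 1); both attain zero loss.
def loss(x):
    return np.minimum(50.0 * (x + 1.0) ** 2, 0.5 * (x - 1.0) ** 2)

def grad(x):
    sharp = 50.0 * (x + 1.0) ** 2 < 0.5 * (x - 1.0) ** 2
    return np.where(sharp, 100.0 * (x + 1.0), x - 1.0)

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=5000)        # many random initializations
for _ in range(2000):
    noise = rng.normal(0.0, 0.1, size=x.shape)  # SGD-style gradient noise
    x = x - 0.01 * (grad(x) + noise)

frac_flat = np.mean(np.abs(x - 1.0) < 0.5)
print(f"fraction ending near the flat minimum: {frac_flat:.2f}")
```

Most runs end in the flat basin simply because it occupies far more volume of initialization space, which is the geometric intuition behind the Langevin-style concentration argument.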
Pages: 775–787 (13 pages)