Anomalous diffusion dynamics of learning in deep neural networks

Cited by: 12
Authors
Chen, Guozhang [1]
Qu, Cheng Kevin [1]
Gong, Pulin [1]
Affiliations
[1] Univ Sydney, Sch Phys, Sydney, NSW 2006, Australia
Funding
Australian Research Council
Keywords
Deep neural networks; Stochastic gradient descent; Complex systems; Energy landscape
DOI
10.1016/j.neunet.2022.01.019
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Learning in deep neural networks (DNNs) is implemented by minimizing a highly non-convex loss function, typically with a stochastic gradient descent (SGD) method. This learning process can effectively find generalizable solutions at flat minima. In this study, we present a novel account of how such effective deep learning emerges through the interaction between the SGD and the geometrical structure of the loss landscape. We find that the SGD exhibits rich, complex dynamics when navigating the loss landscape: initially, the SGD exhibits superdiffusion, which attenuates gradually and changes to subdiffusion at long times as it approaches a solution. These learning dynamics occur ubiquitously across DNN types such as ResNet, VGG-like networks and Vision Transformers, and similar results emerge for various batch-size and learning-rate settings. The superdiffusion during the initial learning phase indicates that the motion of the SGD along the loss landscape possesses intermittent, big jumps; this non-equilibrium property enables the SGD to explore the loss landscape effectively. By adapting methods developed for studying energy landscapes in complex physical systems, we find that such superdiffusive learning processes arise from the interaction between the SGD and fractal-like regions of the loss landscape. We further develop a phenomenological model to demonstrate the mechanistic role of the fractal-like loss landscape in enabling the SGD to find flat minima effectively. Our results reveal the effectiveness of SGD in deep learning from a novel perspective and have implications for designing efficient deep neural networks. (C) 2022 Elsevier Ltd. All rights reserved.
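Superdiffusion and subdiffusion of this kind are conventionally diagnosed from the scaling of the mean-squared displacement, MSD(lag) ~ lag^alpha, with alpha > 1 indicating superdiffusion and alpha < 1 subdiffusion. The paper's own estimation procedure is not reproduced here; the sketch below is an illustrative, assumption-laden demonstration on synthetic trajectories (a ballistic walk, a Brownian walk, and a confined walk standing in for the late, subdiffusive phase), where the `msd_exponent` helper and all trajectory names are hypothetical.

```python
import numpy as np

def msd_exponent(traj, max_lag=200):
    """Estimate alpha in MSD(lag) ~ lag**alpha via a log-log linear fit.

    traj: (T, d) array of positions over time, e.g. a low-dimensional
    projection of the network weights recorded at each SGD step.
    """
    lags = np.arange(1, max_lag)
    # Time-averaged MSD over all windows of each lag, summed across dims.
    msd = np.array([np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
                    for lag in lags])
    slope, _intercept = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope

rng = np.random.default_rng(0)
T, d = 5000, 10

# Ballistic motion (constant drift): alpha = 2, the superdiffusive extreme.
ballistic = np.arange(T)[:, None] * rng.normal(size=(1, d))

# Ordinary Brownian motion (i.i.d. Gaussian steps): alpha ~= 1.
brownian = np.cumsum(rng.normal(size=(T, d)), axis=0)

# Confined (Ornstein-Uhlenbeck-like) motion: the MSD plateaus, so the
# fitted alpha falls well below 1, resembling subdiffusion near a minimum.
confined = np.zeros((T, d))
noise = rng.normal(size=(T, d))
for t in range(1, T):
    confined[t] = 0.9 * confined[t - 1] + noise[t]

for name, traj in [("ballistic", ballistic), ("brownian", brownian),
                   ("confined", confined)]:
    print(f"{name}: alpha ~ {msd_exponent(traj):.2f}")
```

In practice one would replace the synthetic trajectories with the recorded weight trajectory of an actual SGD run; the crossover the abstract describes would then show up as a fitted alpha that drifts from above 1 toward below 1 as training progresses.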
Pages: 18-28
Page count: 11