Anomalous diffusion dynamics of learning in deep neural networks

Cited: 12
Authors
Chen, Guozhang [1 ]
Qu, Cheng Kevin [1 ]
Gong, Pulin [1 ]
Affiliations
[1] Univ Sydney, Sch Phys, Sydney, NSW 2006, Australia
Funding
Australian Research Council
Keywords
Deep neural networks; Stochastic gradient descent; Complex systems; Energy landscape
DOI
10.1016/j.neunet.2022.01.019
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Learning in deep neural networks (DNNs) is implemented through minimizing a highly non-convex loss function, typically by a stochastic gradient descent (SGD) method. This learning process can effectively find generalizable solutions at flat minima. In this study, we present a novel account of how such effective deep learning emerges through the interactions of the SGD and the geometrical structure of the loss landscape. We find that the SGD exhibits rich, complex dynamics when navigating through the loss landscape; initially, the SGD exhibits superdiffusion, which attenuates gradually and changes to subdiffusion at long times when approaching a solution. Such learning dynamics occur ubiquitously across different DNN types, such as ResNet, VGG-like networks, and Vision Transformers; similar results emerge for various batch sizes and learning rates. The superdiffusion process during the initial learning phase indicates that the motion of SGD along the loss landscape possesses intermittent, big jumps; this non-equilibrium property enables the SGD to effectively explore the loss landscape. By adapting methods developed for studying energy landscapes in complex physical systems, we find that such superdiffusive learning processes are due to the interactions of the SGD and the fractal-like regions of the loss landscape. We further develop a phenomenological model to demonstrate the mechanistic role of the fractal-like loss landscape in enabling the SGD to effectively find flat minima. Our results reveal the effectiveness of SGD in deep learning from a novel perspective and have implications for designing efficient deep neural networks. © 2022 Elsevier Ltd. All rights reserved.
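The superdiffusion-to-subdiffusion transition described in the abstract is a statement about the mean-squared displacement (MSD) of the network parameters over training time, MSD(Δt) ~ Δt^α, with α > 1 signalling superdiffusion and α < 1 subdiffusion. The sketch below is not the authors' code; the toy MLP, synthetic data, and hyperparameters are all illustrative assumptions. It shows one standard way to estimate the diffusion exponent α from recorded SGD parameter snapshots.

```python
# Minimal sketch (illustrative, not the paper's method): estimate the
# anomalous-diffusion exponent alpha of an SGD trajectory in parameter space.
# Record parameter snapshots theta(t) during training, compute the
# time-averaged MSD(dt) = <||theta(t+dt) - theta(t)||^2>, and fit
# MSD(dt) ~ dt^alpha on a log-log scale.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression task and a small MLP stand in for the DNNs in the paper.
X = torch.randn(512, 10)
y = torch.randn(512, 1)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

snapshots = []  # flattened parameter vectors, one per SGD step
for step in range(2000):
    idx = torch.randint(0, 512, (32,))  # mini-batch sampling
    loss = loss_fn(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        snapshots.append(
            torch.cat([p.detach().flatten() for p in model.parameters()]).numpy()
        )

theta = np.stack(snapshots)  # shape: (steps, n_params)

# Time-averaged MSD over a logarithmic range of lags.
lags = np.unique(np.logspace(0, 3, 30).astype(int))
msd = np.array(
    [np.mean(np.sum((theta[dt:] - theta[:-dt]) ** 2, axis=1)) for dt in lags]
)

# The diffusion exponent alpha is the slope of log MSD versus log lag.
alpha, _ = np.polyfit(np.log(lags), np.log(msd), 1)
print(f"estimated diffusion exponent alpha = {alpha:.2f}")
```

Fitting α on successive time windows of the trajectory, rather than on the whole run as above, would expose the early superdiffusive and late subdiffusive regimes the abstract reports.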
Pages: 18 - 28
Page count: 11
Related papers
50 in total
  • [21] ATTL: An Automated Targeted Transfer Learning with Deep Neural Networks
    Ahamed, Sayyed Farid
    Aggarwal, Priyanka
    Shetty, Sachin
    Lanus, Erin
    Freeman, Laura J.
2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021
  • [22] A Survey of Sparse-learning Methods for Deep Neural Networks
    Ma, Rongrong
    Niu, Lingfeng
    2018 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2018), 2018, : 647 - 650
  • [23] Learning to Optimize: Training Deep Neural Networks for Interference Management
    Sun, Haoran
    Chen, Xiangyi
    Shi, Qingjiang
    Hong, Mingyi
    Fu, Xiao
    Sidiropoulos, Nicholas D.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2018, 66 (20) : 5438 - 5453
  • [24] Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks
    Qin, Zhen
    Tan, Xuwei
    Zhu, Zhihui
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 795 - 799
  • [25] Enhancing deep neural networks via multiple kernel learning
    Lauriola, Ivano
    Gallicchio, Claudio
    Aiolli, Fabio
    PATTERN RECOGNITION, 2020, 101
  • [26] Learning-Rate Annealing Methods for Deep Neural Networks
    Nakamura, Kensuke
    Derbel, Bilel
    Won, Kyoung-Jae
    Hong, Byung-Woo
    ELECTRONICS, 2021, 10 (16)
  • [27] A Multiobjective Sparse Feature Learning Model for Deep Neural Networks
    Gong, Maoguo
    Liu, Jia
    Li, Hao
    Cai, Qing
    Su, Linzhi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2015, 26 (12) : 3263 - 3277
  • [28] Ensemble Learning on Deep Neural Networks for Image Caption Generation
    Katpally, Harshitha
    Bansal, Ajay
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2020), 2020, : 61 - 68
  • [29] Could deep learning in neural networks improve the QSAR models?
    Gini, G.
    Zanoli, F.
    Gamba, A.
    Raitano, G.
    Benfenati, E.
    SAR AND QSAR IN ENVIRONMENTAL RESEARCH, 2019, 30 (09) : 617 - 642
  • [30] Neural dynamics for improving optimiser in deep learning with noise considered
    Su, Dan
    Stanimirovic, Predrag S.
    Han, Ling Bo
    Jin, Long
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024, 9 (03) : 722 - 737