Anomalous diffusion dynamics of learning in deep neural networks

Cited by: 12
Authors
Chen, Guozhang [1]
Qu, Cheng Kevin [1]
Gong, Pulin [1]
Affiliations
[1] Univ Sydney, Sch Phys, Sydney, NSW 2006, Australia
Funding
Australian Research Council
Keywords
Deep neural networks; Stochastic gradient descent; Complex systems; Energy landscape
DOI
10.1016/j.neunet.2022.01.019
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning in deep neural networks (DNNs) is implemented by minimizing a highly non-convex loss function, typically with a stochastic gradient descent (SGD) method. This learning process can effectively find generalizable solutions at flat minima. In this study, we present a novel account of how such effective deep learning emerges through the interactions of the SGD and the geometrical structure of the loss landscape. We find that the SGD exhibits rich, complex dynamics when navigating through the loss landscape; initially, the SGD exhibits superdiffusion, which attenuates gradually and changes to subdiffusion at long times when approaching a solution. Such learning dynamics occur ubiquitously in different DNN types such as ResNet, VGG-like networks and Vision Transformers; similar results emerge for various batch size and learning rate settings. The superdiffusion process during the initial learning phase indicates that the motion of SGD along the loss landscape possesses intermittent, large jumps; this non-equilibrium property enables the SGD to effectively explore the loss landscape. By adapting methods developed for studying energy landscapes in complex physical systems, we find that such superdiffusive learning processes are due to the interactions of the SGD and the fractal-like regions of the loss landscape. We further develop a phenomenological model to demonstrate the mechanistic role of the fractal-like loss landscape in enabling the SGD to effectively find flat minima. Our results reveal the effectiveness of SGD in deep learning from a novel perspective and have implications for designing efficient deep neural networks. (C) 2022 Elsevier Ltd. All rights reserved.
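The terms superdiffusion and subdiffusion are conventionally defined through the mean-squared displacement (MSD), which scales as MSD ~ t^alpha with alpha > 1 for superdiffusion, alpha = 1 for normal diffusion, and alpha < 1 for subdiffusion. The following is a minimal sketch of how such an exponent could be estimated from a trajectory, for instance a sequence of flattened weight vectors saved during training. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the function names `msd` and `diffusion_exponent`, the time-averaged MSD estimator, and the synthetic trajectories are all hypothetical choices made here.

```python
import numpy as np

def msd(traj, lags):
    """Time-averaged mean-squared displacement of a trajectory of shape (T, d)."""
    return np.array([
        np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
        for lag in lags
    ])

def diffusion_exponent(traj, lags):
    """Slope of log(MSD) against log(lag), i.e. alpha in MSD ~ lag**alpha."""
    slope, _ = np.polyfit(np.log(lags), np.log(msd(traj, lags)), 1)
    return slope

# Synthetic stand-ins for a weight trajectory (assumption: no real training data here).
rng = np.random.default_rng(0)
n_steps, dim = 20000, 10
brownian = np.cumsum(rng.normal(size=(n_steps, dim)), axis=0)  # ordinary random walk
ballistic = np.outer(np.arange(n_steps), np.ones(dim))         # constant-velocity motion

# Logarithmically spaced integer lags between 1 and 100.
lags = np.unique(np.logspace(0, 2, 20).astype(int))

alpha_brownian = diffusion_exponent(brownian, lags)    # expected to be close to 1
alpha_ballistic = diffusion_exponent(ballistic, lags)  # equals 2 for straight-line motion
```

Fitting the exponent over separate windows of training time, rather than over the whole run, would be one way to see a change from alpha > 1 early on to alpha < 1 later, as the abstract describes.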
Pages: 18-28
Number of pages: 11