Anomalous diffusion dynamics of learning in deep neural networks

Cited by: 12
Authors
Chen, Guozhang [1]
Qu, Cheng Kevin [1]
Gong, Pulin [1]
Affiliations
[1] Univ Sydney, Sch Phys, Sydney, NSW 2006, Australia
Funding
Australian Research Council
Keywords
Deep neural networks; Stochastic gradient descent; Complex systems; Energy landscape
DOI
10.1016/j.neunet.2022.01.019
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Learning in deep neural networks (DNNs) is implemented by minimizing a highly non-convex loss function, typically with a stochastic gradient descent (SGD) method. This learning process can effectively find generalizable solutions at flat minima. In this study, we present a novel account of how such effective deep learning emerges through the interaction between the SGD and the geometrical structure of the loss landscape. We find that the SGD exhibits rich, complex dynamics when navigating the loss landscape: initially, the SGD exhibits superdiffusion, which attenuates gradually and changes to subdiffusion at long times as it approaches a solution. These learning dynamics occur ubiquitously across DNN types such as ResNet, VGG-like networks and Vision Transformers, and similar results emerge for various batch-size and learning-rate settings. The superdiffusion during the initial learning phase indicates that the motion of the SGD along the loss landscape possesses intermittent, big jumps; this non-equilibrium property enables the SGD to explore the loss landscape effectively. By adapting methods developed for studying energy landscapes in complex physical systems, we find that such superdiffusive learning processes arise from the interaction between the SGD and fractal-like regions of the loss landscape. We further develop a phenomenological model to demonstrate the mechanistic role of the fractal-like loss landscape in enabling the SGD to find flat minima effectively. Our results reveal the effectiveness of SGD in deep learning from a novel perspective and have implications for designing efficient deep neural networks. (C) 2022 Elsevier Ltd. All rights reserved.
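Superdiffusion and subdiffusion of this kind are conventionally diagnosed from the scaling of the mean-squared displacement, MSD(lag) ~ lag^alpha, with alpha > 1 indicating superdiffusion and alpha < 1 subdiffusion. The paper's own estimation procedure is not reproduced here; the sketch below is an illustrative, assumption-laden demonstration on synthetic trajectories (a ballistic walk, a Brownian walk, and a confined walk standing in for the late, subdiffusive phase), where the `msd_exponent` helper and all trajectory names are hypothetical.

```python
import numpy as np

def msd_exponent(traj, max_lag=200):
    """Estimate alpha in MSD(lag) ~ lag**alpha via a log-log linear fit.

    traj: (T, d) array of positions over time, e.g. a low-dimensional
    projection of the network weights recorded at each SGD step.
    """
    lags = np.arange(1, max_lag)
    # Time-averaged MSD over all windows of each lag, summed across dims.
    msd = np.array([np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
                    for lag in lags])
    slope, _intercept = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope

rng = np.random.default_rng(0)
T, d = 5000, 10

# Ballistic motion (constant drift): alpha = 2, the superdiffusive extreme.
ballistic = np.arange(T)[:, None] * rng.normal(size=(1, d))

# Ordinary Brownian motion (i.i.d. Gaussian steps): alpha ~= 1.
brownian = np.cumsum(rng.normal(size=(T, d)), axis=0)

# Confined (Ornstein-Uhlenbeck-like) motion: the MSD plateaus, so the
# fitted alpha falls well below 1, resembling subdiffusion near a minimum.
confined = np.zeros((T, d))
noise = rng.normal(size=(T, d))
for t in range(1, T):
    confined[t] = 0.9 * confined[t - 1] + noise[t]

for name, traj in [("ballistic", ballistic), ("brownian", brownian),
                   ("confined", confined)]:
    print(f"{name}: alpha ~ {msd_exponent(traj):.2f}")
```

In practice one would replace the synthetic trajectories with the recorded weight trajectory of an actual SGD run; the crossover the abstract describes would then show up as a fitted alpha that drifts from above 1 toward below 1 as training progresses.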
Pages: 18-28
Page count: 11