Anomalous diffusion dynamics of learning in deep neural networks

Cited by: 12
Authors
Chen, Guozhang [1]
Qu, Cheng Kevin [1]
Gong, Pulin [1]
Affiliations
[1] Univ Sydney, Sch Phys, Sydney, NSW 2006, Australia
Funding
Australian Research Council
Keywords
Deep neural networks; Stochastic gradient descent; Complex systems; Energy landscape
DOI
10.1016/j.neunet.2022.01.019
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning in deep neural networks (DNNs) is implemented through minimizing a highly non-convex loss function, typically by a stochastic gradient descent (SGD) method. This learning process can effectively find generalizable solutions at flat minima. In this study, we present a novel account of how such effective deep learning emerges through the interactions of the SGD and the geometrical structure of the loss landscape. We find that the SGD exhibits rich, complex dynamics when navigating through the loss landscape; initially, the SGD exhibits superdiffusion, which attenuates gradually and changes to subdiffusion at long times when approaching a solution. Such learning dynamics happen ubiquitously in different DNN types such as ResNet, VGG-like networks and Vision Transformers; similar results emerge for various batch size and learning rate settings. The superdiffusion process during the initial learning phase indicates that the motion of SGD along the loss landscape possesses intermittent, big jumps; this non-equilibrium property enables the SGD to effectively explore the loss landscape. By adapting methods developed for studying energy landscapes in complex physical systems, we find that such superdiffusive learning processes are due to the interactions of the SGD and the fractal-like regions of the loss landscape. We further develop a phenomenological model to demonstrate the mechanistic role of the fractal-like loss landscape in enabling the SGD to effectively find flat minima. Our results reveal the effectiveness of SGD in deep learning from a novel perspective and have implications for designing efficient deep neural networks. (C) 2022 Elsevier Ltd. All rights reserved.
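The superdiffusion and subdiffusion named in the abstract are conventionally told apart by how the mean squared displacement (MSD) of a trajectory scales with the time lag: MSD ~ lag^alpha, with alpha > 1 for superdiffusion, alpha = 1 for ordinary diffusion, and alpha < 1 for subdiffusion. The following is a minimal sketch of estimating alpha from a recorded sequence of weight vectors. It is not the authors' analysis code: the time-averaged MSD, the choice of lags, and the helper names `msd`, `diffusion_exponent`, and `random_walk` are all assumptions made for illustration, and the two toy trajectories stand in for real SGD weight snapshots.

```python
import math
import random

def msd(trajectory, lag):
    """Time-averaged mean squared displacement at one lag (assumed estimator)."""
    total = 0.0
    count = len(trajectory) - lag
    for start in range(count):
        a, b = trajectory[start], trajectory[start + lag]
        total += sum((y - x) ** 2 for x, y in zip(a, b))
    return total / count

def diffusion_exponent(trajectory, lags):
    """Least-squares slope of log MSD against log lag, i.e. alpha in MSD ~ lag**alpha."""
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(msd(trajectory, lag)) for lag in lags]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def random_walk(n_steps, dim, rng):
    """Toy trajectory with independent Gaussian steps (ordinary diffusion)."""
    position = [0.0] * dim
    path = [position[:]]
    for _ in range(n_steps):
        position = [p + rng.gauss(0.0, 1.0) for p in position]
        path.append(position)
    return path

rng = random.Random(0)
lags = [1, 2, 4, 8, 16, 32]

# Hypothetical stand-ins for weight snapshots saved once per SGD iteration.
brownian = random_walk(5000, 4, rng)               # expected alpha close to 1
ballistic = [[0.1 * t] * 4 for t in range(5001)]   # constant drift, alpha equal to 2

alpha_brownian = diffusion_exponent(brownian, lags)
alpha_ballistic = diffusion_exponent(ballistic, lags)
print(alpha_brownian, alpha_ballistic)
```

Applied to real training runs, the same fit over an early window and a late window of iterations would be one way to look for the crossover from alpha > 1 to alpha < 1 that the abstract reports.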
Pages: 18-28
Number of pages: 11
Related papers
50 in total
  • [31] Spatial relation learning in complementary scenarios with deep neural networks
    Lee, Jae Hee
    Yao, Yuan
    Ozdemir, Ozan
    Li, Mengdi
    Weber, Cornelius
    Liu, Zhiyuan
    Wermter, Stefan
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [32] Federated Learning for Medical Image Analysis with Deep Neural Networks
    Nazir, Sajid
    Kaleem, Mohammad
    DIAGNOSTICS, 2023, 13 (09)
  • [33] Structure Learning for Deep Neural Networks Based on Multiobjective Optimization
    Liu, Jia
    Gong, Maoguo
    Miao, Qiguang
    Wang, Xiaogang
    Li, Hao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (06) : 2450 - 2463
  • [34] Diffense: Defense Against Backdoor Attacks on Deep Neural Networks With Latent Diffusion
    Hu, Bowen
    Chang, Chip-Hong
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2024, 14 (04) : 729 - 742
  • [35] Learning With Sharing: An Edge-Optimized Incremental Learning Method for Deep Neural Networks
    Hussain, Muhammad Awais
    Huang, Shih-An
    Tsai, Tsung-Han
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2023, 11 (02) : 461 - 473
  • [36] Deep Neural Networks with Extreme Learning Machine for Seismic Data Compression
    Nuha, Hilal H.
    Balghonaim, Adil
    Liu, Bo
    Mohandes, Mohamed
    Deriche, Mohamed
    Fekri, Faramarz
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2020, 45 (03) : 1367 - 1377
  • [37] Robust Machine Learning Systems: Reliability and Security for Deep Neural Networks
    Hanif, Muhammad Abdullah
    Khalid, Faiq
    Putra, Rachmad Vidya Wicaksana
    Rehman, Semeen
    Shafique, Muhammad
    2018 IEEE 24TH INTERNATIONAL SYMPOSIUM ON ON-LINE TESTING AND ROBUST SYSTEM DESIGN (IOLTS 2018), 2018 : 257 - 260
  • [38] AN EMPIRICAL STUDY OF LEARNING RATES IN DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    Senior, Andrew
    Heigold, Georg
    Ranzato, Marc'Aurelio
    Yang, Ke
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013 : 6724 - 6728
  • [39] Evolving Deep Parallel Neural Networks for Multi-Task Learning
    Wu, Jie
    Sun, Yanan
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT II, 2022, 13156 : 517 - 531