AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for neural networks

Cited: 7
|
Authors
Sun, Hao [1 ]
Shen, Li [2 ]
Zhong, Qihuang [3 ]
Ding, Liang [2 ]
Chen, Shixiang [4 ]
Sun, Jingwei [1 ]
Li, Jing [1 ]
Sun, Guangzhong [1 ]
Tao, Dacheng [5 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci, Hefei 230026, Anhui, Peoples R China
[2] JD com, Beijing, Peoples R China
[3] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Hubei, Peoples R China
[4] Univ Sci & Technol China, Sch Math Sci, Hefei 230026, Anhui, Peoples R China
[5] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia
Keywords
Sharpness-aware minimization; Adaptive learning rate; Non-convex optimization; Momentum acceleration; Linear speedup; CONVERGENCE;
DOI
10.1016/j.neunet.2023.10.044
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The sharpness-aware minimization (SAM) optimizer has been extensively explored because it generalizes better when training deep neural networks, introducing an extra perturbation step that flattens the loss landscape of deep learning models. Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically for training large-scale deep neural networks, but without a theoretical guarantee, owing to the difficulty of jointly analyzing the coupled perturbation step, adaptive learning rate, and momentum step. In this paper, we analyze the convergence rate of AdaSAM in the stochastic non-convex setting. We theoretically show that AdaSAM admits an O(1/√(bT)) convergence rate, which achieves the linear speedup property with respect to the mini-batch size b. Specifically, to decouple the stochastic gradient step from the adaptive learning rate and the perturbed gradient, we introduce a delayed second-order momentum term so that they become independent when taking expectations in the analysis. We then bound them by showing that the adaptive learning rate stays within a limited range, which makes the analysis feasible. To the best of our knowledge, we are the first to provide a non-trivial convergence rate for SAM with an adaptive learning rate and momentum acceleration. Finally, we conduct experiments on several NLP tasks and a synthetic task, showing that AdaSAM achieves superior performance compared with the SGD, AMSGrad, and SAM optimizers.
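For intuition only, below is a minimal, self-contained sketch of how the three ingredients described in the abstract (the SAM perturbation step, first-order momentum, and an Adam/AMSGrad-style adaptive step size) could be combined into a single parameter update. The function name, hyperparameter defaults, and the toy quadratic objective are illustrative assumptions and do not reproduce the authors' reference implementation or the exact delayed second-moment scheme used in the paper's analysis.

```python
import numpy as np

def adasam_step(w, grad_fn, m, v, lr=1e-3, rho=0.05,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaSAM-style update (illustrative sketch, not the paper's code).

    w       : current parameters (np.ndarray)
    grad_fn : callable returning a stochastic gradient at given parameters
    m, v    : first- and second-moment buffers
    rho     : SAM perturbation radius
    """
    # SAM perturbation: ascend along the normalized stochastic gradient.
    g = grad_fn(w)
    eps_w = rho * g / (np.linalg.norm(g) + 1e-12)

    # Gradient evaluated at the perturbed point (the sharpness-aware gradient).
    g_sam = grad_fn(w + eps_w)

    # Momentum (first moment) and adaptive learning rate (second moment),
    # in the style of Adam/AMSGrad; the paper's proof uses a delayed second moment.
    m = beta1 * m + (1 - beta1) * g_sam
    v = beta2 * v + (1 - beta2) * g_sam ** 2

    # Adaptive update: element-wise step size lr / (sqrt(v) + eps).
    w = w - lr * m / (np.sqrt(v) + eps)
    return w, m, v


# Usage on a toy quadratic f(w) = 0.5 * ||w||^2, whose gradient is w itself.
if __name__ == "__main__":
    w = np.array([1.0, -2.0])
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for _ in range(200):
        w, m, v = adasam_step(w, lambda x: x, m, v, lr=0.1)
    print(w)  # parameters approach a small neighborhood of the minimizer at the origin
```

Because the perturbation radius rho keeps the evaluated gradient slightly away from the stationary point, the iterates settle in a small neighborhood of the minimizer rather than at it exactly, which is the expected behavior of a SAM-style update on this toy problem.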
Pages: 506 - 519
Number of pages: 14
Related Papers
50 records in total
  • [31] Training memristor-based multilayer neuromorphic networks with SGD, momentum and adaptive learning rates
    Yan, Zheng
    Chen, Jiadong
    Hu, Rui
    Huang, Tingwen
    Chen, Yiran
    Wen, Shiping
    NEURAL NETWORKS, 2020, 128 : 142 - 149
  • [32] ASMAFL: Adaptive Staleness-Aware Momentum Asynchronous Federated Learning in Edge Computing
    Qiao, Dewen
    Guo, Songtao
    Zhao, Jun
    Le, Junqing
    Zhou, Pengzhan
    Li, Mingyan
    Chen, Xuetao
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (04) : 3390 - 3406
  • [33] Fast Learning in Spiking Neural Networks by Learning Rate Adaptation
    Fang Huijuan
    Luo Jiliang
    Wang Fei
    CHINESE JOURNAL OF CHEMICAL ENGINEERING, 2012, 20 (06) : 1219 - 1224
  • [34] Backpropagation Neural Network with Adaptive Learning Rate for Classification
    Jullapak, Rujira
    Thammano, Arit
    ADVANCES IN NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, ICNC-FSKD 2022, 2023, 153 : 493 - 499
  • [35] A new adaptive momentum algorithm for split-complex recurrent neural networks
    Xu, Dongpo
    Shao, Hongmei
    Zhang, Huisheng
    NEUROCOMPUTING, 2012, 93 : 133 - 136
  • [36] A review of adaptive online learning for artificial neural networks
    Perez-Sanchez, Beatriz
    Fontenla-Romero, Oscar
    Guijarro-Berdinas, Bertha
    ARTIFICIAL INTELLIGENCE REVIEW, 2018, 49 (02) : 281 - 299
  • [37] On adaptive learning rate that guarantees convergence in feedforward networks
    Behera, Laxmidhar
    Kumar, Swagat
    Patnaik, Awhan
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2006, 17 (05) : 1116 - 1125
  • [38] Neural Networks Predictive Controller Using an Adaptive Control Rate
    Mnasser, Ahmed
    Bouani, Faouzi
    Ksouri, Mekki
    INTERNATIONAL JOURNAL OF SYSTEM DYNAMICS APPLICATIONS, 2014, 3 (03) : 127 - 147
  • [39] Accelerating Learning Performance of Back Propagation Algorithm by Using Adaptive Gain Together with Adaptive Momentum and Adaptive Learning Rate on Classification Problems
    Hamid, Norhamreeza Abdul
    Nawi, Nazri Mohd
    Ghazali, Rozaida
    Salleh, Mohd Najib Mohd
    UBIQUITOUS COMPUTING AND MULTIMEDIA APPLICATIONS, PT II, 2011, 151 : 559 - 570
  • [40] Convergence of Cyclic and Almost-Cyclic Learning with Momentum for Feedforward Neural Networks
    Wang, Jian
    Yang, Jie
    Wu, Wei
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2011, 22 (08) : 1297 - 1306