SALR: Sharpness-Aware Learning Rate Scheduler for Improved Generalization

Cited by: 0
Authors:
Yue, Xubo [1 ]
Nouiehed, Maher [2 ]
Al Kontar, Raed [1 ]
Affiliations:
[1] Univ Michigan, Dept Ind & Operat Engn, Ann Arbor, MI 48109 USA
[2] Amer Univ Beirut, Dept Ind Engn & Management, Beirut 1072020, Lebanon
Funding:
U.S. National Science Foundation (NSF)
Keywords:
Schedules; Deep learning; Neural networks; Convergence; Bayes methods; Training; Stochastic processes; generalization; learning rate schedule; sharpness
DOI:
10.1109/TNNLS.2023.3263393
CLC classification:
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
In an effort to improve generalization in deep learning and automate the process of learning rate scheduling, we propose SALR: a sharpness-aware learning rate update technique designed to recover flat minimizers. Our method dynamically updates the learning rate of gradient-based optimizers based on the local sharpness of the loss function. This allows optimizers to automatically raise the learning rate in sharp valleys, increasing the chance of escaping them. We demonstrate the effectiveness of SALR when adopted by various algorithms over a broad range of networks. Our experiments indicate that SALR improves generalization, converges faster, and drives solutions to significantly flatter regions.
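The abstract describes the mechanism only at a high level: scale the learning rate with the local sharpness of the loss so that the optimizer is more likely to escape sharp valleys. Below is a minimal PyTorch sketch of that idea. It is not the paper's actual update rule (see the DOI above for the published method): the squared gradient norm as a sharpness proxy, the running-average normalization, and the constants base_lr and beta are all illustrative assumptions.

```python
# Minimal sketch of a sharpness-aware learning rate update.
# Assumptions (not from the paper): squared gradient norm as the sharpness
# proxy; LR scaled by sharpness relative to its exponential running average.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(10, 1)                 # toy model, illustrative only
criterion = nn.MSELoss()
base_lr = 1e-2
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)

running_sharpness = None
beta = 0.9                               # smoothing factor (assumption)

for step in range(100):
    x = torch.randn(32, 10)              # synthetic batch
    y = torch.randn(32, 1)

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()

    # Sharpness proxy: squared gradient norm at the current iterate
    # (a stand-in for the paper's local-sharpness measure).
    sharpness = sum(p.grad.pow(2).sum() for p in model.parameters()
                    if p.grad is not None).item()

    if running_sharpness is None:
        running_sharpness = sharpness
    else:
        running_sharpness = beta * running_sharpness + (1 - beta) * sharpness

    # Raise the LR where the loss is locally sharper than average, lower it
    # where it is flatter -- the escape behavior described in the abstract.
    scale = sharpness / (running_sharpness + 1e-12)
    for group in optimizer.param_groups:
        group["lr"] = base_lr * scale

    optimizer.step()
```

In practice a scheduler built this way would also bound or clip the scale factor to keep updates stable; the paper develops the precise rule and its convergence behavior in the full text.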
Pages: 12518-12527
Page count: 10
Related papers (50 total):
  • [41] Improving Generalization Performance of Adaptive Learning Rate by Switching from Block Diagonal Matrix Preconditioning to SGD
    Ida, Yasutoshi
    Fujiwara, Yasuhiro
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [42] Texture aware autoencoder pre-training and pairwise learning refinement for improved iris recognition
    Chakraborty, Manashi
    Chakraborty, Aritri
    Biswas, Prabir Kumar
    Mitra, Pabitra
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (16): 25381-25401
  • [43] Deep Learning-Based Total Kidney Volume Segmentation in Autosomal Dominant Polycystic Kidney Disease Using Attention, Cosine Loss, and Sharpness Aware Minimization
    Raj, Anish
    Tollens, Fabian
    Hansen, Laura
    Golla, Alena-Kathrin
    Schad, Lothar R.
    Noerenberg, Dominik
    Zoellner, Frank G.
    DIAGNOSTICS, 2022, 12 (05)
  • [44] Adaptive learning rate algorithms based on the improved Barzilai-Borwein method
    Wang, Zhi-Jun
    Li, Hong
    Xu, Zhou-Xiang
    Zhao, Shuai-Ye
    Wang, Peng-Jun
    Gao, He-Bei
    PATTERN RECOGNITION, 2025, 160
  • [45] A New Approach to Distributed Hypothesis Testing and Non-Bayesian Learning: Improved Learning Rate and Byzantine Resilience
    Mitra, Aritra
    Richards, John A.
    Sundaram, Shreyas
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (09): 4084-4100
  • [46] An improved ensemble learning method for exchange rate forecasting based on complementary effect of shallow and deep features
    Wang, Gang
    Tao, Tao
    Ma, Jingling
    Li, Hui
    Fu, Huimin
    Chu, Yan
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 184
  • [47] Achieving small-batch accuracy with large-batch scalability via Hessian-aware learning rate adjustment
    Lee, Sunwoo
    He, Chaoyang
    Avestimehr, Salman
    NEURAL NETWORKS, 2023, 158: 1-14
  • [48] Improved A-Line and B-Line Detection in Lung Ultrasound Using Deep Learning with Boundary-Aware Dice Loss
    Abbasi, Soolmaz
    Wahd, Assefa Seyoum
    Ghosh, Shrimanti
    Ezzelarab, Maha
    Panicker, Mahesh
    Chen, Yale Tung
    Jaremko, Jacob L.
    Hareendranathan, Abhilash
    BIOENGINEERING-BASEL, 2025, 12 (03)
  • [49] Automatic stomata recognition and measurement based on improved YOLO deep learning model and entropy rate superpixel algorithm
    Zhang, Fan
    Ren, Fangtao
    Li, Jieping
    Zhang, Xinhong
    ECOLOGICAL INFORMATICS, 2022, 68
  • [50] A trajectory and force dual-incremental robot skill learning and generalization framework using improved dynamical movement primitives and adaptive neural network control
    Lu, Zhenyu
    Wang, Ning
    Li, Qinchuan
    Yang, Chenguang
    NEUROCOMPUTING, 2023, 521: 146-159