XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1 ]
Li, Dongsheng [2 ]
Shi, Yanqi [2 ]
Meng, Jian [1 ]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based optimizer; weight prediction
DOI
10.1109/TPAMI.2024.3387399
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, ahead of each mini-batch training step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and backward propagation. In this way, throughout training, the optimizer always uses the gradients with respect to the future weights to update the DNN parameters, so that a gradient-based optimizer with weight prediction achieves better convergence and generalization than its original counterpart. XGrad is straightforward to implement yet effective in boosting both the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results covering five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal and validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
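The weight-prediction procedure described in the abstract can be sketched concretely. Below is a minimal, illustrative PyTorch example for SGD with momentum, assuming a prediction rule of the form w_hat = w - lr * mu * v; the function name and the exact prediction formula are assumptions made for illustration, not the paper's released implementation.

```python
import torch

def weight_prediction_sgd_step(params, momenta, loss_fn, lr=0.1, mu=0.9):
    """One mini-batch step: predict the future weights, take the gradient
    there, then update the original weights with that gradient."""
    cached = [p.detach().clone() for p in params]
    with torch.no_grad():
        # 1. Move each parameter to its predicted future value (assumed rule).
        for p, v in zip(params, momenta):
            p.add_(v, alpha=-lr * mu)            # w_hat = w - lr * mu * v
    # 2. Forward and backward pass at the predicted weights.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        # 3. Restore the cached weights and apply the usual momentum update
        #    with the gradients taken at the predicted weights.
        for p, w, v, g in zip(params, cached, momenta, grads):
            p.copy_(w)
            v.mul_(mu).add_(g)                   # v = mu * v + g
            p.add_(v, alpha=-lr)                 # w = w - lr * v
    return loss.item()

# Toy usage on a least-squares problem (illustrative only).
w = torch.zeros(3, requires_grad=True)
v = torch.zeros(3)
x, y = torch.randn(64, 3), torch.randn(64)
for _ in range(200):
    weight_prediction_sgd_step([w], [v], lambda: ((x @ w - y) ** 2).mean())
```

The same pattern would carry over to adaptive optimizers such as Adam or AdamW by predicting the future weights with the optimizer's own update rule instead of the momentum term alone.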
Pages: 6731-6747
Number of pages: 17