XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1]
Li, Dongsheng [2]
Shi, Yanqi [2]
Meng, Jian [1]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based; optimizer; weight prediction;
DOI
10.1109/TPAMI.2024.3387399
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to improve their convergence and generalization when training deep neural network (DNN) models. In particular, before each mini-batch training step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. In this way, throughout the whole training period, the optimizer always utilizes the gradients w.r.t. the future weights to update the DNN parameters, which yields better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in boosting the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results with five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal. The experimental results validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
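The weight-prediction loop described in the abstract can be illustrated with a minimal sketch for SGD with momentum, the simplest of the five optimizers listed. This is not the authors' implementation: the function names, the one-step momentum lookahead, and the hyperparameters are illustrative assumptions, and a scalar quadratic stands in for the DNN loss.

```python
def sgd_momentum_step(w, v, grad, lr=0.1, mu=0.5):
    """One standard SGD-with-momentum update: v <- mu*v + g, w <- w - lr*v."""
    v = mu * v + grad
    return w - lr * v, v

def xgrad_sgd_train(w, grad_fn, steps=50, lr=0.1, mu=0.5):
    """Train with weight prediction: gradients are taken at predicted weights."""
    v = 0.0
    for _ in range(steps):
        # Predict the future weight via the optimizer's own update rule,
        # using the current velocity (assumption: one-step lookahead).
        w_pred = w - lr * (mu * v)
        # Forward/backward pass is performed at the predicted weight.
        g = grad_fn(w_pred)
        # Apply the real update with the gradient w.r.t. the future weight.
        w, v = sgd_momentum_step(w, v, g, lr, mu)
    return w

# Toy usage: minimize f(w) = w^2, whose gradient is 2w; w converges toward 0.
w_final = xgrad_sgd_train(3.0, lambda w: 2.0 * w)
```

For SGD with momentum this lookahead coincides with a Nesterov-style evaluation point; the abstract's claim is that the same prediction idea transfers to adaptive optimizers such as Adam and AdamW by substituting their respective update rules.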
Pages: 6731-6747 (17 pages)
Related Papers (50 records total)
  • [31] A Gradient-Based Clustering for Multi-Database Mining
    Miloudi, Salim
    Wang, Yulin
    Ding, Wenjia
    IEEE ACCESS, 2021, 9 : 11144 - 11172
  • [32] Gradient-Based Illumination Description for Image Forgery Detection
    Matern, Falko
    Riess, Christian
    Stamminger, Marc
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2020, 15 : 1303 - 1317
  • [33] Application of Extreme Gradient Boosting Based on Grey Relation Analysis for Prediction of Compressive Strength of Concrete
    Cui, Liyun
    Chen, Peiyuan
    Wang, Liang
    Li, Jin
    Ling, Hao
    ADVANCES IN CIVIL ENGINEERING, 2021, 2021
  • [34] Gradient-Based Instance-Specific Visual Explanations for Object Specification and Object Discrimination
    Zhao, Chenyang
    Hsiao, Janet H.
    Chan, Antoni B.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (09) : 5967 - 5985
  • [35] Normalization and Convergence of Gradient-Based Algorithms for Adaptive IIR Filters
    Rupp, M.
    SIGNAL PROCESSING, 1995, 46 (01) : 15 - 30
  • [36] Gradient-Based Distributed Controller Design Over Directed Networks
    Watanabe, Yuto
    Sakurama, Kazunori
    Ahn, Hyo-Sung
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2024, 11 (04): : 1998 - 2009
  • [37] Edge Gradient-Based Active Learning for Hyperspectral Image Classification
    Samat, Alim
    Li, Jun
    Lin, Cong
    Liu, Sicong
    Li, Erzhu
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2020, 17 (09) : 1588 - 1592
  • [38] Accelerating Attention through Gradient-Based Learned Runtime Pruning
    Li, Zheng
    Ghodrati, Soroush
    Yazdanbakhsh, Amir
    Esmaeilzadeh, Hadi
    Kang, Mingu
    PROCEEDINGS OF THE 2022 THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22), 2022, : 902 - 915
  • [39] MAMGD: Gradient-Based Optimization Method Using Exponential Decay
    Sakovich, Nikita
    Aksenov, Dmitry
    Pleshakova, Ekaterina
    Gataullin, Sergey
    TECHNOLOGIES, 2024, 12 (09)
  • [40] Maximum likelihood gradient-based iterative estimation for multivariable systems
    Xia, Huafeng
    Yang, Yongqing
    Ding, Feng
    Xu, Ling
    Hayat, Tasawar
    IET CONTROL THEORY AND APPLICATIONS, 2019, 13 (11) : 1683 - 1691