XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1 ]
Li, Dongsheng [2 ]
Shi, Yanqi [2 ]
Meng, Jian [1 ]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based optimizer; weight prediction
DOI
10.1109/TPAMI.2024.3387399
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to improve their convergence and generalization when training deep neural network (DNN) models. Specifically, before each mini-batch training step, the future weights are predicted according to the update rule of the optimizer in use and are then used in both the forward pass and the backward propagation. As a result, throughout training the optimizer always updates the DNN parameters with gradients computed with respect to the future weights, which yields better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in accelerating the convergence of gradient-based optimizers and improving the accuracy of DNN models. Empirical results with five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal and show that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
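The mechanism described in the abstract (predict the future weights, run the forward and backward passes on them, then apply the resulting gradients to the current weights) can be illustrated with a short sketch. The following is a minimal PyTorch-style example, assuming SGD with momentum as the base optimizer and a one-step-ahead prediction that uses only the momentum buffer as the predictable part of the next update; the helper names predict_weights, restore_weights, and train_step are hypothetical and not taken from the paper's implementation.

import torch

def predict_weights(optimizer):
    # Cache the current weights and replace them in place with predicted
    # future weights. For SGD with momentum, the predictable part of the
    # next update is lr * momentum * buffer (assumption: one-step-ahead
    # prediction derived from the optimizer's update rule).
    cached = {}
    for group in optimizer.param_groups:
        lr, mu = group["lr"], group["momentum"]
        for p in group["params"]:
            cached[p] = p.detach().clone()
            buf = optimizer.state[p].get("momentum_buffer")
            if buf is not None:
                p.data.add_(buf, alpha=-lr * mu)
    return cached

def restore_weights(cached):
    # Put the cached current weights back before optimizer.step().
    for p, w in cached.items():
        p.data.copy_(w)

def train_step(model, optimizer, loss_fn, x, y):
    cached = predict_weights(optimizer)   # w -> predicted future weights
    loss = loss_fn(model(x), y)           # forward pass with predicted weights
    optimizer.zero_grad()
    loss.backward()                       # gradients w.r.t. the predicted weights
    restore_weights(cached)               # switch back to the current weights
    optimizer.step()                      # update them with those gradients
    return loss.item()

The sketch only shows the control flow of predicting, computing gradients at the predicted point, and applying the update to the original weights; the prediction formula itself would differ for each optimizer (Adam, AdamW, AdaBelief, and so on) according to its own update rule.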
Pages: 6731 - 6747
Number of pages: 17
Related papers
50 records in total
  • [1] AdaSwarm: Augmenting Gradient-Based Optimizers in Deep Learning With Swarm Intelligence
    Mohapatra, Rohan
    Saha, Snehanshu
    Coello, Carlos A. Coello
    Bhattacharya, Anwesh
    Dhavala, Soma S.
    Saha, Sriparna
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2022, 6 (02): 329 - 340
  • [2] Reweighted-Boosting: A Gradient-Based Boosting Optimization Framework
    He, Guanxiong
    Wang, Zheng
    Tang, Liaoyuan
    Yu, Weizhong
    Nie, Feiping
    Li, Xuelong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [3] Recursive Prediction Error Gradient-Based Algorithms and Framework to Identify PMSM Parameters Online
    Perera, Aravinda
    Nilsen, Roy
    IEEE TRANSACTIONS ON INDUSTRY APPLICATIONS, 2023, 59 (02) : 1788 - 1799
  • [4] Explainability of Speech Recognition Transformers via Gradient-Based Attention Visualization
    Sun, Tianli
    Chen, Haonan
    Hu, Guosheng
    He, Lianghua
    Zhao, Cairong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1395 - 1406
  • [5] Gradient-Based Neuromorphic Learning on Dynamical RRAM Arrays
    Zhou, Peng
    Choi, Dong-Uk
    Lu, Wei D.
    Kang, Sung-Mo
    Eshraghian, Jason K.
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2022, 12 (04) : 888 - 897
  • [6] Distributed Photovoltaic Distribution Voltage Prediction Based on eXtreme Gradient Boosting and Time Convolutional Networks
    Yuan, Fang
    Lu, Yong
    Xie, Zhi
    Dai, Shenxiang
    IEEE ACCESS, 2024, 12 : 177576 - 177588
  • [7] Maximum Latency Prediction Based on Random Forests and Gradient Boosting Machine for AVB Traffic in TSN
    Zhang, Xiaodi
    Li, Dong
    Piao, Jinnan
    IEEE COMMUNICATIONS LETTERS, 2025, 29 (02) : 264 - 268
  • [8] Topological Gradient-based Competitive Learning
    Barbiero, Pietro
    Ciravegna, Gabriele
    Randazzo, Vincenzo
    Pasero, Eros
    Cirrincione, Giansalvo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [9] Explainable Time-Series Prediction Using a Residual Network and Gradient-Based Methods
    Choi, Hyojung
    Jung, Chanhwi
    Kang, Taein
    Kim, Hyunwoo J.
    Kwak, Il-Youp
    IEEE ACCESS, 2022, 10 : 108469 - 108482
  • [10] Explainable Steel Quality Prediction System Based on Gradient Boosting Decision Trees
    Takalo-Mattila, Janne
    Heiskanen, Mikko
    Kyllonen, Vesa
    Maatta, Leena
    Bogdanoff, Agne
    IEEE ACCESS, 2022, 10 : 68099 - 68110