XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1 ]
Li, Dongsheng [2 ]
Shi, Yanqi [2 ]
Meng, Jian [1 ]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based optimizer; weight prediction
DOI
10.1109/TPAMI.2024.3387399
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. Specifically, ahead of each mini-batch, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. Throughout training, the optimizer therefore always uses the gradients w.r.t. the future weights to update the DNN parameters, which yields better convergence and generalization than the same optimizer without weight prediction. XGrad is straightforward to implement yet effective in improving both the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results covering five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal: XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
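
To make the weight-prediction step concrete, below is a minimal Python/PyTorch sketch assuming SGD with momentum as the base optimizer. The function name xgrad_sgd_step, the one-step look-ahead -lr * mu * v, and the explicit momentum buffers are illustrative assumptions, not the authors' reference implementation; the framework applies the same idea to Adam-family optimizers by predicting the future weights with each optimizer's own update rule.

import torch

def xgrad_sgd_step(model, loss_fn, x, y, momenta, lr=0.1, mu=0.9):
    # momenta: per-parameter buffers, initialized once as
    #   momenta = [torch.zeros_like(p) for p in model.parameters()]
    params = list(model.parameters())

    # 1. Cache the current weights, then move to the predicted future
    #    weights. For momentum SGD the next update is approximately
    #    -lr * mu * v (the upcoming gradient term is not yet known).
    cached = [p.detach().clone() for p in params]
    with torch.no_grad():
        for p, v in zip(params, momenta):
            p.add_(v, alpha=-lr * mu)

    # 2. Forward and backward pass at the predicted weights, so the
    #    gradients are taken w.r.t. the future weights.
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # 3. Restore the cached weights and apply the usual momentum update,
    #    driven by the gradients computed at the predicted weights.
    with torch.no_grad():
        for p, c, v in zip(params, cached, momenta):
            p.copy_(c)
            v.mul_(mu).add_(p.grad)   # v <- mu * v + g(predicted w)
            p.add_(v, alpha=-lr)      # w <- w - lr * v
    return loss.item()

Caching and restoring the weights keeps the sketch self-contained; an optimized implementation would presumably fold the prediction into the optimizer's step rather than clone every parameter per mini-batch.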
Pages: 6731-6747
Page count: 17
Related Papers
50 items in total
  • [41] The Barker proposal: Combining robustness and efficiency in gradient-based MCMC
    Livingstone, Samuel
    Zanella, Giacomo
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2022, 84 (02) : 496 - 523
  • [42] A Self-Care Prediction Model for Children with Disability Based on Genetic Algorithm and Extreme Gradient Boosting
    Syafrudin, Muhammad
    Alfian, Ganjar
    Fitriyani, Norma Latif
    Anshari, Muhammad
    Hadibarata, Tony
    Fatwanto, Agung
    Rhee, Jongtae
    MATHEMATICS, 2020, 8 (09)
  • [43] Correcting gradient-based interpretations of deep neural networks for genomics
    Majdandzic, Antonio
    Rajesh, Chandana
    Koo, Peter K.
    GENOME BIOLOGY, 2023, 24 (01)
  • [44] A Stochastic Gradient-Based Projection Algorithm for Distributed Constrained Optimization
    Zhang, Keke
    Gao, Shanfu
    Chen, Yingjue
    Zheng, Zuqing
    Lu, Qingguo
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT I, 2024, 14447 : 356 - 367
  • [45] Sparse Channel Estimation with Gradient-Based Algorithms: A Comparative Study
    Abd El-Moaty, Ahmed M.
    Zerguine, Azzedine
    2018 15TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS AND DEVICES (SSD), 2018, : 60 - 64
  • [46] Gradient-Based Aero-Stealth Optimization of a Simplified Aircraft
    Thoulon, Charles
    Roge, Gilbert
    Pironneau, Olivier
    FLUIDS, 2024, 9 (08)
  • [47] Learning Gradient-Based ICA by Neurally Estimating Mutual Information
    Hlynsson, Hlynur David
    Wiskott, Laurenz
    ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2019, 2019, 11793 : 182 - 187
  • [49] On CNN Applied to Speech-to-Text - Comparative Analysis of Different Gradient Based Optimizers
    Gaiceanu, Theodora
    Pastravanu, Octavian
    IEEE 15TH INTERNATIONAL SYMPOSIUM ON APPLIED COMPUTATIONAL INTELLIGENCE AND INFORMATICS (SACI 2021), 2021, : 85 - 90
  • [50] Application of Gradient Boosting in the Design of Fuzzy Rule-Based Regression Models
    Zhang, Huimin
    Hu, Xingchen
    Zhu, Xiubin
    Liu, Xinwang
    Pedrycz, Witold
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (11) : 5621 - 5632