XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1]
Li, Dongsheng [2]
Shi, Yanqi [2]
Meng, Jian [1]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based; optimizer; weight prediction;
DOI
10.1109/TPAMI.2024.3387399
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to improve their convergence and generalization when training deep neural network (DNN) models. In particular, before each mini-batch training step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. In this way, throughout the whole training period, the optimizer always utilizes the gradients w.r.t. the future weights to update the DNN parameters, which yields better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in boosting the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results with five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal. The experimental results validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
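The weight-prediction loop described in the abstract can be illustrated with a minimal sketch for SGD with momentum, the simplest of the five optimizers listed. This is not the authors' implementation: the function names, the one-step momentum lookahead, and the hyperparameters are illustrative assumptions, and a scalar quadratic stands in for the DNN loss.

```python
def sgd_momentum_step(w, v, grad, lr=0.1, mu=0.5):
    """One standard SGD-with-momentum update: v <- mu*v + g, w <- w - lr*v."""
    v = mu * v + grad
    return w - lr * v, v

def xgrad_sgd_train(w, grad_fn, steps=50, lr=0.1, mu=0.5):
    """Train with weight prediction: gradients are taken at predicted weights."""
    v = 0.0
    for _ in range(steps):
        # Predict the future weight via the optimizer's own update rule,
        # using the current velocity (assumption: one-step lookahead).
        w_pred = w - lr * (mu * v)
        # Forward/backward pass is performed at the predicted weight.
        g = grad_fn(w_pred)
        # Apply the real update with the gradient w.r.t. the future weight.
        w, v = sgd_momentum_step(w, v, g, lr, mu)
    return w

# Toy usage: minimize f(w) = w^2, whose gradient is 2w; w converges toward 0.
w_final = xgrad_sgd_train(3.0, lambda w: 2.0 * w)
```

For SGD with momentum this lookahead coincides with a Nesterov-style evaluation point; the abstract's claim is that the same prediction idea transfers to adaptive optimizers such as Adam and AdamW by substituting their respective update rules.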
Pages: 6731-6747 (17 pages)
Related Papers (50 records total)
  • [31] A Gradient-Based Clustering for Multi-Database Mining
    Miloudi, Salim
    Wang, Yulin
    Ding, Wenjia
    IEEE ACCESS, 2021, 9 : 11144 - 11172
  • [32] Gradient-Based Illumination Description for Image Forgery Detection
    Matern, Falko
    Riess, Christian
    Stamminger, Marc
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2020, 15 : 1303 - 1317
  • [33] Application of Extreme Gradient Boosting Based on Grey Relation Analysis for Prediction of Compressive Strength of Concrete
    Cui, Liyun
    Chen, Peiyuan
    Wang, Liang
    Li, Jin
    Ling, Hao
    ADVANCES IN CIVIL ENGINEERING, 2021, 2021
  • [34] Gradient-Based Instance-Specific Visual Explanations for Object Specification and Object Discrimination
    Zhao, Chenyang
    Hsiao, Janet H.
    Chan, Antoni B.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (09) : 5967 - 5985
  • [35] Normalization and Convergence of Gradient-Based Algorithms for Adaptive IIR Filters
    Rupp, M.
    SIGNAL PROCESSING, 1995, 46 (01) : 15 - 30
  • [36] Gradient-Based Distributed Controller Design Over Directed Networks
    Watanabe, Yuto
    Sakurama, Kazunori
    Ahn, Hyo-Sung
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2024, 11 (04): : 1998 - 2009
  • [37] Edge Gradient-Based Active Learning for Hyperspectral Image Classification
    Samat, Alim
    Li, Jun
    Lin, Cong
    Liu, Sicong
    Li, Erzhu
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2020, 17 (09) : 1588 - 1592
  • [38] Accelerating Attention through Gradient-Based Learned Runtime Pruning
    Li, Zheng
    Ghodrati, Soroush
    Yazdanbakhsh, Amir
    Esmaeilzadeh, Hadi
    Kang, Mingu
    PROCEEDINGS OF THE 2022 THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22), 2022, : 902 - 915
  • [39] MAMGD: Gradient-Based Optimization Method Using Exponential Decay
    Sakovich, Nikita
    Aksenov, Dmitry
    Pleshakova, Ekaterina
    Gataullin, Sergey
    TECHNOLOGIES, 2024, 12 (09)
  • [40] Maximum likelihood gradient-based iterative estimation for multivariable systems
    Xia, Huafeng
    Yang, Yongqing
    Ding, Feng
    Xu, Ling
    Hayat, Tasawar
    IET CONTROL THEORY AND APPLICATIONS, 2019, 13 (11) : 1683 - 1691