XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Cited by: 2
Authors
Guan, Lei [1 ]
Li, Dongsheng [2 ]
Shi, Yanqi [2 ]
Meng, Jian [1 ]
Affiliations
[1] Natl Univ Def Technol, Dept Math, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Artificial neural networks; Convergence; Computational modeling; Backpropagation; Proposals; Predictive models; deep learning; generalization; gradient-based optimizer; weight prediction
DOI
10.1109/TPAMI.2024.3387399
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, ahead of each mini-batch training step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and backward propagation. In this way, throughout training, the optimizer always uses the gradients with respect to the future weights to update the DNN parameters, so that a gradient-based optimizer with weight prediction achieves better convergence and generalization than its original counterpart. XGrad is straightforward to implement yet effective in boosting both the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results covering five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal and validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
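The weight-prediction procedure described in the abstract can be sketched concretely. Below is a minimal, illustrative PyTorch example for SGD with momentum, assuming a prediction rule of the form w_hat = w - lr * mu * v; the function name and the exact prediction formula are assumptions made for illustration, not the paper's released implementation.

```python
import torch

def weight_prediction_sgd_step(params, momenta, loss_fn, lr=0.1, mu=0.9):
    """One mini-batch step: predict the future weights, take the gradient
    there, then update the original weights with that gradient."""
    cached = [p.detach().clone() for p in params]
    with torch.no_grad():
        # 1. Move each parameter to its predicted future value (assumed rule).
        for p, v in zip(params, momenta):
            p.add_(v, alpha=-lr * mu)            # w_hat = w - lr * mu * v
    # 2. Forward and backward pass at the predicted weights.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        # 3. Restore the cached weights and apply the usual momentum update
        #    with the gradients taken at the predicted weights.
        for p, w, v, g in zip(params, cached, momenta, grads):
            p.copy_(w)
            v.mul_(mu).add_(g)                   # v = mu * v + g
            p.add_(v, alpha=-lr)                 # w = w - lr * v
    return loss.item()

# Toy usage on a least-squares problem (illustrative only).
w = torch.zeros(3, requires_grad=True)
v = torch.zeros(3)
x, y = torch.randn(64, 3), torch.randn(64)
for _ in range(200):
    weight_prediction_sgd_step([w], [v], lambda: ((x @ w - y) ** 2).mean())
```

The same pattern would carry over to adaptive optimizers such as Adam or AdamW by predicting the future weights with the optimizer's own update rule instead of the momentum term alone.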
Pages: 6731-6747
Number of pages: 17