Unbiased quasi-hyperbolic Nesterov-gradient momentum-based optimizers for accelerating convergence

Cited by: 2
Authors
Cheng, Weiwei [1 ]
Yang, Xiaochun [1 ,2 ,3 ]
Wang, Bin [1 ,2 ,3 ]
Wang, Wei [4 ,5 ]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110167, Liaoning, Peoples R China
[2] Natl Frontiers Sci Ctr Ind Intelligence & Syst Op, Shenyang, Peoples R China
[3] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Minist Educ, Shenyang, Peoples R China
[4] Hong Kong Univ Sci & Technol Guangzhou, Informat Hub, Guangzhou, Guangdong, Peoples R China
[5] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Optimizer; Momentum; Accelerate convergence; Unbiased;
DOI
10.1007/s11280-022-01086-3
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
In the training of deep learning models, one important step is choosing an appropriate optimizer, which directly determines the final performance of the model. Choosing the appropriate direction and step size (i.e., learning rate) of each parameter update is the decisive factor for an optimizer. Previous gradient descent optimizers can oscillate and fail to converge to a minimum because they are sensitive only to the current gradient. Momentum-Based Optimizers (MBOs) have been widely adopted recently because they relieve oscillation and thereby accelerate convergence, using an exponentially decaying average of past gradients to fine-tune the update direction. However, we find that most existing MBOs are biased and inconsistent with the locally fastest descent direction, resulting in a high number of iterations. To accelerate convergence, we propose an Unbiased strategy that adjusts the descent direction of a variety of MBOs. We further propose an Unbiased Quasi-hyperbolic Nesterov-gradient strategy (UQN) that combines our Unbiased strategy with the existing Quasi-hyperbolic and Nesterov-gradient strategies. Each update step then moves in the locally fastest descent direction, predicts the future gradient to avoid overshooting the minimum, and reduces gradient variance. We extend our strategies to multiple MBOs and prove their convergence. The main experimental results in this paper are based on popular neural network models and benchmark datasets and demonstrate the effectiveness and universality of our proposed strategies.
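To make the ingredients named in the abstract concrete, the sketch below combines a Nesterov-style lookahead gradient, an exponentially decaying momentum average with an Adam-style bias correction (one simple way to obtain an unbiased momentum estimate), and quasi-hyperbolic mixing of the raw gradient with the corrected momentum. This is a minimal NumPy illustration of those building blocks under assumed hyperparameters (lr, beta, nu) and an assumed lookahead form; it is not the paper's exact UQN update rule.

```python
import numpy as np

def uqn_style_step(theta, grad_fn, m, t, lr=0.01, beta=0.9, nu=0.7):
    """One illustrative update combining the three ingredients named in
    the abstract. NOTE: a sketch of the building blocks, not the paper's
    exact UQN rule; lr, beta, nu, and the lookahead form are assumptions."""
    # Nesterov-style lookahead: evaluate the gradient slightly ahead of
    # the current iterate, in the direction of the momentum buffer.
    g = grad_fn(theta - lr * beta * m)
    # Exponentially decaying average of gradients (the momentum buffer).
    m = beta * m + (1.0 - beta) * g
    # Unbiased estimate: divide out the initialization bias (as in Adam),
    # so the averaged direction is not skewed toward the zero init.
    m_hat = m / (1.0 - beta ** t)
    # Quasi-hyperbolic mixing: a weighted average of the raw gradient
    # and the bias-corrected momentum.
    theta = theta - lr * ((1.0 - nu) * g + nu * m_hat)
    return theta, m

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
theta = np.array([5.0, -3.0])
m = np.zeros_like(theta)
for t in range(1, 201):
    theta, m = uqn_style_step(theta, lambda x: x, m, t)
print(theta)  # moves toward the minimum at the origin
```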
Pages: 1323-1344
Number of pages: 22
Related Papers
8 records in total
  • [1] Unbiased quasi-hyperbolic Nesterov-gradient momentum-based optimizers for accelerating convergence
    Weiwei Cheng
    Xiaochun Yang
    Bin Wang
    Wei Wang
    World Wide Web, 2023, 26: 1323 - 1344
  • [2] Perturbation Initialization, Adam-Nesterov and Quasi-Hyperbolic Momentum for Adversarial Examples
    Zou J.-H.
    Duan Y.-X.
    Ren C.-L.
    Qiu J.-Y.
    Zhou X.-Y.
    Pan Z.-S.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2022, 50 (01): 207 - 216
  • [3] Convergence of Momentum-Based Stochastic Gradient Descent
    Jin, Ruinan
    He, Xingkang
    2020 IEEE 16TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2020: 779 - 784
  • [4] On the Global Optimum Convergence of Momentum-based Policy Gradient
    Ding, Yuhao
    Zhang, Junzi
    Lavaei, Javad
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [5] Federated Gradient Averaging for Multi-Site Training with Momentum-Based Optimizers
    Remedios, Samuel W.
    Butman, John A.
    Landman, Bennett A.
    Pham, Dzung L.
    DOMAIN ADAPTATION AND REPRESENTATION TRANSFER, AND DISTRIBUTED AND COLLABORATIVE LEARNING, DART 2020, DCL 2020, 2020, 12444 : 170 - 180
  • [6] An Adaptive Quasi-Hyperbolic Momentum Method Based on AdaGrad+ Strategy
    Wei, Hongxu
    Zhang, Xu
    Fang, Zhi
    2022 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, COMPUTER VISION AND MACHINE LEARNING (ICICML), 2022: 649 - 654
  • [7] A robust multi-scale learning network with quasi-hyperbolic momentum-based Adam optimizer for bearing intelligent fault diagnosis under sample imbalance scenarios and strong noise environment
    Ye, Maoyou
    Yan, Xiaoan
    Chen, Ning
    Liu, Ying
    STRUCTURAL HEALTH MONITORING-AN INTERNATIONAL JOURNAL, 2024, 23 (03): 1664 - 1686
  • [8] Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Nonconvex Stochastic Optimization: Nonasymptotic Performance Bounds and Momentum-Based Acceleration
    Gao, Xuefeng
    Gürbüzbalaban, Mert
    Zhu, Lingjiong
    Operations Research, 2022, 70 (05): 2931 - 2947