AdamRAG: Adaptive Algorithm with Ravine Method for Training Deep Neural Networks

Cited by: 0
Authors
Yifan Zhang [1 ]
Di Zhao [1 ]
Hongyi Li [2 ]
Chengwei Pan [1 ]
Affiliations
[1] Beihang University,School of Cyber Science and Technology
[2] Beihang University,School of Mathematical Sciences
[3] Beihang University,Institute of Artificial Intelligence
[4] Ministry of Education,Key Laboratory of Mathematics, Informatics and Behavioral Semantics
Keywords
Non-convex optimization; Deep learning; Adaptive algorithm; Neural networks; Ravine method;
DOI
10.1007/s11063-025-11766-6
Abstract
Adaptive optimization algorithms, such as Adam, are widely employed in deep learning. However, because they rely primarily on learning-rate adjustments, a trade-off often exists between optimization stability and generalization capability. To address this issue, we propose AdamRAG, a novel optimization algorithm that integrates adaptive methods with Ravine acceleration and momentum techniques, aiming to preserve the stability of adaptive algorithms while enhancing their generalization performance. Within the adaptive framework, AdamRAG introduces extrapolation steps based on Ravine acceleration, which not only accelerate convergence but also prevent the iterates from becoming trapped at saddle points, thereby boosting generalization. Simultaneously, the momentum method is employed to regulate the descent step sizes, further improving the algorithm’s stability. Theoretical analysis demonstrates that AdamRAG achieves sublinear convergence in non-convex optimization settings. Extensive experiments across tasks such as image classification, natural language processing, and reinforcement learning validate its effectiveness, with results indicating that AdamRAG outperforms established optimizers (e.g., NAG, Adam, Lion) in both convergence speed and generalization performance. Furthermore, sensitivity analysis shows that AdamRAG exhibits greater robustness to variations in the learning rate, significantly reducing the need for hyperparameter tuning. These findings suggest that by integrating Ravine acceleration, adaptive methods, and momentum techniques, AdamRAG effectively mitigates the trade-off between stability and generalization, providing an efficient and robust optimization tool for deep learning applications.
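The abstract describes AdamRAG as an Adam-style adaptive update combined with periodic Ravine-type extrapolation steps and momentum-regulated step sizes. Since the paper's exact update rules are not reproduced here, the following is only a minimal NumPy sketch of that general idea; the function name adam_ravine_sketch and the parameters ravine_gamma and ravine_every are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def adam_ravine_sketch(grad_fn, x0, lr=1e-3, beta1=0.9, beta2=0.999,
                       eps=1e-8, ravine_gamma=0.5, ravine_every=10,
                       steps=1000):
    """Illustrative Adam-style loop with a periodic Ravine-style
    extrapolation step. NOTE: this is a sketch under assumed update
    rules, not the AdamRAG algorithm from the paper."""
    x = x0.copy()
    m = np.zeros_like(x)          # first-moment (momentum) estimate
    v = np.zeros_like(x)          # second-moment estimate
    anchor = x.copy()             # anchor point for the extrapolation
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)          # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)   # adaptive step
        if t % ravine_every == 0:
            # Ravine-style extrapolation: move along the direction
            # connecting the previous anchor to the current iterate.
            x = x + ravine_gamma * (x - anchor)
            anchor = x.copy()
    return x

# Usage: minimize a simple quadratic as a smoke test.
if __name__ == "__main__":
    grad = lambda x: 2.0 * (x - 3.0)
    print(adam_ravine_sketch(grad, np.zeros(2), lr=0.05, steps=2000))
```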
Related Papers
50 items in total
  • [1] An Adaptive Layer Expansion Algorithm for Efficient Training of Deep Neural Networks
    Chen, Yi-Long
    Liu, Pangfeng
    Wu, Jan-Jan
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 420 - 425
  • [2] PWPROP: A Progressive Weighted Adaptive Method for Training Deep Neural Networks
    Wang, Dong
    Xu, Tao
    Zhang, Huatian
    Shang, Fanhua
    Liu, Hongying
    Liu, Yuanyuan
    Shen, Shengmei
    2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 508 - 515
  • [3] AdaXod: a new adaptive and momental bound algorithm for training deep neural networks
    Yuanxuan Liu
    Dequan Li
    The Journal of Supercomputing, 2023, 79 : 17691 - 17715
  • [4] AdaXod: a new adaptive and momental bound algorithm for training deep neural networks
    Liu, Yuanxuan
    Li, Dequan
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (15) : 17691 - 17715
  • [5] Adaptive Learning Rate and Momentum for Training Deep Neural Networks
    Hao, Zhiyong
    Jiang, Yixuan
    Yu, Huihua
    Chiang, Hsiao-Dong
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT III, 2021, 12977 : 381 - 396
  • [6] MULTILINGUAL TRAINING OF DEEP NEURAL NETWORKS
    Ghoshal, Arnab
    Swietojanski, Pawel
    Renals, Steve
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7319 - 7323
  • [7] Adaptive Weight Decay for Deep Neural Networks
    Nakamura, Kensuke
    Hong, Byung-Woo
    IEEE ACCESS, 2019, 7 : 118857 - 118865
  • [8] On the overfly algorithm in deep learning of neural networks
    Tsygvintsev, Alexei
    APPLIED MATHEMATICS AND COMPUTATION, 2019, 349 : 348 - 358
  • [9] Generalize Deep Neural Networks With Adaptive Regularization for Classifying
    Guo, Kehua
    Tao, Ze
    Zhang, Lingyan
    Hu, Bin
    Kui, Xiaoyan
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (01) : 1216 - 1229
  • [10] Sequence-discriminative training of deep neural networks
    Vesely, Karel
    Ghoshal, Arnab
    Burget, Lukas
    Povey, Daniel
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2344 - 2348