AdamRAG: Adaptive Algorithm with Ravine Method for Training Deep Neural Networks

Cited by: 0
Authors
Yifan Zhang [1 ]
Di Zhao [1 ]
Hongyi Li [2 ]
Chengwei Pan [1 ]
Affiliations
[1] Beihang University,School of Cyber Science and Technology
[2] Beihang University,School of Mathematical Sciences
[3] Beihang University,Institute of Artificial Intelligence
[4] Ministry of Education,Key Laboratory of Mathematics, Informatics and Behavioral Semantics
Keywords
Non-convex optimization; Deep learning; Adaptive algorithm; Neural networks; Ravine method;
DOI
10.1007/s11063-025-11766-6
Abstract
Adaptive optimization algorithms, such as Adam, are widely employed in deep learning. However, because they rely primarily on learning-rate adjustments, a trade-off often exists between optimization stability and generalization capability. To address this issue, we propose AdamRAG, a novel optimization algorithm that integrates adaptive methods with Ravine acceleration and momentum techniques, aiming to preserve the stability of adaptive algorithms while enhancing their generalization performance. Within the adaptive framework, AdamRAG introduces extrapolation steps based on Ravine acceleration, which not only accelerate convergence but also prevent the iterates from becoming trapped at saddle points, thereby boosting generalization. Simultaneously, the momentum method is employed to regulate the descent step sizes, further improving the algorithm’s stability. Theoretical analysis demonstrates that AdamRAG achieves sublinear convergence in non-convex optimization settings. Extensive experiments across tasks such as image classification, natural language processing, and reinforcement learning validate its effectiveness, with results indicating that AdamRAG outperforms established optimizers (e.g., NAG, Adam, Lion) in both convergence speed and generalization performance. Furthermore, sensitivity analysis shows that AdamRAG exhibits greater robustness to variations in the learning rate, significantly reducing the need for hyperparameter tuning. These findings suggest that by integrating Ravine acceleration, adaptive methods, and momentum techniques, AdamRAG effectively mitigates the trade-off between stability and generalization, providing an efficient and robust optimization tool for deep learning applications.
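The abstract describes AdamRAG as an Adam-style adaptive update combined with periodic Ravine-type extrapolation steps and momentum-regulated step sizes. Since the paper's exact update rules are not reproduced here, the following is only a minimal NumPy sketch of that general idea; the function name adam_ravine_sketch and the parameters ravine_gamma and ravine_every are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def adam_ravine_sketch(grad_fn, x0, lr=1e-3, beta1=0.9, beta2=0.999,
                       eps=1e-8, ravine_gamma=0.5, ravine_every=10,
                       steps=1000):
    """Illustrative Adam-style loop with a periodic Ravine-style
    extrapolation step. NOTE: this is a sketch under assumed update
    rules, not the AdamRAG algorithm from the paper."""
    x = x0.copy()
    m = np.zeros_like(x)          # first-moment (momentum) estimate
    v = np.zeros_like(x)          # second-moment estimate
    anchor = x.copy()             # anchor point for the extrapolation
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)          # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)   # adaptive step
        if t % ravine_every == 0:
            # Ravine-style extrapolation: move along the direction
            # connecting the previous anchor to the current iterate.
            x = x + ravine_gamma * (x - anchor)
            anchor = x.copy()
    return x

# Usage: minimize a simple quadratic as a smoke test.
if __name__ == "__main__":
    grad = lambda x: 2.0 * (x - 3.0)
    print(adam_ravine_sketch(grad, np.zeros(2), lr=0.05, steps=2000))
```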
Related Papers
50 items in total
  • [1] An Adaptive Layer Expansion Algorithm for Efficient Training of Deep Neural Networks
    Chen, Yi-Long
    Liu, Pangfeng
    Wu, Jan-Jan
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 420 - 425
  • [2] PWPROP: A Progressive Weighted Adaptive Method for Training Deep Neural Networks
    Wang, Dong
    Xu, Tao
    Zhang, Huatian
    Shang, Fanhua
    Liu, Hongying
    Liu, Yuanyuan
    Shen, Shengmei
    2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 508 - 515
  • [3] AdaXod: a new adaptive and momental bound algorithm for training deep neural networks
    Yuanxuan Liu
    Dequan Li
    The Journal of Supercomputing, 2023, 79 : 17691 - 17715
  • [4] AdaXod: a new adaptive and momental bound algorithm for training deep neural networks
    Liu, Yuanxuan
    Li, Dequan
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (15) : 17691 - 17715
  • [5] Adaptive Learning Rate and Momentum for Training Deep Neural Networks
    Hao, Zhiyong
    Jiang, Yixuan
    Yu, Huihua
    Chiang, Hsiao-Dong
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT III, 2021, 12977 : 381 - 396
  • [6] MULTILINGUAL TRAINING OF DEEP NEURAL NETWORKS
    Ghoshal, Arnab
    Swietojanski, Pawel
    Renals, Steve
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7319 - 7323
  • [7] Adaptive Weight Decay for Deep Neural Networks
    Nakamura, Kensuke
    Hong, Byung-Woo
    IEEE ACCESS, 2019, 7 : 118857 - 118865
  • [8] On the overfly algorithm in deep learning of neural networks
    Tsygvintsev, Alexei
    APPLIED MATHEMATICS AND COMPUTATION, 2019, 349 : 348 - 358
  • [9] Generalize Deep Neural Networks With Adaptive Regularization for Classifying
    Guo, Kehua
    Tao, Ze
    Zhang, Lingyan
    Hu, Bin
    Kui, Xiaoyan
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (01) : 1216 - 1229
  • [10] Sequence-discriminative training of deep neural networks
    Vesely, Karel
    Ghoshal, Arnab
    Burget, Lukas
    Povey, Daniel
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2344 - 2348