Accelerating gradient descent and Adam via fractional gradients

Cited by: 12
Authors
Shin, Yeonjong [1]
Darbon, Jerome [2]
Karniadakis, George Em [2,3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Math Sci, Daejeon 34141, South Korea
[2] Brown Univ, Div Appl Math, Providence, RI 02912 USA
[3] Brown Univ, Sch Engn, Providence, RI 02912 USA
Keywords
Caputo fractional derivative; Non-local calculus; Optimization; Adam; Neural networks
DOI
10.1016/j.neunet.2023.01.002
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We propose a class of novel fractional-order optimization algorithms. We define a fractional-order gradient via the Caputo fractional derivative that generalizes the integer-order gradient. We refer to it as the Caputo fractional-based gradient, and develop an efficient implementation to compute it. A general class of fractional-order optimization methods is then obtained by replacing integer-order gradients with Caputo fractional-based gradients. To give concrete algorithms, we consider gradient descent (GD) and Adam, and extend them to the Caputo fractional GD (CfGD) and the Caputo fractional Adam (CfAdam). We demonstrate the superiority of CfGD and CfAdam on several large-scale optimization problems that arise in scientific machine learning applications, such as an ill-conditioned least-squares problem on real-world data and the training of neural networks involving non-convex objective functions. Numerical examples show that both CfGD and CfAdam result in acceleration over GD and Adam, respectively. We also derive error bounds of CfGD for quadratic functions, which further indicate that CfGD could mitigate the dependence of the convergence rate on the condition number and result in significant acceleration over GD.
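For context, the Caputo fractional derivative of order $\alpha \in (0, 1)$ of a function $f$ with lower terminal $a$ is

$$
{}^{C}D_{a}^{\alpha} f(x) \;=\; \frac{1}{\Gamma(1-\alpha)} \int_{a}^{x} \frac{f'(t)}{(x-t)^{\alpha}}\, dt,
$$

which recovers the ordinary derivative $f'(x)$ as $\alpha \to 1^{-}$. Below is a minimal sketch of how such a derivative could replace the integer-order gradient inside a GD loop, assuming a coordinate-wise Caputo gradient, midpoint quadrature for the singular integral, finite-difference partials, and a fixed trailing lower terminal. The helper names (`caputo_grad`, `cfgd`) and these discretization choices are illustrative, not the paper's implementation.

```python
import numpy as np
from math import gamma

def partial_deriv(f, x, i, h=1e-6):
    """Central finite-difference estimate of df/dx_i at x."""
    e = np.zeros_like(x)
    e[i] = h
    return (f(x + e) - f(x - e)) / (2.0 * h)

def caputo_grad(f, x, a, alpha=0.9, n=64):
    """Coordinate-wise Caputo fractional gradient of order alpha in (0, 1).

    For each coordinate i, approximates
        (1 / Gamma(1 - alpha)) * int_{a_i}^{x_i} (df/dx_i)(t) * (x_i - t)^(-alpha) dt
    by a midpoint rule (midpoints avoid the kernel singularity at t = x_i).
    """
    g = np.zeros_like(x)
    for i in range(x.size):
        t = np.linspace(a[i], x[i], n + 1)
        mid = 0.5 * (t[:-1] + t[1:])           # quadrature nodes
        w = np.diff(t)                         # quadrature weights
        vals = []
        for m in mid:
            xm = x.copy()
            xm[i] = m                          # move only coordinate i to t = m
            vals.append(partial_deriv(f, xm, i))
        kernel = (x[i] - mid) ** (-alpha)      # singular weight, finite at midpoints
        g[i] = np.dot(w, np.asarray(vals) * kernel) / gamma(1.0 - alpha)
    return g

def cfgd(f, x0, alpha=0.9, lr=1e-2, steps=200, offset=1.0):
    """GD with the integer-order gradient swapped for the Caputo one (CfGD-style)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        # A fixed trailing lower terminal is the simplest choice; naive terminal
        # choices can bias the limit point, and the paper's construction of the
        # Caputo fractional-based gradient is more careful than this sketch.
        a = x - offset
        x = x - lr * caputo_grad(f, x, a, alpha)
    return x

if __name__ == "__main__":
    # Ill-conditioned quadratic f(z) = 0.5 * z^T A z with condition number 100.
    A = np.diag([1.0, 100.0])
    f = lambda z: 0.5 * z @ (A @ z)
    print(cfgd(f, np.array([1.0, 1.0]), alpha=0.9))
```

Setting `alpha` close to 1 makes `caputo_grad` approach the ordinary gradient, so the same loop degrades gracefully to plain GD.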
Pages: 185-201 (17 pages)