An Effective Optimization Method for Machine Learning Based on ADAM

Cited by: 114
Authors
Yi, Dokkyun [1]
Ahn, Jaehyun [2]
Ji, Sangmin [2]
Affiliations
[1] Daegu Univ Coll, Div Creat Integrated Gen Studies, Kyungsan 38453, South Korea
[2] Chungnam Natl Univ, Coll Nat Sci, Dept Math, Daejeon 34134, South Korea
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 03
Funding
National Research Foundation of Singapore;
Keywords
numerical optimization; ADAM; machine learning; stochastic gradient methods; GRADIENT;
DOI
10.3390/app10031073
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
A machine is taught by finding the minimum value of the cost function induced by the learning data. Unfortunately, as the amount of learning increases, the non-linear activation functions in the artificial neural network (ANN), the complexity of the artificial intelligence structures, and the non-convex complexity of the cost function all increase. A non-convex function has local minima, and the first derivative of the cost function is zero at a local minimum. Methods based on gradient descent optimization therefore stop changing once they fall into a local minimum, because their updates depend only on the first derivative of the cost function. This paper introduces a novel optimization method to make machine learning more efficient. In other words, we construct an effective optimization method for non-convex cost functions. The proposed method addresses the problem of falling into a local minimum by adding the cost function to the parameter update rule of the ADAM method. We prove the convergence of the sequences generated by the proposed method and show its superiority by numerical comparison with gradient-descent-based methods (GD, ADAM, and AdaMax).
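The stalling behavior described in the abstract can be illustrated with a minimal sketch. It is not the paper's proposed method. It only shows plain gradient descent settling at a local minimum where the first derivative vanishes. The one-dimensional cost function, the starting point, the learning rate, and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch only: plain gradient descent on a hypothetical
# non-convex cost. This is NOT the method proposed in the paper.


def cost(x: float) -> float:
    """Hypothetical non-convex cost with two minima (assumed for illustration)."""
    return x**4 - 3.0 * x**2 + x


def grad(x: float) -> float:
    """First derivative of the hypothetical cost."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Run plain gradient descent from x0 and return the final iterate."""
    x = x0
    for _ in range(steps):
        # The update uses only the first derivative, so it vanishes
        # wherever grad(x) == 0, including at a non-global local minimum.
        x -= lr * grad(x)
    return x


if __name__ == "__main__":
    x_final = gradient_descent(2.0)
    print("final x:", x_final)
    print("gradient at final x:", grad(x_final))
    print("cost at final x:", cost(x_final))
    print("cost at x = -1.3 (near the lower minimum):", cost(-1.3))
```

Started from x = 2.0, the iterate settles in the right-hand basin, where the gradient is numerically zero, even though the cost is lower near x = -1.3. A first-derivative-only update has no signal left to leave that basin, which is the limitation the abstract sets out to address.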
Pages: 20