Experimental Comparison of Stochastic Optimizers in Deep Learning

Cited by: 18
Authors
Okewu, Emmanuel [1 ]
Adewole, Philip [2 ]
Sennaike, Oladipupo [2 ]
Affiliations
[1] Univ Lagos, Ctr Informat Technol & Syst, Lagos, Nigeria
[2] Univ Lagos, Dept Comp Sci, Lagos, Nigeria
Source
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2019, PT V: 19TH INTERNATIONAL CONFERENCE, SAINT PETERSBURG, RUSSIA, JULY 1-4, 2019, PROCEEDINGS, PART V | 2019, Vol. 11623
Keywords
Deep learning; Deep neural networks; Error function; Neural network parameters; Stochastic optimization; Neural networks
DOI
10.1007/978-3-030-24308-1_55
CLC Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
The stochastic optimization problem in deep learning involves finding optimal values of the loss function and neural network parameters using a meta-heuristic search algorithm. The fact that these values cannot be reasonably obtained with a deterministic optimization technique underscores the need for an iterative method that randomly picks data segments, arbitrarily determines initial values of the optimization (network) parameters, and repeatedly computes the error function until a tolerable error is attained. The typical stochastic optimization algorithm for training deep neural networks, a non-convex optimization problem, is gradient descent. It has extensions such as Stochastic Gradient Descent, Adagrad, Adadelta, RMSProp and Adam. In terms of accuracy, convergence rate and training time, each of these stochastic optimizers represents an improvement; however, there is room for further improvement. This paper presents the outcomes of a series of experiments conducted to provide empirical evidence of the progress made so far. We used Python deep learning libraries (TensorFlow and the Keras API) for our experiments. Each algorithm is executed, the results are collated, and a case is made for further research in deep learning to improve the training time and convergence rate of deep neural networks, as well as the accuracy of outcomes. This is in response to the growing demand for deep learning in mission-critical and highly sophisticated decision-making processes across industry verticals.
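The following is a minimal sketch, not the authors' experiment code, of how such an optimizer comparison can be set up with TensorFlow/Keras. The dataset (MNIST), network architecture, epoch count and hyperparameters are illustrative assumptions rather than values reported in the paper.

```python
# Sketch: benchmark SGD, Adagrad, Adadelta, RMSProp and Adam on the same small network.
# All settings below (MNIST, layer sizes, 5 epochs, batch size 128) are assumed for illustration.
import time
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]

def build_model():
    # Small fully connected network; depth and width are placeholder choices.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

optimizers = {
    "SGD": tf.keras.optimizers.SGD(learning_rate=0.01),
    "Adagrad": tf.keras.optimizers.Adagrad(),
    "Adadelta": tf.keras.optimizers.Adadelta(),
    "RMSProp": tf.keras.optimizers.RMSprop(),
    "Adam": tf.keras.optimizers.Adam(),
}

for name, opt in optimizers.items():
    model = build_model()
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    start = time.time()
    model.fit(x_train, y_train, epochs=5, batch_size=128, verbose=0)
    elapsed = time.time() - start
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"{name}: test accuracy = {acc:.4f}, training time = {elapsed:.1f}s")
```

Running each optimizer against an identical model and data split, as above, is what makes the accuracy, convergence-rate and training-time comparisons in the paper meaningful.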
Pages: 704-715
Page count: 12