Experimental Comparison of Stochastic Optimizers in Deep Learning

Cited by: 18
Authors
Okewu, Emmanuel [1 ]
Adewole, Philip [2 ]
Sennaike, Oladipupo [2 ]
Affiliations
[1] Univ Lagos, Ctr Informat Technol & Syst, Lagos, Nigeria
[2] Univ Lagos, Dept Comp Sci, Lagos, Nigeria
Source
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2019, PT V: 19TH INTERNATIONAL CONFERENCE, SAINT PETERSBURG, RUSSIA, JULY 1-4, 2019, PROCEEDINGS, PART V | 2019, Vol. 11623
Keywords
Deep learning; Deep neural networks; Error function; Neural network parameters; Stochastic optimization; Neural networks
DOI
10.1007/978-3-030-24308-1_55
CLC Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
The stochastic optimization problem in deep learning involves finding optimal values of the loss function and neural network parameters using a meta-heuristic search algorithm. The fact that these values cannot be reasonably obtained with a deterministic optimization technique underscores the need for an iterative method that randomly picks data segments, arbitrarily determines initial values of the optimization (network) parameters, and repeatedly computes the error function until a tolerable error is attained. The typical stochastic optimization algorithm for training deep neural networks, a non-convex optimization problem, is gradient descent. It has extensions such as Stochastic Gradient Descent, Adagrad, Adadelta, RMSProp and Adam. In terms of accuracy, convergence rate and training time, each of these stochastic optimizers represents an improvement; however, there is room for further improvement. This paper presents the outcomes of a series of experiments conducted to provide empirical evidence of the progress made so far. We used Python deep learning libraries (TensorFlow and the Keras API) for our experiments. Each algorithm is executed, the results are collated, and a case is made for further research in deep learning to improve the training time and convergence rate of deep neural networks, as well as the accuracy of outcomes. This is in response to the growing demand for deep learning in mission-critical and highly sophisticated decision-making processes across industry verticals.
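The following is a minimal sketch, not the authors' experiment code, of how such an optimizer comparison can be set up with TensorFlow/Keras. The dataset (MNIST), network architecture, epoch count and hyperparameters are illustrative assumptions rather than values reported in the paper.

```python
# Sketch: benchmark SGD, Adagrad, Adadelta, RMSProp and Adam on the same small network.
# All settings below (MNIST, layer sizes, 5 epochs, batch size 128) are assumed for illustration.
import time
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]

def build_model():
    # Small fully connected network; depth and width are placeholder choices.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

optimizers = {
    "SGD": tf.keras.optimizers.SGD(learning_rate=0.01),
    "Adagrad": tf.keras.optimizers.Adagrad(),
    "Adadelta": tf.keras.optimizers.Adadelta(),
    "RMSProp": tf.keras.optimizers.RMSprop(),
    "Adam": tf.keras.optimizers.Adam(),
}

for name, opt in optimizers.items():
    model = build_model()
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    start = time.time()
    model.fit(x_train, y_train, epochs=5, batch_size=128, verbose=0)
    elapsed = time.time() - start
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"{name}: test accuracy = {acc:.4f}, training time = {elapsed:.1f}s")
```

Running each optimizer against an identical model and data split, as above, is what makes the accuracy, convergence-rate and training-time comparisons in the paper meaningful.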
Pages: 704-715
Page count: 12