AdaLip: An Adaptive Learning Rate Method per Layer for Stochastic Optimization

Cited by: 10
Authors
Ioannou, George [1]
Tagaris, Thanos [1]
Stafylopatis, Andreas [1]
Affiliations
[1] National Technical University of Athens, Artificial Intelligence and Learning Systems Laboratory, Zografos 15780, Greece
Keywords
Neural networks; Online learning; Stochastic optimization; Adaptive learning rate; Lipschitz constant
DOI
10.1007/s11063-022-11140-w
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many published works on the optimization of neural networks emphasize the significance of the learning rate. In this study, we analyze the need to treat each layer differently and how this affects training. We propose a novel optimization technique, called AdaLip, that uses an estimate of the Lipschitz constant of the gradients to construct an adaptive per-layer learning rate and can work on top of existing optimizers such as SGD or Adam. A detailed experimental framework was used to demonstrate the usefulness of the optimizer on three benchmark datasets. The results showed that AdaLip improves training performance and convergence speed, and also makes the training process more robust to the choice of the initial global learning rate.
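The abstract does not state the update rule, so the following is only a minimal sketch of the general idea, not the authors' AdaLip algorithm. It assumes a secant estimate of each layer's gradient Lipschitz constant, L_k ≈ ||g_t − g_{t−1}|| / ||w_t − w_{t−1}||, and a per-layer step of global_lr / L_k on top of plain SGD. The toy quadratic loss and the names `lipschitz_estimate`, `grad`, and `train` are hypothetical and chosen for illustration.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def diff(a, b):
    return [x - y for x, y in zip(a, b)]

def lipschitz_estimate(g_new, g_old, w_new, w_old, eps=1e-12):
    """Secant estimate ||g_new - g_old|| / ||w_new - w_old|| for one layer (assumed form)."""
    return norm(diff(g_new, g_old)) / (norm(diff(w_new, w_old)) + eps)

def grad(layers, curvatures):
    # Toy loss: 0.5 * sum_k c_k * ||w_k||^2, so layer k has gradient c_k * w_k
    # and its gradient is Lipschitz with constant c_k.
    return [[c * x for x in w] for w, c in zip(layers, curvatures)]

def train(layers, curvatures, steps, global_lr, per_layer):
    """SGD on the toy loss; if per_layer is True, scale each layer's step by 1 / L_k."""
    prev_w = prev_g = None
    for _ in range(steps):
        g = grad(layers, curvatures)
        lrs = []
        for k in range(len(layers)):
            lr = global_lr  # first step (no history yet) uses the global rate
            if per_layer and prev_w is not None:
                L = lipschitz_estimate(g[k], prev_g[k], layers[k], prev_w[k])
                if L > 0:
                    lr = global_lr / L
            lrs.append(lr)
        prev_w, prev_g = layers, g
        # Build new lists so the stored previous weights are not overwritten.
        layers = [[x - lr * gx for x, gx in zip(w, gw)]
                  for w, gw, lr in zip(layers, g, lrs)]
    return layers

# Two "layers" with very different curvature and a deliberately large global rate.
start = [[1.0, -1.0], [1.0, 1.0]]
curvatures = [1.0, 50.0]
plain = train(start, curvatures, steps=30, global_lr=0.5, per_layer=False)
adaptive = train(start, curvatures, steps=30, global_lr=0.5, per_layer=True)
```

In this toy setting a single global rate that suits the flat layer makes the steep layer diverge, whereas the per-layer scaling recovers after the first step. This only loosely mirrors the robustness to the initial global learning rate that the abstract reports; the benchmark results themselves are not reproduced here.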
Pages: 6311-6338
Page count: 28