Tuning parameters of deep neural network training algorithms pays off: a computational study

Cited by: 1
Authors
Coppola, Corrado [1 ]
Papa, Lorenzo [1 ]
Boresta, Marco [2 ]
Amerini, Irene [1 ]
Palagi, Laura [1 ]
Affiliations
[1] Sapienza Univ Rome, Dept Comp Control & Management Engn Antonio Ruberti, Via Ariosto 25, Rome, Italy
[2] CNR, Ist Anal Sist Informat Antonio Ruberti, Via Taurini 19, I-00185 Rome, Italy
Keywords
Large-scale optimization; Machine learning; Deep network; Convolutional neural network; BFGS method; Optimization
DOI
10.1007/s11750-024-00683-x
Chinese Library Classification (CLC)
C93 [Management]; O22 [Operations Research]
Discipline codes
070105; 12; 1201; 1202; 120202
Abstract
The paper investigates the impact of optimization algorithms on the training of deep neural networks, with a focus on the interaction between the optimizer and the generalization performance. In particular, we analyze how state-of-the-art optimization algorithms behave as a function of their hyperparameter settings, and how robust they are, with respect to the choice of the starting point, against ending up at different local solutions. We conduct extensive computational experiments using nine open-source optimization algorithms to train deep convolutional neural network architectures on an image multi-class classification task. Specifically, we consider several architectures, varying the number of layers and the number of neurons per layer, to evaluate the impact of different width and depth structures on the computational optimization performance. We show that the optimizers often return different local solutions and highlight the strong correlation between the quality of the solution found and the generalization capability of the trained network. We also discuss the role of hyperparameter tuning and show how a tuned hyperparameter setting can be re-used for the same task on different problems, achieving better efficiency and generalization performance than a default setting.
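As a purely illustrative sketch of the kind of controlled comparison the abstract describes (assuming PyTorch; this is not the authors' code, and the network architecture, the "tuned" hyperparameter values, and the synthetic data are placeholders), the snippet below trains the same small convolutional network from an identical starting point under a default and a tuned optimizer setting:

# Illustrative sketch only: compares a default vs. a "tuned" optimizer
# configuration on a small CNN. Architecture, hyperparameter values, and
# the synthetic data are assumptions, not the paper's actual setup.
import torch
import torch.nn as nn


def make_cnn(num_classes: int = 10) -> nn.Module:
    # A small convolutional classifier; depth and width are arbitrary here.
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, num_classes),
    )


def train(model: nn.Module, optimizer: torch.optim.Optimizer,
          data: torch.Tensor, labels: torch.Tensor, epochs: int = 5) -> float:
    # Full-batch training loop; returns the final training loss.
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(data), labels)
        loss.backward()
        optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Synthetic stand-in for a batch of 32x32 RGB images with 10 classes.
    x = torch.randn(64, 3, 32, 32)
    y = torch.randint(0, 10, (64,))

    # Same initialization for both runs, so only the optimizer setting differs.
    base = make_cnn()
    configs = {
        "Adam (default)": lambda p: torch.optim.Adam(p),
        "Adam (tuned, illustrative values)": lambda p: torch.optim.Adam(
            p, lr=3e-4, betas=(0.9, 0.99), weight_decay=1e-4),
    }
    for name, make_opt in configs.items():
        model = make_cnn()
        model.load_state_dict(base.state_dict())  # identical starting point
        final_loss = train(model, make_opt(model.parameters()), x, y)
        print(f"{name}: final training loss {final_loss:.4f}")

Sharing the initial weights across the two runs isolates the effect of the hyperparameter setting from the effect of the starting point, mirroring the robustness question raised in the abstract.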
Pages: 579-620
Number of pages: 42