sqFm: a novel adaptive optimization scheme for deep learning model

Cited by: 3
Authors
Bhakta, Shubhankar [1 ]
Nandi, Utpal [1 ]
Mondal, Madhab [2 ]
Mahapatra, Kuheli Ray [3 ]
Chowdhuri, Partha [1 ]
Pal, Pabitra [4 ]
Affiliations
[1] Vidyasagar Univ, Dept Comp Sci, Paschim Medinipur 721102, West Bengal, India
[2] Mahishadal Girls Coll, Dept Math, Purba Medinipur 721628, West Bengal, India
[3] Bajkul Milani Mahavidyalaya, Dept Comp Sci, Purba Medinipur 721655, West Bengal, India
[4] Maulana Abul Kalam Azad Univ Technol, Dept Comp Applicat, Haringhata 741249, West Bengal, India
Keywords
DiffGrad; Adam; AngularGrad; ResNet34; VGG16; ResNet18; DenseNet121; ResNet50; Neural network; Optimization; Deep learning;
DOI
10.1007/s12065-023-00897-1
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
For deep model training, an optimization technique is required that minimizes loss and maximizes accuracy. Developing an effective optimization method is one of the most important research areas. The diffGrad optimization method uses gradient changes during the optimization phases but does not update the 2nd-order moment based on the 1st-order moment, and the AngularGrad optimization method uses the angular value of the gradient, which requires additional computation. Due to these factors, both approaches produce zigzag trajectories that take a long time, and need extra computation, to reach a global minimum. To overcome these limitations, a novel adaptive deep learning optimization method based on the square of the first momentum (sqFm) is proposed. By updating the 2nd-order moment from the 1st-order moment and adjusting the step size according to the current gradient through a non-negative function, the proposed sqFm delivers a smoother trajectory and better image-classification accuracy. Empirical comparisons of the proposed sqFm with Adam, diffGrad, and AngularGrad on non-convex functions demonstrate that the proposed method achieves the best convergence and parameter values. Compared with SGD, Adam, diffGrad, RAdam, and AngularGrad(tan) on the Rosenbrock function, the proposed sqFm reaches the global minimum gradually with less overshoot. It is also shown that the proposed sqFm gives consistently good classification accuracy when training CNNs (ResNet34, ResNet50, VGG16, ResNet18, and DenseNet121) on the CIFAR10, CIFAR100, and MNIST datasets, in contrast to SGDM, diffGrad, Adam, AngularGrad(cos), and AngularGrad(tan). The proposed method also achieves better classification accuracy than SGD, Adam, AdaBelief, Yogi, RAdam, and AngularGrad on the ImageNet dataset with the ResNet18 network. Source code: https://github.com/UtpalNandi/sqFm-A-novel-adaptive-optimization-scheme-for-deep-learning-model.
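The abstract describes the update rule only at a high level. The sketch below shows what one sqFm-style parameter step could look like, assuming an Adam-style skeleton in which the 2nd-order moment tracks the square of the 1st-order moment and a diffGrad-style sigmoid of the current gradient scales the step size. The function name sqfm_step and all coefficient choices are illustrative assumptions, not the paper's exact formulation; consult the linked source code for the authors' implementation.

```python
# Hedged sketch of an sqFm-style update (names and coefficients are
# assumptions inferred from the abstract, not the paper's algorithm).
import numpy as np

def sqfm_step(param, grad, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One parameter update; m and v are running moment estimates, t >= 1."""
    # 1st-order moment: exponential moving average of gradients (as in Adam).
    m = beta1 * m + (1.0 - beta1) * grad
    # 2nd-order moment driven by the square of the 1st-order moment
    # ("square of first momentum"); plain Adam would use grad**2 here.
    v = beta2 * v + (1.0 - beta2) * m**2
    # Bias correction, as in Adam.
    m_hat = m / (1.0 - beta1**t)
    v_hat = v / (1.0 - beta2**t)
    # Step-size control by a non-negative function of the current gradient
    # (a diffGrad-style sigmoid friction term; the exact form is assumed).
    xi = 1.0 / (1.0 + np.exp(-np.abs(grad)))
    param = param - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

Replacing m**2 with grad**2 and dropping the xi factor recovers plain Adam, the baseline the abstract compares against.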