sqFm: a novel adaptive optimization scheme for deep learning model

Cited by: 3
Authors
Bhakta, Shubhankar [1 ]
Nandi, Utpal [1 ]
Mondal, Madhab [2 ]
Mahapatra, Kuheli Ray [3 ]
Chowdhuri, Partha [1 ]
Pal, Pabitra [4 ]
Affiliations
[1] Vidyasagar Univ, Dept Comp Sci, Paschim Medinipur 721102, West Bengal, India
[2] Mahishadal Girls Coll, Dept Math, Purba Medinipur 721628, West Bengal, India
[3] Bajkul Milani Mahavidyalaya, Dept Comp Sci, Purba Medinipur 721655, West Bengal, India
[4] Maulana Abul Kalam Azad Univ Technol, Dept Comp Applicat, Haringhata 741249, West Bengal, India
Keywords
DiffGrad; Adam; AngularGrad; ResNet34; VGG16; ResNet18; DenseNet121; ResNet50; Neural network; Optimization; Deep learning;
DOI
10.1007/s12065-023-00897-1
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
For deep model training, an optimization technique is required that minimizes loss and maximizes accuracy. Developing an effective optimization method is one of the most important research areas. The diffGrad optimization method uses gradient changes during the optimization phases but does not update the 2nd-order moment based on the 1st-order moment, and the AngularGrad optimization method uses the angular value of the gradient, which requires additional computation. Due to these factors, both approaches produce zigzag trajectories that take a long time, and need extra computation, to reach a global minimum. To overcome these limitations, a novel adaptive deep learning optimization method based on the square of the first momentum (sqFm) is proposed. By updating the 2nd-order moment from the 1st-order moment and adjusting the step size according to the current gradient through a non-negative function, the proposed sqFm delivers a smoother trajectory and better image-classification accuracy. Empirical comparisons of the proposed sqFm with Adam, diffGrad, and AngularGrad on non-convex functions demonstrate that the proposed method achieves the best convergence and parameter values. Compared with SGD, Adam, diffGrad, RAdam, and AngularGrad(tan) on the Rosenbrock function, the proposed sqFm reaches the global minimum gradually with less overshoot. It is also shown that the proposed sqFm gives consistently good classification accuracy when training CNNs (ResNet34, ResNet50, VGG16, ResNet18, and DenseNet121) on the CIFAR10, CIFAR100, and MNIST datasets, in contrast to SGDM, diffGrad, Adam, AngularGrad(cos), and AngularGrad(tan). The proposed method also achieves better classification accuracy than SGD, Adam, AdaBelief, Yogi, RAdam, and AngularGrad on the ImageNet dataset with the ResNet18 network. Source code: https://github.com/UtpalNandi/sqFm-A-novel-adaptive-optimization-scheme-for-deep-learning-model.
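The abstract describes the update rule only at a high level. The sketch below shows what one sqFm-style parameter step could look like, assuming an Adam-style skeleton in which the 2nd-order moment tracks the square of the 1st-order moment and a diffGrad-style sigmoid of the current gradient scales the step size. The function name sqfm_step and all coefficient choices are illustrative assumptions, not the paper's exact formulation; consult the linked source code for the authors' implementation.

```python
# Hedged sketch of an sqFm-style update (names and coefficients are
# assumptions inferred from the abstract, not the paper's algorithm).
import numpy as np

def sqfm_step(param, grad, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One parameter update; m and v are running moment estimates, t >= 1."""
    # 1st-order moment: exponential moving average of gradients (as in Adam).
    m = beta1 * m + (1.0 - beta1) * grad
    # 2nd-order moment driven by the square of the 1st-order moment
    # ("square of first momentum"); plain Adam would use grad**2 here.
    v = beta2 * v + (1.0 - beta2) * m**2
    # Bias correction, as in Adam.
    m_hat = m / (1.0 - beta1**t)
    v_hat = v / (1.0 - beta2**t)
    # Step-size control by a non-negative function of the current gradient
    # (a diffGrad-style sigmoid friction term; the exact form is assumed).
    xi = 1.0 / (1.0 + np.exp(-np.abs(grad)))
    param = param - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

Replacing m**2 with grad**2 and dropping the xi factor recovers plain Adam, the baseline the abstract compares against.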