AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs

Cited by: 2
Authors
Dubey, Shiv Ram [1 ]
Singh, Satish Kumar [1 ]
Chaudhuri, Bidyut Baran [2 ,3 ]
Affiliations
[1] Indian Inst Informat Technol, Comp Vis & Biometr Lab, Allahabad, India
[2] Techno India Univ, Kolkata, India
[3] Indian Stat Inst, Kolkata, India
Source
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023
DOI
10.1109/WACV56688.2023.00525
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Stochastic gradient descent (SGD) optimizers are generally used to train convolutional neural networks (CNNs). In recent years, several adaptive-momentum SGD optimizers have been introduced, such as Adam, diffGrad, Radam and AdaBelief. However, the existing SGD optimizers do not exploit the gradient norm of past iterations, which leads to poor convergence and performance. In this paper, we propose novel AdaNorm-based SGD optimizers that correct the norm of the gradient in each iteration based on the adaptive training history of gradient norms. By doing so, the proposed optimizers are able to maintain a high and representative gradient throughout training and to address the low and atypical gradient problems. The proposed concept is generic and can be used with any existing SGD optimizer. We show the efficacy of the proposed AdaNorm with four state-of-the-art optimizers, including Adam, diffGrad, Radam and AdaBelief. We demonstrate the performance improvement due to the proposed optimizers using three CNN models (VGG16, ResNet18 and ResNet50) on three benchmark object recognition datasets (CIFAR10, CIFAR100 and TinyImageNet).
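The abstract describes the norm-correction idea only at a high level, so the following is a minimal sketch of how such a correction could be grafted onto a plain Adam update. The exponential-moving-average rule for the gradient-norm history, the scale-up condition, and the hyper-parameter names (gamma, lr, betas, eps) are assumptions made for illustration, not the authors' exact formulation.

```python
# Minimal sketch: Adam with an assumed gradient-norm correction in the
# spirit of the abstract. Not the authors' exact algorithm.
import numpy as np


class AdamNormSketch:
    def __init__(self, lr=1e-3, betas=(0.9, 0.999), gamma=0.95, eps=1e-8):
        self.lr, self.betas, self.gamma, self.eps = lr, betas, gamma, eps
        self.m = None   # first moment (EMA of gradients)
        self.v = None   # second moment (EMA of squared gradients)
        self.e = 0.0    # EMA of past gradient norms (the "training history")
        self.t = 0      # step counter for bias correction

    def step(self, params, grad):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1

        # Assumed AdaNorm-style correction: track the history of gradient
        # norms and scale up a gradient whose norm is atypically low.
        g_norm = np.linalg.norm(grad)
        self.e = self.gamma * self.e + (1.0 - self.gamma) * g_norm
        if 0.0 < g_norm < self.e:
            grad = grad * (self.e / g_norm)

        # Standard Adam update on the (possibly rescaled) gradient.
        b1, b2 = self.betas
        self.m = b1 * self.m + (1 - b1) * grad
        self.v = b2 * self.v + (1 - b2) * grad ** 2
        m_hat = self.m / (1 - b1 ** self.t)
        v_hat = self.v / (1 - b2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


# Toy usage: minimise f(w) = ||w||^2 for a few steps.
opt = AdamNormSketch(lr=0.1)
w = np.array([1.0, -2.0, 3.0])
for _ in range(100):
    w = opt.step(w, 2.0 * w)   # gradient of ||w||^2
print(w)  # approaches the zero vector
```

Because the correction only rescales the gradient direction by a scalar, it can be wrapped around any SGD-style optimizer, which is consistent with the abstract's claim that the concept is generic.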
Pages: 5273-5282 (10 pages)