How to Train Your SSAG in Convolutional Network

Cited by: 0
Authors
Fang, Kaiyuan [1 ]
Chen, Aixiang [1 ,2 ]
Affiliations
[1] Guangdong Univ Finance & Econ, Sch Stat & Math, Guangzhou, Peoples R China
[2] Guangdong Univ Finance & Econ, Inst Artificial Intelligence & Deep Learning, Guangzhou, Peoples R China
Source
2022 IEEE 6TH ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC) | 2022
Keywords
Stochastic Stratified Average Gradient; Stratified Sampling; Batch Normalization; Exponentially Weighted Moving Average; Optimization
D O I
10.1109/IAEAC54830.2022.9929984
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
SSAG (Stochastic Stratified Average Gradient) is an optimization algorithm that achieves linear convergence with low storage and iteration costs. In convolutional networks, however, SSAG has two problems. First, training is unstable and highly sensitive to the learning rate, which makes models harder to train. Second, Batch Normalization does not work with SSAG, which makes the algorithm unusable in many models and limits its applicability. To address these problems, we propose SSAGM (Stochastic Stratified Average Gradient with Momentum) and SSBN (Stochastic Stratified Batch Normalization). SSAGM draws on SGDM (Stochastic Gradient Descent with Momentum) and uses an exponentially weighted moving average to estimate the first-order moment of each category's gradient, which makes training more stable and convergence faster. SSBN draws on SSAG's stratified random sampling to change how the mean and variance in Batch Normalization are computed, so that normalization also works under SSAG. Experimental results show that SSAGM outperforms SSAG and other algorithms, and that SSBN plays a role similar to Batch Normalization when training with SSAG.
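The abstract does not give the update rules, so the sketch below illustrates one plausible reading of SSAGM: keep one EWMA first-moment estimate per class (stratum), refresh only the sampled stratum's estimate at each step, and descend along the stratum-size-weighted average of all estimates. All names and hyperparameters here (`grad_fn`, the within-stratum batch size of 32, `beta`, `lr`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ssagm_train(w, grad_fn, strata, lr=0.01, beta=0.9, steps=1000, seed=0):
    """Sketch of SSAGM as described in the abstract (assumptions noted above).

    w       -- initial parameter vector
    grad_fn -- grad_fn(w, idx) -> gradient of the loss on samples idx
    strata  -- list of index arrays, one per class label
    """
    rng = np.random.default_rng(seed)
    n = sum(len(s) for s in strata)
    # One EWMA first-moment estimate per stratum (SSAG keeps one stored
    # average gradient per stratum; SSAGM smooths it with momentum).
    m = [np.zeros_like(w) for _ in strata]
    for _ in range(steps):
        k = int(rng.integers(len(strata)))       # pick one stratum
        idx = rng.choice(strata[k], size=min(32, len(strata[k])),
                         replace=False)          # sample within the stratum
        g = grad_fn(w, idx)
        # EWMA refresh of the sampled stratum only; plain SSAG would
        # overwrite m[k] with g here.
        m[k] = beta * m[k] + (1.0 - beta) * g
        # Step along the stratum-size-weighted average of all moments.
        w = w - lr * sum((len(s) / n) * mk for s, mk in zip(strata, m))
    return w
```

For SSBN the abstract only says that the Batch Normalization mean and variance are computed via SSAG's stratified sampling. One way to make that concrete, again as an assumption rather than the paper's formula, is to keep per-stratum running statistics and normalize with their combination, so the statistics stay meaningful even though each SSAG minibatch comes from a single class:

```python
def ssbn_normalize(x, k, running, beta=0.9, eps=1e-5):
    """Hypothetical SSBN statistic update (not taken from the paper).

    x       -- activations of the current single-stratum minibatch
    k       -- label of the stratum the batch was drawn from
    running -- dict: stratum -> [mean, var], updated in place
    """
    mu, var = x.mean(axis=0), x.var(axis=0)
    rm, rv = running.setdefault(k, [mu, var])
    running[k][0] = beta * rm + (1.0 - beta) * mu
    running[k][1] = beta * rv + (1.0 - beta) * var
    means = np.stack([m for m, _ in running.values()])
    varis = np.stack([v for _, v in running.values()])
    gmu = means.mean(axis=0)
    # Law of total variance: mean within-stratum variance + variance of means.
    gvar = varis.mean(axis=0) + means.var(axis=0)
    return (x - gmu) / np.sqrt(gvar + eps)
```

Weighting strata equally here is a simplification; weighting by stratum size, as in the SSAGM step above, would match the stratified-sampling idea more closely.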
Pages: 789-793 (5 pages)