PID controller-based adaptive gradient optimizer for deep neural networks

Cited by: 4
Authors
Dai, Mingjun [1 ]
Zhang, Zelong [1 ]
Lai, Xiong [1 ]
Lin, Xiaohui [1 ,3 ]
Wang, Hui [2 ]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Guangdong Prov Engn Ctr Ubiquitous Comp & Intellig, Dept Commun, Shenzhen 518060, Peoples R China
[2] Shenzhen Inst Informat Technol, Dept Commun, Shenzhen, Peoples R China
[3] Shenzhen Univ, Coll Elect & Informat Engn, Guangdong Prov Engn Ctr Ubiquitous Comp & Intellig, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Stochastic gradient descent (SGD); adaptive optimizer; Adam; proportional-integral-derivative (PID); feedback control; DESIGN;
DOI
10.1049/cth2.12404
CLC Number
TP [Automation technology, computer technology];
Subject Classification Code
0812;
Abstract
Due to improper selection of the gradient update direction or the learning rate, SGD-based optimization algorithms for deep learning suffer from oscillation and slow convergence. Although the Adam algorithm adaptively adjusts both the update direction and the learning rate, it still exhibits overshoot, which wastes computing resources and slows convergence. In this work, the PID controller from the field of feedback control is borrowed to re-express adaptive optimization for deep learning: the Adam update is derived as the integral (I) component. To alleviate overshoot and thereby speed up the convergence of Adam, a complete adaptive PID optimizer (adaptive-PID) is proposed by incorporating proportional (P) and derivative (D) components. Extensive experiments on standard datasets verify that the proposed adaptive-PID algorithm significantly outperforms Adam in terms of convergence rate and accuracy.
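To make the construction described in the abstract concrete, the following Python sketch illustrates a PID-style parameter update in which an Adam-like estimate plays the role of the integral (I) term while proportional (P) and derivative (D) terms are added on top. This is only an illustration of the general idea under assumed details: the gains kp and kd, the decay rates, and the exact way the three terms are combined are assumptions, not the authors' adaptive-PID algorithm.

import numpy as np

class PIDOptimizerSketch:
    """Illustrative PID-style update; NOT the paper's exact adaptive-PID method.
    The gains (kp, kd), decay rates, and term combination are assumptions."""

    def __init__(self, lr=1e-3, kp=0.1, kd=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.kp, self.kd = lr, kp, kd
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = self.v = self.prev_grad = None
        self.t = 0

    def step(self, theta, grad):
        grad = np.asarray(grad, dtype=float)
        if self.m is None:
            self.m = np.zeros_like(grad)
            self.v = np.zeros_like(grad)
            self.prev_grad = np.zeros_like(grad)
        self.t += 1
        # Integral (I) term: Adam-style bias-corrected moment estimates,
        # matching the abstract's view of Adam as the integral component.
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        i_term = m_hat / (np.sqrt(v_hat) + self.eps)
        # Proportional (P) term: reacts to the current gradient "error".
        p_term = grad
        # Derivative (D) term: reacts to the change of the gradient,
        # intended to damp overshoot.
        d_term = grad - self.prev_grad
        self.prev_grad = grad.copy()
        return theta - self.lr * (self.kp * p_term + i_term + self.kd * d_term)

In this reading, setting kp = kd = 0 recovers a plain Adam step, which is consistent with the abstract's claim that Adam corresponds to the integral component alone, while the derivative term is the part meant to damp overshoot.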
Pages: 2032-2037
Page count: 6