ps-CALR: Periodic-Shift Cosine Annealing Learning Rate for Deep Neural Networks

Cited by: 14

Authors:

Johnson, Olanrewaju Victor [1]

Xinying, Chew [1]

Khaw, Khai Wah [2]

Lee, Ming Ha [3]

Affiliations:

[1] Univ Sains Malaysia, Sch Comp Sci, Gelugor 11800, Penang, Malaysia

[2] Univ Sains Malaysia, Sch Management, Gelugor 11800, Penang, Malaysia

[3] Swinburne Univ Technol, Sch Engn, Sarawak Campus, Kuching 93350, Malaysia

Source:

IEEE Access | 2023 / Vol. 11

Keywords:

Cosine annealing; convergence; flat minima; learning rate; loss function; optimizers

DOI:

10.1109/ACCESS.2023.3340719

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Efforts continue to improve the performance of deep learning (DL) models across many fields of application. Developing new DL models continues to open unprecedented opportunities in diverse application areas despite the enormous resources required. The learning mechanism of a DL model is generally driven by a cost function (CF), also called a loss function (LF). DL models require varied hyperparameter settings, in particular parameters that help the model keep minimizing the cost function until it converges quickly and generalizes well over the data in the loss landscape. The learning rate (LR) update seeks the optimal solution for a DL model through relative cost function minimization, so selecting an appropriate LR is essential to model performance. Although it has been shown to give fast model convergence, the existing cosine annealing LR does not fully explore the flat minima of the loss landscape, which limits its ability to generalize. To address this, the paper proposes a periodic-shift cosine annealing learning rate with warm-up epochs (ps-CALR) that perturbs the LR update. Six publicly available datasets were used to benchmark the proposed LR method in experiments with custom DL models (multilayer perceptron and convolutional neural networks) and pre-trained DL models. The proposed ps-CALR enhances model generalization and convergence, yielding notably better performance than a fixed LR and the existing cosine annealing method.
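For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of a standard cosine annealing schedule with a linear warm-up and periodic restarts, not the paper's ps-CALR update rule, which the abstract does not specify. The function name `cosine_annealing_lr` and all parameter values (`lr_max`, `lr_min`, `warmup_epochs`, `period`) are assumptions chosen for illustration.

```python
import math


def cosine_annealing_lr(epoch, lr_max=0.1, lr_min=0.001,
                        warmup_epochs=5, period=20):
    """Return the learning rate for a given epoch (hypothetical helper).

    Assumed schedule: linear warm-up to lr_max, then a cosine decay
    from lr_max toward lr_min that restarts every `period` epochs.
    """
    if epoch < warmup_epochs:
        # Linear warm-up: reaches lr_max on the last warm-up epoch.
        return lr_min + (lr_max - lr_min) * (epoch + 1) / warmup_epochs
    # Position inside the current cosine cycle (0 .. period - 1).
    t = (epoch - warmup_epochs) % period
    # Standard cosine annealing between lr_max and lr_min.
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))


# Learning rates for 45 epochs: 5 warm-up epochs plus two cosine cycles.
schedule = [cosine_annealing_lr(e) for e in range(45)]
```

In this sketch the rate jumps back to `lr_max` at the start of each cycle; the paper's periodic shift presumably changes how the schedule is perturbed, so the full text should be consulted for the actual formula.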

Citation

Pages: 139171-139186

Page count: 16
