LocalDrop: A Hybrid Regularization for Deep Neural Networks

Cited by: 11
Authors
Lu, Ziqing [1 ,2 ]
Xu, Chang [3 ]
Du, Bo [1 ,2 ]
Ishida, Takashi [4 ,5 ]
Zhang, Lefei [1 ,2 ]
Sugiyama, Masashi [4 ,5 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Natl Engn Res Ctr Multimedia Software, Inst Artificial Intelligence, Wuhan 430079, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430079, Peoples R China
[3] Univ Sydney, Sch Comp Sci, Fac Engn, Darlington, NSW 2008, Australia
[4] Univ Tokyo, RIKEN, Ctr Adv Intelligence Project, Tokyo 1138654, Japan
[5] Univ Tokyo, Dept Complex Sci & Engn, Grad Sch Frontier Sci, Tokyo 1138654, Japan
Funding
National Natural Science Foundation of China
Keywords
Complexity theory; Biological neural networks; Bayes methods; Training; Deep learning; Upper bound; Random variables; Deep neural networks; dropout; DropBlock; regularization; DROPOUT
DOI
10.1109/TPAMI.2021.3061463
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In neural networks, developing regularization algorithms to mitigate overfitting is a major area of study. We propose a new approach to regularizing neural networks via the local Rademacher complexity, called LocalDrop. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs), involving drop rates and weight matrices, is developed from a proposed upper bound on the local Rademacher complexity through rigorous mathematical derivation. The complexity analyses also cover dropout in FCNs and DropBlock in CNNs with layer-wise keep rate matrices. With the new regularization function, we establish a two-stage procedure that alternately optimizes the keep rate matrix and the weight matrix to train the whole model. Extensive experiments demonstrate the effectiveness of LocalDrop across different models, comparing it with several algorithms and examining the effects of different hyperparameters on the final performance.
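The abstract's notion of a keep rate matrix can be illustrated with a minimal sketch of inverted dropout in which each unit carries its own keep probability. This is an assumption-laden illustration of the general mechanism, not the authors' LocalDrop implementation; the function name and shapes are hypothetical.

```python
import numpy as np

def dropout_with_keep_matrix(x, keep, rng, train=True):
    """Inverted dropout with a per-unit keep-rate matrix (illustrative sketch).

    x    : activations, shape (batch, units)
    keep : keep probabilities in (0, 1], shape (units,) or broadcastable to x
    """
    if not train:
        return x  # no stochastic masking at test time
    keep = np.broadcast_to(keep, x.shape)
    mask = rng.random(x.shape) < keep  # keep each unit with its own probability
    # Scale surviving activations by 1/keep so the expectation is unchanged.
    return np.where(mask, x / keep, 0.0)

rng = np.random.default_rng(0)
x = np.ones((4, 3))
keep = np.array([0.9, 0.5, 0.7])  # hypothetical per-unit keep rates
y = dropout_with_keep_matrix(x, keep, rng)
```

In LocalDrop, such a keep rate matrix is not fixed by hand but optimized jointly with the weights in the two-stage procedure described above.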
Pages: 3590-3601
Page count: 12
Related papers
45 entries in total
  • [1] Achille, A., Soatto, S. Information Dropout: Learning Optimal Representations Through Noisy Computation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(12): 2897-2905.
  • [2] [Anonymous]. Proc. NeurIPS, 2013.
  • [3] Arora, S. Proceedings of Machine Learning Research, 2019, Vol. 97.
  • [4] Arora, S. Proceedings of Machine Learning Research, 2018, Vol. 80.
  • [5] Bartlett, P. L. Journal of Machine Learning Research, 2003, 3: 463. DOI: 10.1162/153244303321897690.
  • [6] Bartlett, P. L., Bousquet, O., Mendelson, S. Local Rademacher Complexities. Annals of Statistics, 2005, 33(4): 1497-1537.
  • [7] Bartlett, P. L. Proc. 31st Conference on Neural Information Processing Systems, 2017, Vol. 30.
  • [8] Breiman, L. Machine Learning, 1996, 24: 123. DOI: 10.1023/A:1018054314350.
  • [9] Cai, J.-F., Osher, S. Fast Singular Value Thresholding Without Singular Value Decomposition. Methods and Applications of Analysis, 2013, 20(4): 335-352.
  • [10] Cai, J.-F., Candès, E. J., Shen, Z. A Singular Value Thresholding Algorithm for Matrix Completion. SIAM Journal on Optimization, 2010, 20(4): 1956-1982.