Theoretical analysis of skip connections and batch normalization from generalization and optimization perspectives

Cited by: 10
Authors
Furusho, Yasutaka [1 ]
Ikeda, Kazushi [1 ]
Affiliation
[1] Nara Inst Sci & Technol, Ikoma, Nara 89165, Japan
Funding
US National Institutes of Health; Wellcome Trust (UK); Natural Environment Research Council (UK);
Keywords
Deep neural networks; ResNet; Skip connections; Batch normalization
DOI
10.1017/ATSIP.2020.7
Chinese Library Classification (CLC)
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline classification codes
0808; 0809;
Abstract
Deep neural networks (DNNs) have the same structure as the neocognitron proposed in 1979 but achieve much better performance, largely because DNNs incorporate many heuristic techniques such as pre-training, dropout, skip connections, batch normalization (BN), and stochastic depth. However, why these techniques improve performance is not fully understood. Recently, two tools for theoretical analysis have been proposed. One evaluates the generalization gap, defined as the difference between the expected loss and the empirical loss, by calculating the algorithmic stability; the other evaluates the convergence rate by calculating the eigenvalues of the Fisher information matrix of DNNs. This overview paper briefly introduces these tools and demonstrates their usefulness by explaining why skip connections and BN improve performance.
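As a minimal sketch of the two analysis tools named in the abstract, using standard textbook definitions rather than the paper's own notation (the symbols $A$, $S$, $\ell$, $\beta$, $F$, $\eta$, and $\lambda_i$ are illustrative assumptions, not taken from this record): for a learning algorithm $A$ trained on a sample $S=\{z_1,\dots,z_n\}$ drawn from a distribution $D$ with loss $\ell$, the generalization gap is
\[
\mathrm{gen}(A,S) \;=\; \mathbb{E}_{z\sim D}\!\left[\ell(A(S),z)\right] \;-\; \frac{1}{n}\sum_{i=1}^{n}\ell(A(S),z_i),
\]
and if $A$ is $\beta$-uniformly stable in the sense of Bousquet and Elisseeff, i.e.
\[
\sup_{S,\,z,\,i}\;\bigl|\ell(A(S),z)-\ell(A(S^{\setminus i}),z)\bigr| \;\le\; \beta,
\]
then $\mathbb{E}_{S}\!\left[\mathrm{gen}(A,S)\right] \le \beta$, so a smaller stability constant implies a smaller expected generalization gap. On the optimization side, treating the Fisher information matrix $F$ as the local curvature of the loss near a minimum $\theta^{*}$ (a standard approximation for log-likelihood losses), gradient descent with step size $\eta$ satisfies
\[
\theta_{t+1}-\theta^{*} \;=\; (I-\eta F)\,(\theta_t-\theta^{*}),
\qquad
\lVert I-\eta F\rVert_2 \;=\; \max_i\,\lvert 1-\eta\lambda_i(F)\rvert,
\]
so convergence requires $\eta < 2/\lambda_{\max}(F)$, and the achievable rate is governed by the spread of the eigenvalues $\lambda_i(F)$; this is why bounds on the eigenvalues of the Fisher information matrix translate into convergence-rate guarantees.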
Pages: 7