Dropout Rademacher complexity of deep neural networks

Cited by: 54
Authors
Gao, Wei [1,2]
Zhou, Zhi-Hua [1,2]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] Nanjing Univ, Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 210023, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
artificial intelligence; machine learning; deep learning; dropout; Rademacher complexity; bounds
DOI
10.1007/s11432-015-5470-z
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Great successes of deep neural networks have been witnessed in various real applications. Many algorithmic and implementation techniques have been developed; however, the theoretical understanding of many aspects of deep neural networks remains far from clear. A particularly interesting issue is the usefulness of dropout, which was motivated by the intuition of preventing complex co-adaptation of feature detectors. In this paper, we study the Rademacher complexity of different types of dropout, and our theoretical results disclose that for shallow neural networks (with one or no hidden layer) dropout reduces the Rademacher complexity polynomially, whereas for deep neural networks it can, remarkably, lead to an exponential reduction.
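As background (a standard definition, not quoted from the record), the empirical Rademacher complexity of a hypothesis class $\mathcal{F}$ on a sample $S = \{x_1, \ldots, x_n\}$, which the paper's bounds control, is

\[
\hat{\mathfrak{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\!\left[\, \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \, f(x_i) \right],
\]

where $\sigma_1, \ldots, \sigma_n$ are independent Rademacher variables taking values $+1$ and $-1$ with probability $1/2$ each. As a sketch of the setup (the paper's exact definitions may differ), for dropout the complexity is additionally taken in expectation over the random dropout masks applied to the network, so the claimed polynomial and exponential reductions of this quantity translate directly into tighter generalization bounds.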
Pages: 12
References
35 records in total
[1] Amari S, Murata N, Muller K R, Finke M, Yang H H. Asymptotic statistical theory of overtraining and cross-validation. IEEE Transactions on Neural Networks, 1997, 8(5): 985-996.
[2] [Anonymous], 2013, Advances in Neural Information Processing Systems.
[3] [Anonymous], 2013, Advances in Neural Information Processing Systems. DOI: 10.48550/arXiv.1307.1493.
[4] [Anonymous], Advances in Neural Information Processing Systems.
[5] [Anonymous], 2013, ICML.
[6] Anthony M, 2009, Neural Network Learning: Theoretical Foundations.
[7] Baldi P, Sadowski P. The dropout learning algorithm. Artificial Intelligence, 2014, 210: 78-122.
[8] Bartlett P L, 2003, Journal of Machine Learning Research, 3: 463. DOI: 10.1162/153244303321897690.
[9] Bartlett P L. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 1998, 44(2): 525-536.
[10] Bo L F, 2011, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p. 1729. DOI: 10.1109/CVPR.2011.5995719.