Dropout Rademacher complexity of deep neural networks

Authors
Wei Gao
Zhi-Hua Zhou
Affiliations
[1] Nanjing University, National Key Laboratory for Novel Software Technology
[2] Nanjing University, Collaborative Innovation Center of Novel Software Technology and Industrialization
Source
Science China Information Sciences | 2016, Vol. 59
Keywords
artificial intelligence; machine learning; deep learning; dropout; Rademacher complexity
Abstract
Deep neural networks have achieved great success in a variety of real-world applications. Many algorithmic and implementation techniques have been developed for them; however, the theoretical understanding of many aspects of deep neural networks remains far from clear. A particularly interesting issue is the usefulness of dropout, which was motivated by the intuition of preventing complex co-adaptation of feature detectors. In this paper, we study the Rademacher complexity of different types of dropout, and our theoretical results show that for shallow neural networks (with one hidden layer or none) dropout reduces the Rademacher complexity polynomially, whereas for deep neural networks it can, remarkably, lead to an exponential reduction.
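
Background note (a standard textbook definition, included here for orientation only; the paper's own notation and its dropout-specific complexity measure may differ): the empirical Rademacher complexity of a hypothesis class \(\mathcal{F}\) over a sample \(S = \{x_1, \dots, x_m\}\) is

\[
\hat{\mathfrak{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\boldsymbol{\sigma}}\!\left[\, \sup_{f \in \mathcal{F}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i\, f(x_i) \right],
\]

where \(\sigma_1, \dots, \sigma_m\) are independent Rademacher variables, each taking the values \(+1\) and \(-1\) with probability \(1/2\). Dropout randomly deactivates units during training, which restricts the effective hypothesis class; the paper's results bound how much this restriction reduces the quantity above, polynomially for shallow networks and exponentially for deep ones.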