Equivalence between dropout and data augmentation: A mathematical check

Cited by: 25
Authors
Zhao, Dazhi [1 ,2 ]
Yu, Guozhu [3 ]
Xu, Peng [4 ]
Luo, Maokang [5 ]
Affiliations
[1] Southwest Petr Univ, Sch Sci, Chengdu 610500, Sichuan, Peoples R China
[2] Southwest Petr Univ, Inst Artificial Intelligence, Chengdu 610500, Sichuan, Peoples R China
[3] Southwest Jiaotong Univ, Sch Math, Chengdu 610031, Sichuan, Peoples R China
[4] Polytech Montreal, Montreal, PQ H3C 3A7, Canada
[5] Sichuan Univ, Sch Math, Chengdu 610065, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Data augmentation; Dropout; Neural network; Mathematical check; DEEP NEURAL-NETWORKS; NOISE;
DOI
10.1016/j.neunet.2019.03.013
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The great achievements of deep learning can be attributed to its tremendous power of feature representation, an ability that comes from the nonlinear activation functions and the large number of network nodes. However, deep neural networks suffer from serious issues such as slow convergence, and dropout is an outstanding method for improving a network's generalization ability and test performance. Many explanations have been offered for why dropout works so well, among which the equivalence between dropout and data augmentation is a newly proposed and stimulating one. In this article, we discuss the exact conditions for this equivalence to hold. Our main result guarantees that the equivalence relation almost surely holds if the dimension of the input space is equal to or higher than that of the output space. Furthermore, if the commonly used rectified linear unit activation function is replaced by certain newly proposed activation functions whose values lie in R, then our results can be extended to multilayer neural networks. For comparison, some counterexamples are given for the inequivalent case. Finally, a series of experiments on the MNIST dataset are conducted to illustrate and help understand the theoretical results. (C) 2019 Elsevier Ltd. All rights reserved.
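
To make the dimension condition concrete, the following is a minimal numerical sketch in NumPy. It is my own illustrative construction, not the one from the paper: a single ReLU hidden layer stands in for the layer whose output is dropped, the dimensions, dropout rate, and the least-squares recovery of an "augmented" input are all assumptions chosen for illustration. When the input dimension is at least the dimension of the dropped layer, the solved-for augmented input reproduces the dropped-out forward pass exactly (residual near machine precision); when it is smaller, this particular construction generally fails, in line with the inequivalent case discussed in the paper.

import numpy as np

rng = np.random.default_rng(0)

def augmentation_residual(d_in, d_hidden, d_out=2, p=0.5):
    W = rng.standard_normal((d_hidden, d_in))    # input -> hidden weights
    V = rng.standard_normal((d_out, d_hidden))   # hidden -> output weights
    x = rng.standard_normal(d_in)                # one input sample

    pre = W @ x                                  # hidden pre-activations
    h = np.maximum(pre, 0.0)                     # ReLU activations
    mask = (rng.random(d_hidden) > p).astype(float)
    out_dropout = V @ (mask * h)                 # output of the dropped-out pass

    # Build a target pre-activation vector z with ReLU(z) == mask * h:
    # keep pre_i where the unit is kept, force a non-positive value where dropped.
    z = np.where(mask > 0, pre, -1.0)

    # Try to realize z with an ordinary input: solve W @ x_aug = z in least squares.
    # With d_in >= d_hidden, W has full row rank almost surely, the solution is
    # exact, and the plain forward pass on x_aug reproduces the dropout output.
    x_aug, *_ = np.linalg.lstsq(W, z, rcond=None)
    out_plain = V @ np.maximum(W @ x_aug, 0.0)
    return np.linalg.norm(out_plain - out_dropout)

print("d_in >= d_hidden:", augmentation_residual(d_in=10, d_hidden=6))   # ~1e-15
print("d_in <  d_hidden:", augmentation_residual(d_in=4, d_hidden=10))   # typically > 0
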
Pages: 82-89
Number of pages: 8