Equivalence between dropout and data augmentation: A mathematical check

Cited by: 25
Authors
Zhao, Dazhi [1 ,2 ]
Yu, Guozhu [3 ]
Xu, Peng [4 ]
Luo, Maokang [5 ]
Affiliations
[1] Southwest Petr Univ, Sch Sci, Chengdu 610500, Sichuan, Peoples R China
[2] Southwest Petr Univ, Inst Artificial Intelligence, Chengdu 610500, Sichuan, Peoples R China
[3] Southwest Jiaotong Univ, Sch Math, Chengdu 610031, Sichuan, Peoples R China
[4] Polytech Montreal, Montreal, PQ H3C 3A7, Canada
[5] Sichuan Univ, Sch Math, Chengdu 610065, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Data augmentation; Dropout; Neural network; Mathematical check; Deep neural networks; Noise
DOI
10.1016/j.neunet.2019.03.013
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
The great achievements of deep learning can be attributed to its tremendous power of feature representation, which arises from the nonlinear activation functions and the large number of network nodes. However, deep neural networks suffer from serious issues such as slow convergence, and dropout is an outstanding method for improving a network's generalization ability and test performance. Many explanations have been offered for why dropout works so well; among them, the equivalence between dropout and data augmentation is a recently proposed and stimulating one. In this article, we discuss the exact conditions under which this equivalence holds. Our main result guarantees that the equivalence relation almost surely holds if the dimension of the input space is equal to or higher than that of the output space. Furthermore, if the commonly used rectified linear unit activation function is replaced by certain newly proposed activation functions whose values range over all of R, our results extend to multilayer neural networks. For comparison, some counterexamples are given for the inequivalent case. Finally, a series of experiments on the MNIST dataset is conducted to illustrate and help understand the theoretical results. (C) 2019 Elsevier Ltd. All rights reserved.
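As a rough illustration of the main claim (a sketch for this record, not the authors' code), the NumPy snippet below checks the single-layer case with a linear activation standing in for an activation whose values cover all of R: when the input dimension is at least the output dimension, a random weight matrix almost surely has full row rank, so for any dropout mask on the layer's output there is a transformed ("augmented") input whose dropout-free forward pass reproduces the dropped-out output exactly. All names, dimensions, and the identity activation are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 20, 10                 # input dimension >= output dimension (the paper's condition)
W = rng.normal(size=(d_out, d_in))   # a random weight matrix almost surely has full row rank
x = rng.normal(size=d_in)

def sigma(z):
    # identity activation, standing in for an activation whose values cover all of R
    return z

# Dropout on the layer's output: each unit is kept with probability p, zeroed otherwise
p = 0.5
mask = rng.binomial(1, p, size=d_out)
y_dropout = mask * sigma(W @ x)

# Data-augmentation view: find a transformed input x_tilde whose clean (dropout-free)
# forward pass reproduces the dropped-out output. Since W has full row rank,
# the linear system W @ x_tilde = mask * (W @ x) has an exact solution.
x_tilde, *_ = np.linalg.lstsq(W, mask * (W @ x), rcond=None)
y_augmented = sigma(W @ x_tilde)

print(np.allclose(y_dropout, y_augmented))   # True: dropout output equals the augmented-input output
```

Reversing the dimensions (d_in < d_out) generally leaves a nonzero residual in the least-squares step, which is the flavor of the counterexamples mentioned in the abstract.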
Pages: 82-89
Page count: 8