Equivalence between dropout and data augmentation: A mathematical check

Cited by: 25
Authors
Zhao, Dazhi [1 ,2 ]
Yu, Guozhu [3 ]
Xu, Peng [4 ]
Luo, Maokang [5 ]
Affiliations
[1] Southwest Petr Univ, Sch Sci, Chengdu 610500, Sichuan, Peoples R China
[2] Southwest Petr Univ, Inst Artificial Intelligence, Chengdu 610500, Sichuan, Peoples R China
[3] Southwest Jiaotong Univ, Sch Math, Chengdu 610031, Sichuan, Peoples R China
[4] Polytech Montreal, Montreal, PQ H3C 3A7, Canada
[5] Sichuan Univ, Sch Math, Chengdu 610065, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Data augmentation; Dropout; Neural network; Mathematical check; Deep neural networks; Noise
DOI
10.1016/j.neunet.2019.03.013
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
The great achievements of deep learning can be attributed to its tremendous power of feature representation, which arises from the nonlinear activation functions and the large number of network nodes. However, deep neural networks suffer from serious issues such as slow convergence, and dropout is an outstanding method for improving a network's generalization ability and test performance. Many explanations have been offered for why dropout works so well; among them, the equivalence between dropout and data augmentation is a recently proposed and stimulating one. In this article, we discuss the exact conditions under which this equivalence holds. Our main result guarantees that the equivalence relation almost surely holds if the dimension of the input space is equal to or higher than that of the output space. Furthermore, if the commonly used rectified linear unit activation function is replaced by certain newly proposed activation functions whose values range over all of R, our results extend to multilayer neural networks. For comparison, some counterexamples are given for the inequivalent case. Finally, a series of experiments on the MNIST dataset is conducted to illustrate and help understand the theoretical results. (C) 2019 Elsevier Ltd. All rights reserved.
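As a rough illustration of the main claim (a sketch for this record, not the authors' code), the NumPy snippet below checks the single-layer case with a linear activation standing in for an activation whose values cover all of R: when the input dimension is at least the output dimension, a random weight matrix almost surely has full row rank, so for any dropout mask on the layer's output there is a transformed ("augmented") input whose dropout-free forward pass reproduces the dropped-out output exactly. All names, dimensions, and the identity activation are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 20, 10                 # input dimension >= output dimension (the paper's condition)
W = rng.normal(size=(d_out, d_in))   # a random weight matrix almost surely has full row rank
x = rng.normal(size=d_in)

def sigma(z):
    # identity activation, standing in for an activation whose values cover all of R
    return z

# Dropout on the layer's output: each unit is kept with probability p, zeroed otherwise
p = 0.5
mask = rng.binomial(1, p, size=d_out)
y_dropout = mask * sigma(W @ x)

# Data-augmentation view: find a transformed input x_tilde whose clean (dropout-free)
# forward pass reproduces the dropped-out output. Since W has full row rank,
# the linear system W @ x_tilde = mask * (W @ x) has an exact solution.
x_tilde, *_ = np.linalg.lstsq(W, mask * (W @ x), rcond=None)
y_augmented = sigma(W @ x_tilde)

print(np.allclose(y_dropout, y_augmented))   # True: dropout output equals the augmented-input output
```

Reversing the dimensions (d_in < d_out) generally leaves a nonzero residual in the least-squares step, which is the flavor of the counterexamples mentioned in the abstract.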
Pages: 82-89
Page count: 8