Equivalence between dropout and data augmentation: A mathematical check

Cited by: 25
Authors
Zhao, Dazhi [1 ,2 ]
Yu, Guozhu [3 ]
Xu, Peng [4 ]
Luo, Maokang [5 ]
Affiliations
[1] Southwest Petr Univ, Sch Sci, Chengdu 610500, Sichuan, Peoples R China
[2] Southwest Petr Univ, Inst Artificial Intelligence, Chengdu 610500, Sichuan, Peoples R China
[3] Southwest Jiaotong Univ, Sch Math, Chengdu 610031, Sichuan, Peoples R China
[4] Polytech Montreal, Montreal, PQ H3C 3A7, Canada
[5] Sichuan Univ, Sch Math, Chengdu 610065, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Data augmentation; Dropout; Neural network; Mathematical check; DEEP NEURAL-NETWORKS; NOISE;
DOI
10.1016/j.neunet.2019.03.013
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The great achievements of deep learning can be attributed to its tremendous power of feature representation, an ability that comes from the nonlinear activation functions and the large number of network nodes. However, deep neural networks suffer from serious issues such as slow convergence, and dropout is an outstanding method for improving a network's generalization ability and test performance. Many explanations have been offered for why dropout works so well, among which the equivalence between dropout and data augmentation is a newly proposed and stimulating one. In this article, we discuss the exact conditions for this equivalence to hold. Our main result guarantees that the equivalence relation almost surely holds if the dimension of the input space is equal to or higher than that of the output space. Furthermore, if the commonly used rectified linear unit activation function is replaced by certain newly proposed activation functions whose values lie in R, then our results can be extended to multilayer neural networks. For comparison, some counterexamples are given for the inequivalent case. Finally, a series of experiments on the MNIST dataset are conducted to illustrate and help understand the theoretical results. (C) 2019 Elsevier Ltd. All rights reserved.
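
To make the dimension condition concrete, the following is a minimal numerical sketch in NumPy. It is my own illustrative construction, not the one from the paper: a single ReLU hidden layer stands in for the layer whose output is dropped, the dimensions, dropout rate, and the least-squares recovery of an "augmented" input are all assumptions chosen for illustration. When the input dimension is at least the dimension of the dropped layer, the solved-for augmented input reproduces the dropped-out forward pass exactly (residual near machine precision); when it is smaller, this particular construction generally fails, in line with the inequivalent case discussed in the paper.

import numpy as np

rng = np.random.default_rng(0)

def augmentation_residual(d_in, d_hidden, d_out=2, p=0.5):
    W = rng.standard_normal((d_hidden, d_in))    # input -> hidden weights
    V = rng.standard_normal((d_out, d_hidden))   # hidden -> output weights
    x = rng.standard_normal(d_in)                # one input sample

    pre = W @ x                                  # hidden pre-activations
    h = np.maximum(pre, 0.0)                     # ReLU activations
    mask = (rng.random(d_hidden) > p).astype(float)
    out_dropout = V @ (mask * h)                 # output of the dropped-out pass

    # Build a target pre-activation vector z with ReLU(z) == mask * h:
    # keep pre_i where the unit is kept, force a non-positive value where dropped.
    z = np.where(mask > 0, pre, -1.0)

    # Try to realize z with an ordinary input: solve W @ x_aug = z in least squares.
    # With d_in >= d_hidden, W has full row rank almost surely, the solution is
    # exact, and the plain forward pass on x_aug reproduces the dropout output.
    x_aug, *_ = np.linalg.lstsq(W, z, rcond=None)
    out_plain = V @ np.maximum(W @ x_aug, 0.0)
    return np.linalg.norm(out_plain - out_dropout)

print("d_in >= d_hidden:", augmentation_residual(d_in=10, d_hidden=6))   # ~1e-15
print("d_in <  d_hidden:", augmentation_residual(d_in=4, d_hidden=10))   # typically > 0
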
Pages: 82-89
Number of pages: 8