Noise can speed backpropagation learning and deep bidirectional pretraining

Cited by: 26
Authors
Kosko, Bart [1 ]
Audhkhasi, Kartik [1 ,3 ]
Osoba, Osonde [1 ,2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Signal & Image Proc Inst, Los Angeles, CA 90089 USA
[2] RAND Corp, Santa Monica, CA 90401 USA
[3] Google Inc, New York, NY USA
Keywords
Backpropagation; Noise benefit; Stochastic resonance; Expectation-Maximization algorithm; Bidirectional associative memory; Contrastive divergence
Indexed terms: neural networks; stochastic resonance; fuzzy systems; EM algorithm; models; representations; regularization
DOI
10.1016/j.neunet.2020.04.004
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We show that the backpropagation algorithm is a special case of the generalized Expectation-Maximization (EM) algorithm for iterative maximum-likelihood estimation. We then apply the recent result that carefully chosen noise can speed the average convergence of the EM algorithm as it climbs a hill of probability or log-likelihood. Injecting such noise therefore speeds the average convergence of the backpropagation algorithm for both the training and pretraining of multilayer neural networks. The beneficial noise is added to the hidden and visible neurons and to related parameters, and the result also applies to regularized regression networks. This beneficial noise is precisely the noise that makes the current signal more probable. We show that such noise also tends to improve classification accuracy. The geometry of the noise-benefit region depends on the probability structure of the neurons in a given layer: in noise space the region lies above the noisy-EM (NEM) hyperplane for classification and involves a hypersphere for regression. Simulations demonstrate these noise benefits on MNIST digit classification, where the NEM noise benefits substantially exceed those of simply adding blind noise to the neural network. We further prove that the noise speed-up applies to the deep bidirectional pretraining of neural-network bidirectional associative memories (BAMs) or their functionally equivalent restricted Boltzmann machines. We then show that learning with basic contrastive divergence also reduces to generalized EM for an energy-based network probability. The optimal noise is added to the input visible neurons of a BAM in stacked layers of trained BAMs. Global stability of generalized BAMs guarantees rapid convergence in pretraining, where neural signals feed back between contiguous layers. Bipolar coding of inputs further improves pretraining performance. (C) 2020 Elsevier Ltd. All rights reserved.
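To make the NEM noise-injection idea concrete, below is a minimal NumPy sketch of the hyperplane screen at a softmax output layer: candidate noise is added to the one-hot target only when it lies above the hyperplane with normal log(a), where a is the output activation, matching the classification noise-benefit region described in the abstract. The single-layer model, the rejection-style screen, the noise scale, and its decay schedule are illustrative assumptions for this sketch, not the paper's exact training recipe (which also injects noise into hidden neurons and related parameters).

```python
# Illustrative sketch of NEM-style noise injection at a softmax output layer.
# Assumption: candidate noise n is accepted only when n . log(a) >= 0,
# i.e. when it lies above the NEM hyperplane with normal log(a).
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nem_noisy_target(t, a, scale):
    """Add zero-mean noise to the one-hot target t only if it passes
    the hyperplane test; otherwise fall back to the noiseless target."""
    n = scale * rng.normal(size=t.shape)
    if np.dot(n, np.log(a + 1e-12)) >= 0.0:   # NEM hyperplane screen
        return t + n
    return t

def train_step(W, x, t, lr=0.1, noise_scale=0.1):
    """One gradient step for a single-layer softmax classifier with
    cross-entropy loss and a NEM-screened target."""
    a = softmax(W @ x)                  # forward pass
    t_noisy = nem_noisy_target(t, a, noise_scale)
    grad = np.outer(a - t_noisy, x)     # cross-entropy gradient w.r.t. W
    return W - lr * grad

# Toy usage: 3-class problem with 5 input features and annealed noise.
W = rng.normal(scale=0.01, size=(3, 5))
x = rng.normal(size=5)
t = np.array([0.0, 1.0, 0.0])
for step in range(100):
    W = train_step(W, x, t, noise_scale=0.1 / (1 + step))
```

The annealed noise scale is one common choice for stochastic-resonance-style training schedules; it is an assumption here rather than a prescription from the paper.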
Pages: 359-384 (26 pages)