Dynamical analysis of contrastive divergence learning: Restricted Boltzmann machines with Gaussian visible units

Cited by: 33
Authors
Karakida, Ryo [1 ]
Okada, Masato [1 ,2 ]
Amari, Shun-ichi [2 ]
Affiliations
[1] Univ Tokyo, Dept Complex Sci & Engn, 5-1-5 Kashiwanoha, Kashiwa, Chiba 2778561, Japan
[2] RIKEN Brain Sci Inst, 2-1 Hirosawa, Wako, Saitama 3510198, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Deep learning; Restricted Boltzmann machine; Contrastive divergence; Component analysis; Stability of learning algorithms; MINOR COMPONENTS; PRINCIPAL;
DOI
10.1016/j.neunet.2016.03.013
CLC classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The restricted Boltzmann machine (RBM) is an essential constituent of deep learning, but it is hard to train by maximum likelihood (ML) learning, which minimizes the Kullback-Leibler (KL) divergence. Instead, contrastive divergence (CD) learning has been developed as an approximation of ML learning and is widely used in practice. To clarify the performance of CD learning, in this paper, we analytically derive the fixed points to which the ML and CDn learning rules converge in two types of RBMs: one with Gaussian visible and Gaussian hidden units, and the other with Gaussian visible and Bernoulli hidden units. In addition, we analyze the stability of these fixed points. As a result, we find that the stable points of the CDn learning rule coincide with those of the ML learning rule in a Gaussian-Gaussian RBM. We also reveal that the larger principal components of the input data are extracted at the stable points. Moreover, in a Gaussian-Bernoulli RBM, we find that both ML and CDn learning can extract independent components at one of the stable points. Our analysis demonstrates that the same feature components as those extracted by ML learning are obtained simply by performing CD1 learning. Extending this study should elucidate the specific solutions obtained by CD learning in other types of RBMs or in deep networks. (C) 2016 Elsevier Ltd. All rights reserved.
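To make the setting of the abstract concrete, the following is a minimal sketch of one CD1 update for a Gaussian-Bernoulli RBM (real-valued visible units, binary hidden units). This is not code from the paper: unit-variance visibles are assumed, the hidden states are sampled once and the visibles are reconstructed at their conditional mean, and all names (`cd1_step`, `W`, `b`, `c`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.01):
    """One CD-1 update for a Gaussian-Bernoulli RBM (unit-variance visibles assumed)."""
    # positive phase: hidden activation probabilities given the data
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample binary hidden states
    # one-step reconstruction: mean of the Gaussian p(v | h0)
    v1 = b + h0 @ W.T
    # negative phase: hidden probabilities given the reconstruction
    ph1 = sigmoid(v1 @ W + c)
    # CD1 update: difference of data and reconstruction correlations
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# toy usage: anisotropic 2-D Gaussian data, one hidden unit
n_vis, n_hid = 2, 1
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b = np.zeros(n_vis)
c = np.zeros(n_hid)
data = rng.standard_normal((100, n_vis)) * np.array([2.0, 0.5])
for _ in range(50):
    W, b, c = cd1_step(data, W, b, c)
```

CDn learning with n > 1 would alternate the Gibbs sampling steps n times before the negative phase; CD1 as sketched here stops after a single reconstruction, which is the case the abstract says already recovers the ML feature components.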
Pages: 78-87
Page count: 10
Related papers
28 records in total
[1] Akaho, S. (2008). In Proceedings of the 2008 International Conference on Information Theory and Statistical Learning, p. 3.
[2] Amari, S., Chen, T.-P., & Cichocki, A. (1997). Stability analysis of learning algorithms for blind source separation. Neural Networks, 10(8), 1345-1351.
[3] Amari, S.-I. (1977). Neural theory of association and concept-formation. Biological Cybernetics, 26(3), 175-185.
[4] [Anonymous] (2005). AISTATS BRIDGETOWN B.
[5] [Anonymous] (2010). Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.
[6] [Anonymous] (2008). Advances in Neural Information Processing Systems.
[7] [Anonymous] (2010). 2010003 U TOR.
[8] Baldi, P., & Hornik, K. (1989). Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2(1), 53-58.
[9] Bengio, Y., & Delalleau, O. (2009). Justifying and generalizing contrastive divergence. Neural Computation, 21(06), 1601-1621.
[10] Chen, T.-P., & Amari, S. (2001). Unified stabilization approach to principal and minor components extraction algorithms. Neural Networks, 14(10), 1377-1387.