A fast learning algorithm for deep belief nets

Cited by: 12993
Authors
Hinton, Geoffrey E. [1]
Osindero, Simon
Teh, Yee-Whye
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
[2] Natl Univ Singapore, Dept Comp Sci, Singapore 117543, Singapore
DOI
10.1162/neco.2006.18.7.1527
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
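The greedy, one-layer-at-a-time idea in the abstract can be illustrated with a minimal sketch. It assumes each layer is a binary restricted Boltzmann machine trained with one-step contrastive divergence (CD-1); the function names, layer sizes, learning rate, and random toy data are all hypothetical choices for illustration. The contrastive wake-sleep fine-tuning stage and the label units of the top-level associative memory are not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cd1(data, n_hidden, epochs=5, lr=0.1, rng=None):
    """Train one binary RBM with full-batch CD-1 (illustrative settings)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data, then a binary sample.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step down to the visibles and back up.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 update: data-driven statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_vis, b_hid

def greedy_pretrain(data, layer_sizes, **kwargs):
    """Train a stack of RBMs bottom-up; each layer's hidden
    probabilities become the training data for the next layer."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b_vis, b_hid = train_rbm_cd1(x, n_hidden, **kwargs)
        layers.append((W, b_vis, b_hid))
        x = sigmoid(x @ W + b_hid)
    return layers

# Toy usage: random binary vectors stand in for digit images.
rng = np.random.default_rng(0)
toy = (rng.random((64, 20)) < 0.5).astype(float)
stack = greedy_pretrain(toy, [16, 8], epochs=3)
print([W.shape for W, _, _ in stack])
```

Each call to `train_rbm_cd1` sees only the output of the layer below it, which is what makes the procedure greedy: lower layers are frozen once trained, and any joint adjustment of all weights is left to a later fine-tuning stage.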
Pages: 1527-1554
Number of pages: 28