Recognizing handwritten digits using hierarchical products of experts

Cited by: 32
Authors
Mayraz, G [1 ]
Hinton, GE [1 ]
Affiliation
[1] UCL, Gatsby Computat Neurosci Unit, London WC1N 3AR, England
Keywords
neural networks; products of experts; handwriting recognition; feature extraction; shape recognition; Boltzmann machines; model-based recognition; generative models;
DOI
10.1109/34.982899
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
The product of experts learning procedure [1] can discover a set of stochastic binary features that constitute a nonlinear generative model of handwritten images of digits. The quality of generative models learned in this way can be assessed by learning a separate model for each class of digit and then comparing the unnormalized probabilities of test images under the 10 different class-specific models. To improve discriminative performance, a hierarchy of separate models can be learned for each digit class. Each model in the hierarchy learns a layer of binary feature detectors that model the probability distribution of vectors of activity of feature detectors in the layer below. The models in the hierarchy are trained sequentially and each model uses a layer of binary feature detectors to learn a generative model of the patterns of feature activities in the preceding layer. After training, each layer of feature detectors produces a separate, unnormalized log probability score. With three layers of feature detectors for each of the 10 digit classes, a test image produces 30 scores which can be used as inputs to a supervised, logistic classification network that is trained on separate data. On the MNIST database, our system is comparable with current state-of-the-art discriminative methods, demonstrating that the product of experts learning procedure can produce effective hierarchies of generative models of high-dimensional data.
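The pipeline the abstract describes can be sketched in code. The NumPy sketch below is not the authors' implementation: the layer sizes, learning rates, synthetic stand-in data, and helper names (BinaryLayer, train_class_stack, stack_scores) are all illustrative assumptions. It trains one stack of binary feature-detector layers per digit class with one-step contrastive divergence, scores an image with each layer's unnormalized log probability (negative free energy), and fits a multinomial logistic classifier on the resulting 30 scores.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BinaryLayer:
    # One layer of stochastic binary feature detectors (an RBM-style product
    # of experts over the layer below), trained with one-step contrastive divergence.
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def score(self, v):
        # Unnormalized log probability of v (negative free energy).
        return v @ self.b_v + np.logaddexp(0.0, v @ self.W + self.b_h).sum(axis=1)

    def train(self, data, epochs=5, lr=0.05):
        for _ in range(epochs):
            h_p = self.hidden_probs(data)
            h = (rng.random(h_p.shape) < h_p).astype(float)
            v_p = sigmoid(h @ self.W.T + self.b_v)            # reconstruction
            h_p2 = self.hidden_probs(v_p)
            n = data.shape[0]
            self.W += lr * (data.T @ h_p - v_p.T @ h_p2) / n
            self.b_v += lr * (data - v_p).mean(axis=0)
            self.b_h += lr * (h_p - h_p2).mean(axis=0)

def train_class_stack(images, sizes=(784, 100, 50, 25)):
    # Train the layers sequentially: each layer models the pattern of feature
    # activities produced by the layer below it.
    layers, x = [], images
    for n_v, n_h in zip(sizes[:-1], sizes[1:]):
        layer = BinaryLayer(n_v, n_h)
        layer.train(x)
        layers.append(layer)
        x = layer.hidden_probs(x)
    return layers

def stack_scores(layers, images):
    # One unnormalized log-probability score per layer (three per class here).
    scores, x = [], images
    for layer in layers:
        scores.append(layer.score(x))
        x = layer.hidden_probs(x)
    return np.stack(scores, axis=1)

# Synthetic stand-in for per-class MNIST training images (binary 784-vectors).
train = {c: (rng.random((200, 784)) < 0.2).astype(float) for c in range(10)}
stacks = {c: train_class_stack(train[c]) for c in range(10)}

# 10 classes x 3 layers = 30 scores per image; a multinomial logistic
# classifier is then fit on these score vectors (the paper uses held-out data).
X = np.vstack([np.hstack([stack_scores(stacks[c], train[d]) for c in range(10)])
               for d in range(10)])
y = np.repeat(np.arange(10), 200)
Xn = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

W = np.zeros((Xn.shape[1], 10))
for _ in range(200):
    Z = Xn @ W
    Z -= Z.max(axis=1, keepdims=True)                 # numerical stability
    p = np.exp(Z); p /= p.sum(axis=1, keepdims=True)
    W -= 0.5 * Xn.T @ (p - np.eye(10)[y]) / len(y)
print("training accuracy on synthetic data:",
      (np.argmax(Xn @ W, axis=1) == y).mean())

The separate classifier stage matters because the per-class scores are unnormalized: they are only comparable across classes after the logistic network learns how to weight them.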
Pages: 189-197
Page count: 9
References
12 in total
  • [1] [Anonymous], 1987, PARALLEL DISTRIBUTED
  • [2] [Anonymous], 1986, PARALLEL DISTRIBUTED
  • [3] [Anonymous], 2000, GCNU TR 2000-004
  • [4] Breiman, L. Bagging predictors. MACHINE LEARNING, 1996, 24(2): 123-140
  • [5] Burges, C.J.C., 1997, ADV NEUR IN, V9, P375
  • [6] Freund, Y., 1992, ADV NEUR IN, V4, P912
  • [7] Heskes, T. Bias/variance decompositions for likelihood-based estimators. NEURAL COMPUTATION, 1998, 10(6): 1425-1433
  • [8] Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E. Adaptive mixtures of local experts. NEURAL COMPUTATION, 1991, 3(1): 79-87
  • [9] LeCun, Y., 1995, P INT C ART NEUR NET, P53
  • [10] Simard, P., 1992, P INT C PATT REC