The topographic organization and visualization of binary data using multivariate-Bernoulli latent variable models

Cited by: 14
Authors
Girolami, M [1 ]
Affiliations
[1] Univ Paisley, Div Comp & Informat Syst, Appl Computat Intelligence Res Unit, Paisley PA1 2BE, Renfrew, Scotland
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2001 / Vol. 12 / No. 06
Keywords
data clustering; data mining; data visualization; generative modeling; probabilistic modeling; self-organization; text document processing; unsupervised learning;
DOI
10.1109/72.963773
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A nonlinear latent variable model for the topographic organization and subsequent visualization of multivariate binary data is presented. The generative topographic mapping (GTM) is a nonlinear factor analysis model for continuous data which assumes an isotropic Gaussian noise model and performs uniform sampling from a two-dimensional (2-D) latent space. Despite the success of the GTM when applied to continuous data, the development of a similar model for discrete binary data has been hindered due, in part, to the nonlinear link function inherent in the binomial distribution, which yields a log-likelihood that is nonlinear in the model parameters. This paper presents an effective method for the parameter estimation of a binary latent variable model (a binary version of the GTM) by adopting a variational approximation to the binomial likelihood. This approximation provides a log-likelihood which is quadratic in the model parameters and so obviates the necessity of an iterative M-step in the expectation-maximization (EM) algorithm. The power of this method is demonstrated on two significant application domains: handwritten digit recognition and the topographic organization of semantically similar text-based documents.
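The abstract does not say which variational approximation is used. As a hedged illustration of how such an approximation can make a Bernoulli log-likelihood quadratic in the logit, the sketch below assumes the well-known Jaakkola-Jordan quadratic lower bound on the log-sigmoid; this choice, and the function names `log_sigmoid`, `lam`, `quadratic_lower_bound`, and `bernoulli_loglik_bound`, are assumptions for illustration and are not taken from the paper.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def lam(xi):
    # Curvature coefficient tanh(xi / 2) / (4 * xi); its limit as xi -> 0 is 1/8.
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)


def quadratic_lower_bound(x, xi):
    # Assumed bound (Jaakkola-Jordan form):
    #   log sigma(x) >= log sigma(xi) + (x - xi) / 2 - lam(xi) * (x^2 - xi^2)
    # The right-hand side is quadratic in x and touches log sigma(x) at x = +/- xi.
    return log_sigmoid(xi) + (x - xi) / 2.0 - lam(xi) * (x * x - xi * xi)


def bernoulli_loglik_bound(t, a, xi):
    # Exact Bernoulli log-likelihood with logit a and t in {0, 1}:
    #   t * a + log sigma(-a)
    # Replacing log sigma(-a) by its quadratic bound gives a lower bound
    # that is quadratic in a (and so in any parameters linear in a).
    return t * a + quadratic_lower_bound(-a, xi)


if __name__ == "__main__":
    a, xi = 1.2, 0.7  # hypothetical logit and variational parameter
    exact = 1 * a + log_sigmoid(-a)
    print(exact, bernoulli_loglik_bound(1, a, xi))
```

Because the bound is quadratic in the logit, maximizing it with respect to parameters that enter the logit linearly reduces to a linear system, which is consistent with the abstract's remark that no iterative M-step is needed. The variational parameter `xi` would be re-tightened between updates.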
Pages: 1367-1374
Page count: 8
Related papers
24 in total
  • [1] Agresti A., 1990, Analysis of categorical data
  • [2] Bishop C. M., 1995, NEURAL NETWORKS PATT
  • [3] Developments of the generative topographic mapping
    Bishop, CM
    Svensén, M
    Williams, CKI
    [J]. NEUROCOMPUTING, 1998, 21 (1-3) : 203 - 224
  • [4] GTM: The generative topographic mapping
    Bishop, CM
    Svensén, M
    Williams, CKI
    [J]. NEURAL COMPUTATION, 1998, 10 (01) : 215 - 234
  • [5] A hierarchical latent variable model for data visualization
    Bishop, CM
    Tipping, ME
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1998, 20 (03) : 281 - 293
  • [6] Cristianini N, 2000, Intelligent Data Analysis: An Introduction
  • [7] DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
  • [9] A common neural-network model for unsupervised exploratory data analysis and independent component analysis
    Girolami, M
    Cichocki, A
    Amari, S
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 1998, 9 (06) : 1495 - 1501
  • [10] HINTON GE, 1992, ADV NEUR IN, V4, P512