A tractable latent variable model for nonlinear dimensionality reduction

Cited by: 7
Authors
Saul, Lawrence K. [1]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
Keywords
unsupervised learning; nonlinear dimensionality reduction; maximum likelihood; algorithms; eigenmaps
DOI
10.1073/pnas.1916012117
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline Classification Codes
07; 0710; 09
Abstract
We propose a latent variable model to discover faithful low-dimensional representations of high-dimensional data. The model computes a low-dimensional embedding that aims to preserve neighborhood relationships encoded by a sparse graph. The model both leverages and extends current leading approaches to this problem. Like t-distributed Stochastic Neighbor Embedding, the model can produce two- and three-dimensional embeddings for visualization, but it can also learn higher-dimensional embeddings for other uses. Like LargeVis and Uniform Manifold Approximation and Projection, the model produces embeddings by balancing two goals: pulling nearby examples closer together and pushing distant examples further apart. Unlike these approaches, however, the latent variables in our model provide additional structure that can be exploited for learning. We derive an Expectation-Maximization procedure with closed-form updates that monotonically improve the model's likelihood: in this procedure, embeddings are iteratively adapted by solving sparse, diagonally dominant systems of linear equations that arise from a discrete graph Laplacian. For large problems, we also develop an approximate coarse-graining procedure that avoids the need for negative sampling of nonadjacent nodes in the graph. We demonstrate the model's effectiveness on datasets of images and text.
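The closed-form updates described in the abstract reduce to sparse linear algebra. Below is a minimal, hypothetical Python/SciPy sketch of that kind of computation, not the paper's actual EM update: one smoothing pass solves the sparse, strictly diagonally dominant system (L + ridge*I) Y_new = ridge * Y, where L is the discrete graph Laplacian of the neighborhood graph. The ridge coupling, the update rule, and the name laplacian_update are illustrative assumptions, and the model's repulsive "push distant examples apart" term is omitted.

import numpy as np
from scipy.sparse import csr_matrix, diags, identity
from scipy.sparse.linalg import splu

def laplacian_update(W, Y, ridge=1e-2):
    # Hypothetical smoothing pass (an assumption, not the paper's update):
    # solve (L + ridge*I) Y_new = ridge * Y, where L = D - W is the discrete
    # graph Laplacian of the neighborhood graph. Adding ridge*I makes the
    # system strictly diagonally dominant, so a sparse direct solve is
    # well posed and stable.
    degrees = np.asarray(W.sum(axis=1)).ravel()
    L = diags(degrees) - W
    A = (L + ridge * identity(W.shape[0])).tocsc()
    return splu(A).solve(ridge * np.asarray(Y, dtype=float))

# Toy usage: a 4-node path graph embedded in two dimensions. Each pass
# pulls graph neighbors closer while staying anchored to the previous iterate.
rows, cols = [0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]
W = csr_matrix((np.ones(6), (rows, cols)), shape=(4, 4))
Y = np.random.default_rng(0).normal(size=(4, 2))
for _ in range(10):
    Y = laplacian_update(W, Y)

Iterating this pass alone contracts the embedding toward the Laplacian's nullspace (neighbors merge); in the full model this attraction would be balanced against repulsion between nonadjacent nodes.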
Pages: 15403-15408
Page count: 6