A self-organizing principle for learning nonlinear manifolds

Cited by: 73
Authors
Agrafiotis, DK [1]
Xu, HF [1]
Affiliations
[1] 3 Dimens Pharmaceut Inc, Exton, PA 19341 USA
DOI
10.1073/pnas.242424399
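Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]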
Discipline classification code
07; 0710; 09
Abstract
Modern science confronts us with massive amounts of data: expression profiles of thousands of human genes, multimedia documents, subjective judgments on consumer products or political candidates, trade indices, global climate patterns, etc. These data are often highly structured, but that structure is hidden in a complex set of relationships or high-dimensional abstractions. Here we present a self-organizing algorithm for embedding a set of related observations into a low-dimensional space that preserves the intrinsic dimensionality and metric structure of the data. The embedding is carried out by using an iterative pairwise refinement strategy that attempts to preserve local geometry while maintaining a minimum separation between distant objects. In effect, the method views the proximities between remote objects as lower bounds of their true geodesic distances and uses them as a means to impose global structure. Unlike previous approaches, our method can reveal the underlying geometry of the manifold without intensive nearest-neighbor or shortest-path computations and can reproduce the true geodesic distances of the data points in the low-dimensional embedding without requiring that these distances be estimated from the data sample. More importantly, the method is found to scale linearly with the number of points and can be applied to very large data sets that are intractable by conventional embedding procedures.
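The iterative pairwise refinement described in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything concrete here is an assumption made for illustration: the function name `pairwise_refine`, the neighborhood cutoff `r_c`, the linearly decreasing learning rate `lam`, and the form of the correction step. Pairs closer than `r_c` in the input space are always adjusted toward their input distance (local geometry), while more distant pairs are only pushed apart when their embedded distance falls below the input distance, which treats that distance as a lower bound.

```python
import math
import random


def pairwise_refine(points, dim=2, r_c=1.0, n_cycles=100, n_steps=2000,
                    lam_start=1.0, lam_end=0.01, eps=1e-9, seed=0):
    """Embed `points` (a list of equal-length tuples) into `dim` dimensions.

    Hypothetical helper: all parameter names and defaults are illustrative
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    n = len(points)
    # Start from random coordinates in the unit cube.
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    lam = lam_start
    decay = (lam_start - lam_end) / max(n_cycles - 1, 1)
    for _ in range(n_cycles):
        for _ in range(n_steps):
            i, j = rng.sample(range(n), 2)       # one random pair per step
            r = math.dist(points[i], points[j])  # input-space proximity
            d = math.dist(coords[i], coords[j])  # current embedded distance
            # Local pairs (r <= r_c) are always moved toward r.
            # Remote pairs are only pushed apart when d < r, so r acts
            # as a lower bound rather than a target.
            if r <= r_c or d < r:
                step = lam * 0.5 * (r - d) / (d + eps)
                for k in range(dim):
                    delta = step * (coords[i][k] - coords[j][k])
                    coords[i][k] += delta
                    coords[j][k] -= delta
        lam -= decay  # anneal the learning rate between cycles
    return coords
```

In this sketch the work is `n_cycles * n_steps` pair updates, each touching only two points, and no full distance matrix, nearest-neighbor search, or shortest-path computation is performed. How many updates a given data set needs is not addressed here.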
Pages: 15869-15872
Number of pages: 4