Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization

被引:399
作者
Lafon, Stephane
Lee, Ann B.
机构
[1] Google Inc, Mountain View, CA 94043 USA
[2] Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA
关键词
machine learning; text analysis; knowledge retrieval; quantization; graph-theoretic methods; compression (coding); clustering; clustering similarity measures; information visualization; Markov processes; graph algorithms;
D O I
10.1109/TPAMI.2006.184
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We provide evidence that nonlinear dimensionality reduction, clustering, and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification fork-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
引用
收藏
页码:1393 / 1403
页数:11
相关论文
共 23 条
[1]  
[Anonymous], 2002, CSE02019 PENNS STAT
[2]   Laplacian eigenmaps for dimensionality reduction and data representation [J].
Belkin, M ;
Niyogi, P .
NEURAL COMPUTATION, 2003, 15 (06) :1373-1396
[3]   A tutorial on Support Vector Machines for pattern recognition [J].
Burges, CJC .
DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (02) :121-167
[4]  
Chung F, 1997, C BOARD MATH SCI AM
[5]  
COIFMAN R, IN PRESS APPL COMPUT
[6]   Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps [J].
Coifman, RR ;
Lafon, S ;
Lee, AB ;
Maggioni, M ;
Nadler, B ;
Warner, F ;
Zucker, SW .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (21) :7426-7431
[7]   Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods [J].
Coifman, RR ;
Lafon, S ;
Lee, AB ;
Maggioni, M ;
Nadler, B ;
Warner, F ;
Zucker, SW .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (21) :7432-7437
[8]  
COIFMAN RR, COMMUNICATION
[9]  
COIFMAN RR, NI PRESS APPL COMPUT
[10]  
Dhillon I. S., 2004, ACM SIGKDD