Learning by Unsupervised Nonlinear Diffusion

被引:0
作者
Maggioni, Mauro [1 ]
Murphy, James M. [2 ]
机构
[1] Johns Hopkins Univ, Dept Math, Dept Appl Math & Stat, Math Inst Data Sci,Inst Data Intens Engn & Sci, Baltimore, MD 21218 USA
[2] Tufts Univ, Dept Math, Medford, MA 02155 USA
关键词
unsupervised learning; clustering; spectral graph theory; manifold learning; diffusion geometry; DIMENSIONALITY REDUCTION; CLUSTERING-ALGORITHM; COMPONENT ANALYSIS; NORMALIZED CUTS; LAPLACIAN; VARIABLES; FUSION; MAPS;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
This paper proposes and analyzes a novel clustering algorithm, called learning by unsupervised nonlinear diffusion (LUND), that combines graph-based diffusion geometry with techniques based on density and mode estimation. LUND is suitable for data generated from mixtures of distributions with densities that are both multimodal and supported near nonlinear sets. A crucial aspect of this algorithm is the use of time of a data-adapted diffusion process, and associated diffusion distances, as a scale parameter that is different from the local spatial scale parameter used in many clustering algorithms. We prove estimates for the behavior of diffusion distances with respect to this time parameter under a flexible nonparametric data model, identifying a range of times in which the mesoscopic equilibria of the underlying process are revealed, corresponding to a gap between within-cluster and between-cluster diffusion distances. These structures may be missed by the top eigenvectors of the graph Laplacian, commonly used in spectral clustering. This analysis is leveraged to prove sufficient conditions guaranteeing the accuracy of LUND. We implement LUND and confirm its theoretical properties on illustrative data sets, demonstrating its theoretical and empirical advantages over both spectral and density-based clustering.
引用
收藏
页数:56
相关论文
共 101 条
[1]  
Aldous D., 1995, Unfinished Monograph
[2]  
Andersen R, 2006, ANN IEEE SYMP FOUND, P475
[3]   Local Partitioning for Directed Graphs Using PageRank [J].
Andersen, Reid ;
Chung, Fan ;
Lang, Kevin .
INTERNET MATHEMATICS, 2008, 5 (1-2) :3-22
[4]  
Andersen R, 2009, ACM S THEORY COMPUT, P235
[5]  
[Anonymous], 2006, IEEE T NEURAL NETWOR
[6]  
[Anonymous], 1997, AM MATH SOC, DOI DOI 10.1090/CBMS/092
[7]  
[Anonymous], 2007, P INT NEUR INF PROC
[8]  
[Anonymous], 2012, ARXIV12121384
[9]  
[Anonymous], 2009, STOCHASTIC METHODS, DOI DOI 10.1371/JOURNAL.PONE.0006945
[10]  
[Anonymous], 2009, Amer. Math. Soc.