A graph-based estimator of the number of clusters

Cited by: 10
Authors
Institut de Mathématiques et de Modélisation de Montpellier, UMR CNRS 5149, Université Montpellier II, CC 051, Place Eugene Bataillon, 34095 Montpellier Cedex 5, France [1]
Affiliations
[1] Institut de Mathématiques et de Modélisation de Montpellier, UMR CNRS 5149, Université Montpellier II, CC 051, Place Eugene Bataillon, 34095 Montpellier Cedex 5
Source
ESAIM Prob. Stat. | 2007 / pp. 272-280
Keywords
Cluster analysis; Connected component; Graph; Level set; Tubular neighborhood
DOI
10.1051/ps:2007019
Abstract
Assessing the number of clusters of a statistical population is one of the essential issues of unsupervised learning. Given n independent observations X_1, ..., X_n drawn from an unknown multivariate probability density f, we propose a new approach to estimate the number of connected components, or clusters, of the t-level set ℒ(t) = {x : f(x) ≥ t}. The basic idea is to form a rough skeleton of the set ℒ(t) using any preliminary estimator of f, and to count the number of connected components of the resulting graph. Under mild analytic conditions on f, and using tools from differential geometry, we establish the consistency of our method. © EDP Sciences, SMAI 2007.
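The idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' construction, which the abstract does not specify. It assumes a Gaussian kernel estimator as the preliminary estimator of f, keeps the observations whose estimated density is at least t, joins kept points that lie within a radius r of each other, and counts the connected components of that graph. The names `kde`, `count_clusters` and `sample`, the bandwidth `h` and the radius `r` are all hypothetical choices made for illustration.

```python
import math


def kde(points, x, h):
    """Gaussian kernel density estimate at x (an assumed preliminary estimator of f)."""
    d = len(x)
    norm = len(points) * (h * math.sqrt(2 * math.pi)) ** d
    return sum(math.exp(-math.dist(p, x) ** 2 / (2 * h * h)) for p in points) / norm


def count_clusters(points, t, h, r):
    """Count connected components of a neighborhood graph on the estimated t-level set.

    h (bandwidth) and r (connection radius) are illustrative tuning
    parameters, not values taken from the paper.
    """
    # Keep only the observations whose estimated density is at least t.
    kept = [p for p in points if kde(points, p, h) >= t]

    # Union-find structure over the kept points.
    parent = list(range(len(kept)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Join two kept points whenever they lie within distance r of each other.
    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if math.dist(kept[i], kept[j]) <= r:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(kept))})


# Toy data: two dense 5 x 5 grids far apart, plus one isolated point between them.
grid = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sample = grid + [(x + 10.0, y + 10.0) for x, y in grid] + [(5.0, 5.0)]
```

On this toy sample, `count_clusters(sample, t=0.05, h=0.5, r=0.15)` drops the isolated low-density point and counts the two grids as separate components. Lowering `t` far enough keeps the isolated point as a component of its own, which shows how the count depends on the chosen level.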
Pages: 272-280
Page count: 8