Normality-based validation for crisp clustering

Cited by: 24
Authors
Lago-Fernandez, Luis F. [1]
Corbacho, Fernando [2]
Affiliations
[1] Univ Autonoma Madrid, Escuela Politecn Super, Dept Ingn Informat, E-28049 Madrid, Spain
[2] Cognodata Consulting, Madrid 28010, Spain
Keywords
Crisp clustering; Cluster validation; Negentropy; VALIDITY INDEX; MIXTURE; TESTS; NUMBER; NEC;
DOI
10.1016/j.patcog.2009.09.018
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce a new validity index for crisp clustering that is based on the average normality of the clusters. Unlike methods based on inter-cluster and intra-cluster distances, this index emphasizes the cluster shape by using a high order characterization of its probability distribution. The normality of a cluster is characterized by its negentropy, a standard measure of the distance to normality which evaluates the difference between the cluster's entropy and the entropy of a normal distribution with the same covariance matrix. The definition of the negentropy involves the distribution's differential entropy. However, we show that it is possible to avoid its explicit computation by considering only negentropy increments with respect to the initial data distribution, where all the points are assumed to belong to the same cluster. The resulting negentropy increment validity index only requires the computation of covariance matrices. We have applied the new index to an extensive set of artificial and real problems where it provides, in general, better results than other indices, both with respect to the prediction of the correct number of clusters and to the similarity among the real clusters and those inferred. (C) 2009 Elsevier Ltd. All rights reserved.
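The abstract states that the index needs only covariance matrices. A minimal sketch of how such a computation might look follows. It assumes the increment takes the form that follows from the definitions above for a crisp partition with cluster priors p_i, cluster covariances S_i and full-data covariance S_0: dJ = 0.5 * sum_i p_i log|S_i| - 0.5 * log|S_0| - sum_i p_i log p_i, where lower values indicate clusters that are on average closer to normal. The function name `negentropy_increment` and the synthetic data are illustrative assumptions, not taken from the paper, and the paper's own estimator may differ in detail.

```python
import numpy as np


def negentropy_increment(X, labels):
    """Sketch of a negentropy-increment score for a crisp partition.

    Assumed form (derived from the abstract's definitions, not quoted
    from the paper):
        dJ = 0.5 * sum_i p_i * log|S_i| - 0.5 * log|S_0| - sum_i p_i * log p_i
    Lower values suggest clusters that are, on average, closer to normal.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as n samples of one feature
    n = X.shape[0]

    def logdet_cov(A):
        # Sample covariance with rows as observations; atleast_2d covers d = 1.
        cov = np.atleast_2d(np.cov(A, rowvar=False))
        sign, logdet = np.linalg.slogdet(cov)
        if sign <= 0:
            raise ValueError("covariance matrix is singular or not positive definite")
        return logdet

    # Term for the initial state: all points in a single cluster.
    score = -0.5 * logdet_cov(X)
    for k in np.unique(labels):
        members = X[labels == k]
        p = members.shape[0] / n  # empirical prior of cluster k
        score += 0.5 * p * logdet_cov(members) - p * np.log(p)
    return score


# Illustrative data: two well-separated Gaussian blobs (synthetic, assumed).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
blob_b = rng.normal(loc=10.0, scale=1.0, size=(200, 2))
X = np.vstack([blob_a, blob_b])

true_labels = np.repeat([0, 1], 200)
random_labels = rng.integers(0, 2, size=400)

dj_true = negentropy_increment(X, true_labels)
dj_random = negentropy_increment(X, random_labels)
dj_single = negentropy_increment(X, np.zeros(400, dtype=int))
```

Under this form, a partition that places every point in one cluster scores zero by construction, so candidate partitions can be compared by how far below zero they fall. The sketch raises an error for clusters whose covariance is singular (for example, clusters with too few points); how to handle such cases is a design choice left open here.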
Pages: 782-795
Number of pages: 14