Quality indices for (practical) clustering evaluation

Cited by: 13
Authors
Cardoso, Margarida G. M. S. [1]
de Carvalho, Andre Ponce de Leon F. [2]
Affiliations
[1] ISCTE Business Sch, Dept Quantitat Methods, P-1649026 Lisbon, Portugal
[2] Univ Sao Paulo, Inst Math & Comp Sci, Dept Comp Sci, BR-13560970 Sao Carlos, SP, Brazil
Keywords
Cluster validation; validation indices; quality indices; clustering; VALIDATION INDEX; VALIDITY; NUMBER;
DOI
10.3233/IDA-2009-0390
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering quality or validation indices allow the evaluation of the quality of clustering in order to support the selection of a specific partition or clustering structure in its natural unsupervised environment, where the real solution is unknown or not available. In this paper, we investigate the use of quality indices mostly based on the concepts of clusters' compactness and separation, for the evaluation of clustering results (partitions in particular). This work intends to offer a general perspective regarding the appropriate use of quality indices for the purpose of clustering evaluation. After presenting some commonly used indices, as well as indices recently proposed in the literature, key issues regarding the practical use of quality indices are addressed. A general methodological approach is presented which considers the identification of appropriate index thresholds. This general approach is compared with the simple use of quality indices for evaluating a clustering solution.
Pages: 725-740
Page count: 16