Dissolution point and isolation robustness: Robustness criteria for general cluster analysis methods

Cited by: 115
Authors
Hennig, Christian [1]
Affiliation
[1] UCL, Dept Stat Sci, London WC1E 6BT, England
Keywords
breakdown point; model-based cluster analysis; mixture model; trimmed k-means; average silhouette width; hierarchical cluster analysis;
DOI
10.1016/j.jmva.2007.07.002
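Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract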
Two robustness criteria are presented that are applicable to general clustering methods. Robustness and stability in cluster analysis are not only data dependent, but even cluster dependent. Robustness is in the present paper defined as a property of not only the clustering method, but also of every individual cluster in a data set. The main principles are: (a) dissimilarity measurement of an original cluster with the most similar cluster in the induced clustering obtained by adding data points, (b) the dissolution point, which is an adaptation of the breakdown point concept to single clusters, (c) isolation robustness: given a clustering method, is it possible to join, by addition of g points, arbitrarily well separated clusters? Results are derived for k-means, k-medoids (k estimated by average silhouette width), trimmed k-means, mixture models (with and without noise component, with and without estimation of the number of clusters by BIC), single and complete linkage. (C) 2007 Elsevier Inc. All rights reserved.
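Principles (a) and (b) can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the dissolution cut-off, so the code below assumes the Jaccard coefficient and a threshold of 1/2; the function names (`jaccard`, `best_match_similarity`, `is_dissolved`) and the toy clusterings are hypothetical and not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two point-index sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_similarity(cluster, induced_clustering, original_points):
    """Similarity of `cluster` to its most similar cluster in the induced
    clustering, after restricting every induced cluster to the original
    points (added points are ignored in the comparison)."""
    original_points = set(original_points)
    restricted = [set(c) & original_points for c in induced_clustering]
    return max((jaccard(cluster, c) for c in restricted), default=0.0)


def is_dissolved(cluster, induced_clustering, original_points, threshold=0.5):
    """Assumed rule: a cluster counts as dissolved when its best match
    has similarity at most `threshold`."""
    return best_match_similarity(cluster, induced_clustering,
                                 original_points) <= threshold


# Toy example: two original clusters on points 0..5.
original_points = {0, 1, 2, 3, 4, 5}
original = [{0, 1, 2}, {3, 4, 5}]

# Hypothetical clustering obtained after adding points 6 and 7:
# the two original clusters are merged and point 7 forms its own cluster.
induced = [{0, 1, 2, 3, 4, 5, 6}, {7}]

similarities = [best_match_similarity(c, induced, original_points)
                for c in original]
dissolved = [is_dissolved(c, induced, original_points) for c in original]
print(similarities, dissolved)
```

In this toy case each original cluster covers only half of the merged cluster, so both would count as dissolved under the assumed threshold. The same merging scenario is what isolation robustness, principle (c), asks about.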
Pages: 1154-1176
Number of pages: 23