An examination of indexes for determining the number of clusters in binary data sets

Cited by: 191
Authors
Dimitriadou, E
Dolnicar, S
Weingessel, A
Affiliations
[1] Vienna Tech Univ, Inst Stat & Wahrscheinlichkeitstheorie, A-1040 Vienna, Austria
[2] Wirtschaftsuniv Wien, Inst Tourisms & Freizeitwirtschaft, A-1040 Vienna, Austria
Keywords
number of clusters; clustering indexes; binary data; artificial data sets; market segmentation
DOI
10.1007/BF02294713
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
The problem of choosing the correct number of clusters is as old as cluster analysis itself. A number of authors have suggested various indexes to facilitate this crucial decision. One of the most extensive comparative studies of indexes was conducted by Milligan and Cooper (1985). The present piece of work pursues the same goal under different conditions. In contrast to Milligan and Cooper's work, the emphasis here is on high-dimensional empirical binary data. Binary artificial data sets are constructed to reflect features typically encountered in real-world data situations in the field of marketing research. The simulation includes 162 binary data sets that are clustered by two different algorithms and lead to recommendations on the number of clusters for each index under consideration. Index results are evaluated and their performance is compared and analyzed.
Pages: 137-159
Page count: 23
Related papers
41 items in total
[31]   AN EXAMINATION OF THE EFFECT OF 6 TYPES OF ERROR PERTURBATION ON 15 CLUSTERING ALGORITHMS [J].
MILLIGAN, GW .
PSYCHOMETRIKA, 1980, 45 (03) :325-342
[32]   A MONTE-CARLO STUDY OF 30 INTERNAL CRITERION MEASURES FOR CLUSTER-ANALYSIS [J].
MILLIGAN, GW .
PSYCHOMETRIKA, 1981, 46 (02) :187-199
[33]   AN EXAMINATION OF PROCEDURES FOR DETERMINING THE NUMBER OF CLUSTERS IN A DATA SET [J].
MILLIGAN, GW ;
COOPER, MC .
PSYCHOMETRIKA, 1985, 50 (02) :159-179
[34]   AN AGGLOMERATIVE METHOD FOR CLASSIFICATION OF PLANT COMMUNITIES [J].
ORLOCI, L .
JOURNAL OF ECOLOGY, 1967, 55 (01) :193-&
[35]   Ratkowsky D. A., 1978, Australian Computer Journal, V10, P115
[36]   Rost J., 1996, TESTTHEORIE TESTKONS
[37]   ESTIMATING DIMENSION OF A MODEL [J].
SCHWARZ, G .
ANNALS OF STATISTICS, 1978, 6 (02) :461-464
[38]   CLUSTERING METHODS BASED ON LIKELIHOOD RATIO CRITERIA [J].
SCOTT, AJ ;
SYMONS, MJ .
BIOMETRICS, 1971, 27 (02) :387-&
[39]   WEDEL M, 1998, MARKETING SEGMENTATI, P89
[40]   PATTERN CLUSTERING BY MULTIVARIATE MIXTURE ANALYSIS [J].
WOLFE, JH .
MULTIVARIATE BEHAVIORAL RESEARCH, 1970, 5 (03) :329-350