Selection of K in K-means clustering

Cited by: 367
Authors
Pham, DT [1]
Dimov, SS [1]
Nguyen, CD [1]
Affiliation
[1] Cardiff Univ, Mfg Engn Ctr, Cardiff CF24 0YF, Wales
Keywords
clustering; K-means algorithm; cluster number selection
DOI
10.1243/095440605X8298
Chinese Library Classification (CLC) number
TH [Machinery and instrument industry]
Subject classification code
0802
Abstract
The K-means algorithm is a popular data-clustering algorithm. However, one of its drawbacks is the requirement for the number of clusters, K, to be specified before the algorithm is applied. This paper first reviews existing methods for selecting the number of clusters for the algorithm. Factors that affect this selection are then discussed and a new measure to assist the selection is proposed. The paper concludes with an analysis of the results of using the proposed measure to determine the number of clusters for the K-means algorithm for different data sets.
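The abstract notes that K must be fixed before K-means is run. As an illustration only, the sketch below implements standard Lloyd-style K-means in pure Python and records the distortion S_K (the sum of squared distances from each point to its nearest centroid) for a range of candidate K values; K-selection heuristics such as the elbow method look at how S_K falls as K grows. It does not reproduce the measure proposed in the paper. The deterministic farthest-point seeding (simple, but sensitive to outliers), the function names `kmeans` and `distortion_curve`, and the toy data are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Assumed seeding scheme (not from the paper): start from the first point,
    then repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], float]:
    """Lloyd's K-means. Returns the final centroids and the distortion S_K."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members
            else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # centroids stable, so assignments cannot change
            break
        centroids = updated
    # Distortion S_K: sum of squared distances to the nearest centroid.
    distortion = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, distortion


def distortion_curve(points: Sequence[Point], k_max: int) -> Dict[int, float]:
    """Run K-means for K = 1..k_max and record S_K for each K."""
    return {k: kmeans(points, k)[1] for k in range(1, k_max + 1)}


if __name__ == "__main__":
    # Hypothetical toy data: three tight groups of three points each.
    blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
    for k, s in distortion_curve(blobs, 5).items():
        print(k, round(s, 3))
```

On this toy data S_1 = 804 and S_3 = 4, while a fourth cluster lowers S_K only slightly. A sharp drop followed by a plateau is the kind of pattern that K-selection measures try to quantify.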
Pages: 103-119
Number of pages: 17