Selection of K in K-means clustering

Cited by: 367
Authors
Pham, DT [1]
Dimov, SS [1]
Nguyen, CD [1]
Affiliations
[1] Cardiff Univ, Mfg Engn Ctr, Cardiff CF24 OYF, Wales
Keywords
clustering; K-means algorithm; cluster number selection
DOI
10.1243/095440605X8298
Chinese Library Classification number
TH [Machinery and instrument industry]
Discipline classification code
0802
Abstract
The K-means algorithm is a popular data-clustering algorithm. However, one of its drawbacks is the requirement for the number of clusters, K, to be specified before the algorithm is applied. This paper first reviews existing methods for selecting the number of clusters for the algorithm. Factors that affect this selection are then discussed and a new measure to assist the selection is proposed. The paper concludes with an analysis of the results of using the proposed measure to determine the number of clusters for the K-means algorithm for different data sets.
Pages: 103-119
Number of pages: 17