Agglomerative fuzzy K-Means clustering algorithm with selection of number of clusters

Cited by: 190
Authors
Li, Mark Junjie [1]
Ng, Michael K. [1, 2]
Cheung, Yiu-ming [3]
Huang, Joshua Zhexue [4]
Affiliations
[1] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
[2] Hong Kong Baptist Univ, Ctr Math Imaging & Vis, Kowloon Tong, Hong Kong, Peoples R China
[3] Univ Hong Kong, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China
[4] Univ Hong Kong, E Business Technol Inst, Hong Kong, Hong Kong, Peoples R China
Keywords
fuzzy K-Means clustering; agglomerative; number of clusters; cluster validation
DOI
10.1109/TKDE.2008.88
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present an agglomerative fuzzy K-Means clustering algorithm for numerical data, an extension of the standard fuzzy K-Means algorithm that introduces a penalty term into the objective function to make the clustering process insensitive to the initial cluster centers. The new algorithm can produce more consistent clustering results from different sets of initial cluster centers. Combined with cluster validation techniques, the new algorithm can determine the number of clusters in a data set, which is a well-known problem in K-Means clustering. Experimental results on synthetic data sets (2 to 5 dimensions, 500 to 5,000 objects, and 3 to 7 clusters), the BIRCH two-dimensional data set of 20,000 objects and 100 clusters, and the WINE data set of 178 objects, 17 dimensions, and 3 clusters from UCI have demonstrated the effectiveness of the new algorithm in producing consistent clustering results and determining the correct number of clusters in different data sets, some with overlapping inherent clusters.
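The abstract does not give the penalty term, so the following is only a rough sketch of the general idea, assuming an entropy-style penalty `lam * sum(u * log(u))` added to the usual sum of membership-weighted squared distances. Under that assumption the memberships take a softmax form. The function names `entropy_fuzzy_kmeans` and `merge_close_centers`, the parameter `lam`, and the merge tolerance `tol` are all hypothetical and are not taken from the paper.

```python
import math


def entropy_fuzzy_kmeans(points, centers, lam=1.0, iters=50):
    """Alternate membership and center updates for an assumed objective:

        sum_ij u_ij * ||x_i - z_j||^2  +  lam * sum_ij u_ij * log(u_ij)

    with each row of u summing to 1. This form is an assumption,
    not something stated in the abstract.
    """
    centers = [list(c) for c in centers]
    memberships = []
    for _ in range(iters):
        # Membership update: softmax of negative squared distances over lam.
        memberships = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            shift = min(d2)  # subtract the minimum for numerical stability
            w = [math.exp(-(d - shift) / lam) for d in d2]
            s = sum(w)
            memberships.append([v / s for v in w])
        # Center update: membership-weighted mean of the points.
        dim = len(points[0])
        for j in range(len(centers)):
            total = sum(u[j] for u in memberships)
            if total > 0:
                centers[j] = [
                    sum(u[j] * x[k] for u, x in zip(memberships, points)) / total
                    for k in range(dim)
                ]
    return centers, memberships


def merge_close_centers(centers, tol):
    """Keep one representative for each group of centers within tol of each other."""
    kept = []
    for c in centers:
        if all(math.dist(c, k) > tol for k in kept):
            kept.append(c)
    return kept
```

In this sketch, a larger `lam` makes the memberships more uniform, so centers tend to drift toward one another; counting the distinct centers left after `merge_close_centers` is one crude way to read off a candidate number of clusters. The paper pairs its algorithm with cluster validation techniques, which this sketch does not attempt to reproduce.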
Pages: 1519-1534
Number of pages: 16