An Improved K-means Clustering Algorithm Based on Normal Matrix

被引:0
作者
Tian Shengwen [1 ]
Zhao Yongsheng [1 ]
Wang Yilei [1 ]
机构
[1] Ludong Univ, Sch Comp Sci & Technol, Yantai 264025, Peoples R China
来源
PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON TEST AUTOMATION AND INSTRUMENTATION, VOL 4 | 2008年
关键词
K-means; Normal matrix; Similarity; Eigenvector;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
It is well known that K-means algorithm is very sensitive to outliers, and often terminates at a local optimum. Therefore, the quality of the result is not satisfactory. Furthermore, it is usually designed as memory-resident algorithms, which limit the scalability. In this paper, we an improved K-means clustering algorithm - NK-means is developed. NK-means is based on spectral methods, namely uses Normal matrix that is used in spectral analysis approaches to normalize original datasets, and then finds clusters in the processed datasets by K-means algorithm. During the normalization, weighted adjacency matrix is constructed by the similarity between objects belonged in original datasets, and to generate Normal matrix. NK-means algorithm mainly uses the characteristic structure to process original datasets. In the processing, the data is transformed from a matrix into a vector, and the structures of the clusters are changed from three-dimensional sphere into two-dimensional circle. The experiment shows that NK-means algorithm significantly outperforms K-means in the efficiency and accuracy.
引用
收藏
页码:2182 / 2185
页数:4
相关论文
共 5 条
[1]   Detecting communities in large networks [J].
Capocci, A ;
Servedio, VDP ;
Caldarelli, G ;
Colaiori, F .
PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2005, 352 (2-4) :669-676
[2]  
Han J., 2001, Data Mining Concepts and Techniques
[3]  
MacQueen J., 1967, Proc fifth Berkeley Symp Math Stat Probab, V1, P281
[4]  
Qian Wei-ning, 2002, Journal of Software, V13, P1382
[5]  
WANG X F, 2006, THEORY APPL COMPLEX