Improved spectral clustering algorithm based on similarity measure

Cited by: 10
Authors
Yan, Jun [1]
Cheng, Debo [2]
Zong, Ming [2]
Deng, Zhenyun [2]
Affiliations
[1] Geographic Center of Guangxi, Nanning, Guangxi
[2] Guangxi Normal University, Guilin, Guangxi
Source
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 2014, Vol. 8933
Keywords
Density sensitive; Gaussian kernel; K-means; Shortest path; Similarity measure; Spectral clustering
DOI
10.1007/978-3-319-14717-8_50
Abstract
To address the traditional spectral clustering algorithm's sensitivity to the Gaussian kernel parameter σ, and inspired by density-sensitive similarity measures, this paper proposes building the similarity matrix from a similarity measure based on data density. The measure enlarges the distance between pairs of points that lie in different high-density regions and shortens it (i.e., raises the similarity) between pairs of points within the same high-density region, so that the spatial distribution characteristics of complex data can be captured. On this basis we design two similarity measures, neither of which introduces the Gaussian kernel parameter σ; the main difference between them is that the first incorporates a shortest-path computation while the second does not. The second measure shows the better overall similarity performance, and experiments confirm that it improves the stability of the whole algorithm. Furthermore, the final stage of spectral clustering applies k-means (or another traditional clustering algorithm) to the selected eigenvectors, but k-means is sensitive to the initial cluster centers. We therefore also design a simple and effective method for optimizing the initial cluster centers, yielding an improved k-means, and apply it within the proposed spectral clustering algorithm. Experimental results on UCI datasets show that the improved k-means makes the clustering more stable. © Springer International Publishing Switzerland 2014.
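The abstract does not give the two measures' formulas, so the sketch below only illustrates the shared idea under stated assumptions: distances are stretched by a density-sensitive edge length (here the common form ρ^d − 1 with flexing factor ρ > 1, following the density-sensitive clustering literature), optionally routed through a graph shortest path as in the paper's first variant, and no Gaussian kernel parameter σ appears anywhere. The function names, the choice of ρ, and the normalized-cut pipeline are illustrative, not the paper's exact construction.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def density_sensitive_similarity(X, rho=2.0, use_shortest_path=True):
    """Similarity matrix with no Gaussian kernel parameter sigma.

    Edge lengths l(i, j) = rho**d(i, j) - 1 grow quickly across sparse
    gaps, so shortest paths prefer to thread through dense regions:
    points in the same dense region end up close, points in different
    dense regions end up far apart.
    """
    d = cdist(X, X)
    d /= d.max() + 1e-12                    # bound the exponent to avoid overflow
    stretched = rho ** d - 1.0              # density-sensitive edge lengths
    if use_shortest_path:
        # variant 1: graph shortest path makes the stretched lengths
        # follow the data manifold; variant 2 skips this step
        stretched = shortest_path(stretched, method="FW", directed=False)
    return 1.0 / (1.0 + stretched)          # similarity in (0, 1]

def density_sensitive_spectral_clustering(X, k, rho=2.0, use_shortest_path=True):
    W = density_sensitive_similarity(X, rho, use_shortest_path)
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = eigh(L)                       # eigenvalues in ascending order
    U = vecs[:, :k]                         # k smallest eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12  # row-normalize
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```

Because the edge lengths are parameter-light (only the flexing factor ρ), this construction sidesteps the σ-tuning problem the abstract describes; dropping the shortest-path step corresponds to the paper's second, simpler variant.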
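The abstract likewise does not specify the seeding rule for the improved k-means, so the following is a hypothetical stand-in built from a common density-and-distance heuristic: take the densest point as the first center, then repeatedly pick the point that is both dense and far from all centers chosen so far. The density estimate, its bandwidth, and the scoring rule are all assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def density_seeded_centers(X, k):
    """Deterministic k-means seeding (illustrative; not the paper's exact rule)."""
    d = cdist(X, X)
    scale = np.median(d) + 1e-12            # assumed bandwidth for the density estimate
    density = np.exp(-(d / scale) ** 2).sum(axis=1)
    centers = [int(np.argmax(density))]     # densest point seeds the first center
    for _ in range(1, k):
        min_dist = d[:, centers].min(axis=1)            # distance to nearest chosen center
        centers.append(int(np.argmax(density * min_dist)))  # dense AND far away
    return X[centers]
```

Plugged into scikit-learn as `KMeans(n_clusters=k, init=density_seeded_centers(U, k), n_init=1)` on the spectral embedding `U`, such a deterministic seeding removes the run-to-run variance that random initialization causes, which matches the stability improvement the abstract reports.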
Pages: 641-654 (13 pages)