Improved K-means clustering algorithm for selecting initial clustering centers based on dissimilarity measure

Cited by: 0
Authors
Liao J.-Y. [1]
Wu S. [1]
Liu A.-L. [1]
Affiliation
[1] School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming
Source
Kongzhi yu Juece/Control and Decision | 2021 / Vol. 36 / No. 12
Keywords
Clustering analysis; Dissimilarity measure; Initial clustering center; Off-group points; Robustness; K-means algorithm
DOI
10.13195/j.kzyjc.2020.0554
Abstract
Selecting reasonable initial clustering centers is a prerequisite for correct clustering. An improved K-means clustering algorithm that selects initial clustering centers based on a dissimilarity measure is proposed. A dissimilarity matrix is constructed from the pairwise dissimilarities of the data objects, and two measures, mean dissimilarity and total dissimilarity, are defined. The initial clustering centers are then determined according to the corresponding criteria, and in subsequent iterations each clustering center is updated with the median, rather than the mean, of the data points in its cluster, so as to eliminate the effect of outliers on clustering accuracy. In addition, the proposed algorithm produces the same result on every run and is more robust both to initialization and to outliers. Finally, experiments are performed on synthetic datasets and UCI datasets. Compared with three classical clustering algorithms and two improved K-means algorithms, the proposed algorithm achieves better clustering performance. © 2021, Editorial Office of Control and Decision. All rights reserved.
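The pipeline outlined in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' method: the abstract does not give the dissimilarity measure, the exact definitions of mean and total dissimilarity, or the selection criteria. The sketch therefore assumes Euclidean distance, takes row sums and row means of the dissimilarity matrix as the two measures, and swaps in a deterministic farthest-first rule for the unspecified criteria. All function names are hypothetical.

```python
import numpy as np

def dissimilarity_matrix(X):
    """Pairwise dissimilarities; Euclidean distance is an assumption."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def dissimilarity_stats(D):
    """Assumed definitions: row sum (total) and row mean over the other points."""
    total = D.sum(axis=1)
    mean = total / (D.shape[0] - 1)
    return mean, total

def select_initial_centers(D, k):
    """Stand-in criterion (NOT the paper's): start from the point with the
    smallest total dissimilarity, then add farthest-first points."""
    _, total = dissimilarity_stats(D)
    chosen = [int(np.argmin(total))]
    while len(chosen) < k:
        nearest = D[:, chosen].min(axis=1)  # distance to closest chosen center
        chosen.append(int(np.argmax(nearest)))
    return chosen

def median_kmeans(X, k, max_iter=100):
    """K-means-style loop that updates each center with the coordinate-wise
    median of its cluster instead of the mean."""
    X = np.asarray(X, dtype=float)
    centers = X[select_initial_centers(dissimilarity_matrix(X), k)].copy()
    labels = None
    for _ in range(max_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = np.median(members, axis=0)
    return labels, centers
```

Because no step involves random sampling, repeated runs on the same data return the same labels, which mirrors the consistency property claimed in the abstract. Unlike the paper's criteria, a plain farthest-first rule can pick an outlier as a center, so this stand-in does not reproduce the claimed robustness of the initialization.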
Pages: 3083-3090
Page count: 7