Fast Correntropy-Based Clustering Algorithm

Cited by: 0
Authors
Li Z. [1 ]
Yang B. [1 ]
Zhang J. [1 ]
Liu Y. [1 ]
Zhang X. [1 ]
Wang F. [1 ]
Affiliations
[1] National Engineering Laboratory for Visual Information Processing and Applications, Xi'an Jiaotong University, Xi'an
Source
Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University | 2021, Vol. 55, No. 6
Keywords
Anchor graph; Correntropy; Fast clustering
DOI
10.7652/xjtuxb202106015
Abstract
To address the low efficiency and poor robustness of clustering on large-scale real-world data, a fast correntropy-based clustering algorithm (FCC) is proposed. FCC consists of two main steps: 1) running k-means on the original data to obtain rough labels, which serve as the label matrix for the second step, with the resulting cluster centers taken as anchors; 2) constructing an anchor graph from the original data and the anchors, and using the Laplacian matrix of the anchor graph as a graph constraint to explore the internal structure of the original data and obtain the final category of each sample. The whole clustering process is carried out under the correntropy framework rather than the traditional Euclidean-distance framework, which effectively suppresses the influence of the large amount of non-linear, non-Gaussian noise in real-world data on clustering robustness. To evaluate FCC, five state-of-the-art algorithms are adopted as baselines and run alongside FCC on four large-scale real-world data sets. The results show that FCC improves clustering accuracy in most cases (by 8.58%, 6.86% and 1.86% on WebKB, TDT2 and Cora, respectively) while greatly improving clustering efficiency (by several times to dozens of times). Furthermore, to verify the robustness of FCC, varying degrees of random noise and Poisson noise are added to WebKB and Cora to obtain 8 noisy data sets, and all algorithms are run on these noisy data sets under the same conditions. Compared with the baseline algorithms, FCC maintains the best clustering robustness. © 2021, Editorial Office of Journal of Xi'an Jiaotong University. All rights reserved.
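The two-step procedure described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: correntropy is modeled here as Gaussian-kernel (Welsch) weights in the k-means center updates, the k-means centers double as anchors, and the final label is read off as each sample's strongest anchor in the row-normalized anchor graph (the paper instead optimizes with the anchor-graph Laplacian as an explicit constraint). All parameter names and the tiny demo data are assumptions.

```python
import numpy as np

def init_centers(X, k, rng):
    """Farthest-point initialization so the k centers spread across the data."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[d2.argmax()])
    return np.array(centers, dtype=float)

def robust_kmeans(X, k, sigma=1.0, iters=20, seed=0):
    """Step 1 (sketch): k-means whose center updates weight each sample by the
    Gaussian (correntropy) kernel of its residual, so outliers barely move centers."""
    rng = np.random.default_rng(seed)
    centers = init_centers(X, k, rng)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)   # (n, k) squared distances
        labels = d2.argmin(1)                                 # rough labels
        w = np.exp(-d2[np.arange(len(X)), labels] / (2 * sigma ** 2))
        for j in range(k):
            m = labels == j
            if m.any():
                centers[j] = np.average(X[m], axis=0, weights=w[m])
    return centers, labels

def anchor_graph(X, anchors, sigma=1.0, s=2):
    """Step 2 (sketch): sparse anchor graph -- Gaussian similarity of each sample
    to its s nearest anchors, row-normalized."""
    d2 = ((X[:, None, :] - anchors[None]) ** 2).sum(-1)
    Z = np.zeros_like(d2)
    rows = np.arange(len(X))[:, None]
    nn = np.argsort(d2, axis=1)[:, :s]
    Z[rows, nn] = np.exp(-d2[rows, nn] / (2 * sigma ** 2))
    return Z / Z.sum(1, keepdims=True)

# Tiny demo (assumed data): two well-separated blobs plus one outlier.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
               rng.normal(10.0, 0.5, (20, 2)),
               [[5.0, 5.0]]])
centers, rough = robust_kmeans(X, k=2, sigma=1.0)
Z = anchor_graph(X, centers)   # the k-means centers double as anchors here
final = Z.argmax(1)            # each sample follows its strongest anchor
```

In this sketch the Welsch weight `exp(-||x - c||^2 / (2*sigma^2))` plays the role that correntropy plays in the paper: residuals from heavy non-Gaussian noise receive near-zero weight, so a single distant outlier cannot drag a cluster center the way it does under squared Euclidean loss.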
Pages: 121-130
Page count: 9