Robust large-scale clustering based on correntropy

Cited by: 1
Authors
Jin, Guodong [1]
Gao, Jing [1]
Tan, Lining [1]
Affiliations
[1] Rocket Force Engn Univ, Xian, Shaanxi, Peoples R China
Keywords
DOI
10.1371/journal.pone.0277012
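Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract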
With the explosive growth of data, efficiently clustering large-scale unlabeled data has become an urgent problem. Large-scale real-world data in particular contain many noise points and outliers with complex distributions, which has made robust clustering of such data an active research topic. To address this issue, this paper proposes a robust large-scale clustering algorithm based on correntropy (RLSCC). Specifically, k-means is first applied to generate pseudo-labels, which reduces the scale of the input to the subsequent spectral clustering. Anchor graphs, rather than full sample graphs, are then introduced into spectral clustering to obtain the final clustering results from the pseudo-labels, which further improves efficiency. RLSCC therefore inherits the effectiveness of k-means and spectral clustering while greatly reducing computational complexity. Furthermore, correntropy is used to suppress the influence of noise and outliers in real-world data, which improves the robustness of clustering. Finally, extensive experiments on real-world datasets and noisy datasets show that, compared with other state-of-the-art algorithms, RLSCC greatly improves efficiency and robustness while maintaining comparable or even higher clustering effectiveness.
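The abstract does not give the RLSCC objective, so the following is only a minimal sketch of why a correntropy-based criterion tends to be robust to outliers. It assumes the standard empirical correntropy with a Gaussian kernel, V(x, y) = (1/n) Σ exp(-(x_i - y_i)² / (2σ²)), and a simple fixed-point reweighting for a one-dimensional cluster center. The function names, the bandwidth `sigma`, and the toy data are all illustrative assumptions, not taken from the paper.

```python
import math


def gaussian_kernel(e, sigma):
    """Gaussian kernel value for a scalar error e (unnormalized)."""
    return math.exp(-(e * e) / (2.0 * sigma * sigma))


def correntropy(xs, ys, sigma=1.0):
    """Empirical correntropy between two equal-length sequences."""
    return sum(gaussian_kernel(x - y, sigma) for x, y in zip(xs, ys)) / len(xs)


def robust_center(points, sigma=1.0, iters=50):
    """Illustrative 1-D center estimate that maximizes correntropy.

    Each iteration reweights every point by its kernel similarity to the
    current center, so distant outliers receive weights close to zero.
    This is a generic fixed-point scheme, not the RLSCC update rule.
    """
    center = sorted(points)[len(points) // 2]  # start from a median-like point
    for _ in range(iters):
        weights = [gaussian_kernel(p - center, sigma) for p in points]
        center = sum(w * p for w, p in zip(weights, points)) / sum(weights)
    return center


# Toy data (assumed): four inliers near 1.0 and one gross outlier.
data = [0.9, 1.0, 1.1, 1.0, 50.0]
mean = sum(data) / len(data)   # ordinary mean, pulled toward the outlier
center = robust_center(data)   # correntropy-weighted center, stays near 1.0
```

In this toy case the ordinary mean is dragged far from the inliers by the single outlier, while the kernel weight assigned to the outlier is effectively zero, so the reweighted center stays with the bulk of the data. The bandwidth `sigma` controls how quickly the weights decay and would need tuning on real data.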
Pages: 17