Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding

Cited by: 6
Authors
Fang, Xian [1]
Tie, Zhixin [1]
Guan, Yinan [1]
Rao, Shanshan [1]
Affiliations
[1] Zhejiang Sci Tech Univ, Sch Informat Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data clustering; Quasi-cluster centers clustering; Potential entropy; Optimal parameter; t-distributed stochastic neighbor embedding; DENSITY PEAKS; FAST SEARCH; FIND; REDUCTION; ROCK;
DOI
10.1007/s00500-018-3221-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A novel density-based clustering algorithm named QCC was presented recently. Although the algorithm has proved strongly robust, its two input parameters, the number of neighbors (k) and the similarity threshold value, must still be set manually, which severely limits its adoption. In addition, QCC does not perform well on datasets with relatively high dimensionality. To overcome these defects, we first define a new method for computing local density and introduce the strategy of potential entropy into the original algorithm. Based on this idea, we propose a new QCC clustering algorithm (QCC-PE). QCC-PE automatically extracts the optimal value of the parameter k by optimizing the potential entropy of the data field. In this way, the parameter is calculated objectively from the dataset rather than estimated empirically from a large number of experiments. We then apply t-distributed stochastic neighbor embedding (tSNE) to the QCC-PE model and put forward a tSNE-based method (QCC-PE-tSNE), which preprocesses high-dimensional datasets with dimensionality reduction. We compare the performance of the proposed algorithms with QCC, DBSCAN, and DP on synthetic datasets, the Olivetti Face Database, and real-world datasets. Experimental results show that our algorithms are feasible and effective and often outperform the compared methods.
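The idea of choosing k by optimizing an entropy over the data can be illustrated with a minimal sketch. This is not the QCC-PE algorithm: the abstract does not give the paper's local density or potential entropy formulas, so everything below is an assumption made for illustration. The functions `knn_density`, `potential_entropy`, and `select_k` are hypothetical names, the density is taken as the inverse mean distance to the k nearest neighbors, the entropy is the Shannon entropy of the normalized densities, and the "optimal" k is assumed to be the candidate with the smallest entropy.

```python
import numpy as np


def knn_density(X, k):
    """Hypothetical local density: inverse mean distance to the k nearest neighbors.

    This is an illustrative stand-in, not the density defined in the paper.
    """
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sort each row; column 0 is the zero distance of a point to itself.
    nearest = np.sort(d, axis=1)[:, 1:k + 1]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def potential_entropy(density):
    """Shannon entropy of the densities after normalizing them to sum to 1."""
    p = density / density.sum()
    return float(-(p * np.log(p)).sum())


def select_k(X, k_candidates):
    """Return the candidate k with the smallest entropy (assumed criterion)."""
    entropies = {k: potential_entropy(knn_density(X, k)) for k in k_candidates}
    best_k = min(entropies, key=entropies.get)
    return best_k, entropies


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic Gaussian blobs, used only to exercise the functions.
    X = np.vstack([rng.normal(0.0, 0.3, size=(40, 2)),
                   rng.normal(3.0, 0.3, size=(40, 2))])
    best_k, entropies = select_k(X, range(2, 11))
    print("selected k:", best_k)
```

For high-dimensional data, a sketch of the QCC-PE-tSNE idea would first reduce the data with a tSNE implementation and then pass the low-dimensional embedding to `select_k` in place of `X`.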
Pages: 5645-5657
Page count: 13