Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding

Cited by: 6
Authors
Fang, Xian [1]
Tie, Zhixin [1]
Guan, Yinan [1]
Rao, Shanshan [1]
Affiliations
[1] Zhejiang Sci Tech Univ, Sch Informat Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data clustering; Quasi-cluster centers clustering; Potential entropy; Optimal parameter; t-distributed stochastic neighbor embedding; DENSITY PEAKS; FAST SEARCH; FIND; REDUCTION; ROCK;
DOI
10.1007/s00500-018-3221-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A novel density-based clustering algorithm named QCC (quasi-cluster centers clustering) was presented recently. Although the algorithm has proved strongly robust, its two input parameters, the number of neighbors (k) and the similarity threshold value, must still be set manually, which severely limits its adoption. In addition, QCC does not perform well on datasets with relatively high dimensionality. To overcome these defects, we first define a new method for computing local density and introduce potential entropy into the original algorithm, which yields a new QCC clustering algorithm (QCC-PE). QCC-PE automatically extracts the optimal value of the parameter k by optimizing the potential entropy of the data field. In this way, the parameter is calculated objectively from the dataset rather than estimated empirically from a large number of experiments. We then apply t-distributed stochastic neighbor embedding (tSNE) to the QCC-PE model and propose a tSNE-based method (QCC-PE-tSNE), which preprocesses high-dimensional datasets by dimensionality reduction. We compare the proposed algorithms with QCC, DBSCAN, and DP on synthetic datasets, the Olivetti Face Database, and real-world datasets. Experimental results show that our algorithms are feasible and effective and often outperform the compared methods.
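The abstract does not give the formulas behind QCC-PE, so the following Python sketch only illustrates the general idea of choosing k by optimizing an entropy computed from per-point densities. Everything in it is an assumption made for illustration: the density definition (inverse mean distance to the k nearest neighbors), the use of Shannon entropy over normalized densities as a stand-in for potential entropy, the choice to minimize it, and the function names knn_density, potential_entropy, and select_k. It is not the authors' method.

```python
import math


def knn_density(points, k):
    """Assumed local density: inverse of the mean distance to the k nearest neighbours."""
    densities = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_d = sum(dists[:k]) / k
        densities.append(1.0 / (mean_d + 1e-12))  # small constant avoids division by zero
    return densities


def potential_entropy(potentials):
    """Shannon entropy of the potentials after normalising them to sum to 1."""
    total = sum(potentials)
    probs = [v / total for v in potentials]
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_k(points, k_candidates):
    """Return the candidate k whose density field has the lowest entropy (assumed criterion)."""
    return min(k_candidates, key=lambda k: potential_entropy(knn_density(points, k)))


if __name__ == "__main__":
    # Two small groups plus one isolated point; coordinates are made up for the demo.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (5, 5), (5, 6), (6, 5), (6, 6),
            (10, 0)]
    print("selected k:", select_k(data, range(1, 5)))
```

The point of the sketch is that k is derived from the data by scoring each candidate with one scalar, instead of being tuned by hand. For the high-dimensional case the abstract describes, the same selection would run on the low-dimensional embedding produced by tSNE rather than on the raw features.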
Pages: 5645-5657
Number of pages: 13