Enhanced synchronization-inspired clustering for high-dimensional data

Cited by: 16
Authors
Chen, Lei [1]
Guo, Qinghua [1]
Liu, Zhaohua [1]
Zhang, Shiwen [1]
Zhang, Hongqiang [1]
Affiliation
[1] Hunan University of Science and Technology, School of Information and Electrical Engineering, Xiangtan, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Synchronization-inspired; Clustering; High-dimensional dataset; Local density; Metrics; PCA
DOI
10.1007/s40747-020-00191-y
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The synchronization-inspired clustering algorithm (Sync) is a novel clustering algorithm that can accurately cluster datasets of arbitrary shape, density, and distribution. However, high-dimensional datasets, which combine high dimensionality, heavy noise, and substantial redundancy, pose new challenges for Sync: clustering time increases significantly and clustering accuracy decreases. To address these challenges, this paper develops an enhanced synchronization-inspired clustering algorithm, SyncHigh, that clusters high-dimensional datasets quickly and accurately. First, a PCA-based (Principal Component Analysis) dimension purification strategy is designed to find the principal components among all attributes. Second, a density-based data merge strategy is constructed to reduce the number of objects participating in synchronization-inspired clustering, thereby reducing clustering time. Third, the Kuramoto model is enhanced to handle the mass differences between objects introduced by the density-based data merge strategy. Finally, extensive experiments on synthetic and real-world datasets demonstrate the effectiveness and efficiency of SyncHigh.
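The abstract does not spell out how the density-based data merge works. The Python sketch below shows one plausible reading, offered purely as an assumption. Each object's local density is its neighbor count within a hypothetical `radius`. The densest unassigned objects absorb their unassigned neighbors, and each group is replaced by its centroid, with a mass equal to the group size. The function name `density_merge` and its parameters are illustrative and are not taken from the paper.

```python
import math


def density_merge(points, radius):
    """HYPOTHETICAL density-based merge (illustration only, not SyncHigh's exact rule).

    Returns (representatives, masses, owner), where owner[i] is the index of
    the representative that absorbed original object i.
    """
    n = len(points)
    # Neighborhoods: every object within `radius` of object i, including i itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    # Local density = neighbor count; the densest objects seed representatives first.
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    owner = [None] * n
    reps, masses = [], []
    for i in order:
        if owner[i] is not None:
            continue
        # Absorb the seed plus any of its neighbors not yet claimed by another seed.
        group = [j for j in neighbors[i] if owner[j] is None]
        for j in group:
            owner[j] = len(reps)
        dim = len(points[i])
        # Representative = centroid of the absorbed objects.
        reps.append([sum(points[j][d] for j in group) / len(group) for d in range(dim)])
        masses.append(len(group))  # mass = number of original objects merged
    return reps, masses, owner
```

Objects in sparse regions stay as mass-1 representatives, while dense regions collapse into heavy ones. The reduced set therefore mixes light and heavy objects, which illustrates the kind of mass difference the abstract says the enhanced Kuramoto model must account for.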
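For reference, the original Sync algorithm (Böhm et al., KDD 2010) moves every object toward its ε-neighbors with a Kuramoto-style update, x_d ← x_d + (1/|N_ε(x)|) Σ_{y ∈ N_ε(x)} sin(y_d - x_d). It repeats this step until the local order parameter r_c approaches 1, then reads clusters off the groups of objects that have collapsed together. The plain-Python sketch below implements that loop. The optional `masses` argument weights each neighbor's pull by its mass. It is a hypothetical illustration of handling merged objects, not the enhanced Kuramoto model from this paper. The function names, the stopping tolerance `tol`, and `merge_tol` are assumptions.

```python
import math


def sync_step(points, eps, masses=None):
    """One Kuramoto-style interaction round over eps-neighborhoods.

    masses=None gives every object weight 1, as in the original Sync update.
    Supplying masses is a HYPOTHETICAL mass-weighted variant for illustration.
    """
    n = len(points)
    if masses is None:
        masses = [1.0] * n
    updated = []
    for x in points:
        # eps-neighborhood of x, including x itself (sin(0) = 0 adds no pull).
        nbrs = [j for j in range(n) if math.dist(x, points[j]) <= eps]
        total = sum(masses[j] for j in nbrs)
        # Synchronous update: every object reads the old positions in `points`.
        updated.append([
            x[d] + sum(masses[j] * math.sin(points[j][d] - x[d]) for j in nbrs) / total
            for d in range(len(x))
        ])
    return updated


def local_order(points, eps):
    """Local order parameter r_c; it approaches 1 as neighborhoods synchronize."""
    r = 0.0
    for x in points:
        nbrs = [y for y in points if math.dist(x, y) <= eps]
        r += sum(math.exp(-math.dist(x, y)) for y in nbrs) / len(nbrs)
    return r / len(points)


def sync_cluster(points, eps, masses=None, tol=1e-6, max_iter=100, merge_tol=1e-3):
    """Iterate until r_c >= 1 - tol, then label objects that collapsed together."""
    pts = [list(p) for p in points]
    for _ in range(max_iter):
        pts = sync_step(pts, eps, masses)
        if local_order(pts, eps) >= 1 - tol:
            break
    labels, centers = [], []
    for p in pts:
        for k, c in enumerate(centers):
            if math.dist(p, c) <= merge_tol:
                labels.append(k)
                break
        else:  # no existing center nearby: start a new cluster
            centers.append(p)
            labels.append(len(centers) - 1)
    return labels
```

Consider two well-separated groups of three points each, clustered with `eps=0.1`. Each group contracts to a single location within a couple of iterations and receives its own label. With `masses=[3, 1]`, the lighter of two interacting objects moves three times as far as the heavier one.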
Pages: 203-223
Number of pages: 21