Enhanced synchronization-inspired clustering for high-dimensional data

Cited by: 16

Authors:

Chen, Lei [1]

Guo, Qinghua [1]

Liu, Zhaohua [1]

Zhang, Shiwen [1]

Zhang, Hongqiang [1]

Affiliation:

[1] Hunan Univ Sci & Technol, Sch Informat & Elect Engn, Xiangtan, Peoples R China

Source:

COMPLEX & INTELLIGENT SYSTEMS | 2021, Vol. 7, Issue 1

Funding:

National Natural Science Foundation of China

Keywords:

Synchronization-inspired; Clustering; High-dimensional dataset; Local density; METRICS; PCA;

DOI:

10.1007/s40747-020-00191-y

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The synchronization-inspired clustering algorithm (Sync) is a novel and outstanding clustering algorithm, which can accurately cluster datasets with any shape, density and distribution. However, the high-dimensional dataset with high dimensionality, high noise, and high redundancy brings some new challenges for the synchronization-inspired clustering algorithm, resulting in a significant increase in clustering time and a decrease in clustering accuracy. To address these challenges, an enhanced synchronization-inspired clustering algorithm, namely SyncHigh, is developed in this paper to quickly and accurately cluster the high-dimensional datasets. First, a PCA-based (Principal Component Analysis) dimension purification strategy is designed to find the principal components in all attributes. Second, a density-based data merge strategy is constructed to reduce the number of objects participating in the synchronization-inspired clustering algorithm, thereby speeding up clustering time. Third, the Kuramoto Model is enhanced to deal with mass differences between objects caused by the density-based data merge strategy. Finally, extensive experimental results on synthetic and real-world datasets show the effectiveness and efficiency of our SyncHigh algorithm.
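The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pca_reduce`, `sync_step`, `synchronize`) and the `eps` neighborhood radius are hypothetical, the PCA step is a plain SVD projection, the density-based merge is not implemented (its output is represented only by a `masses` array), and the mass-weighted update is an assumed form of a Kuramoto-style interaction rather than the equation given in the paper.

```python
import numpy as np


def pca_reduce(data, n_components):
    """Project the rows of `data` onto the top principal components (SVD-based PCA)."""
    centered = data - data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def sync_step(points, masses, eps):
    """One synchronous, mass-weighted Kuramoto-style update (assumed form)."""
    # diffs[i, j] = points[j] - points[i], per dimension.
    diffs = points[None, :, :] - points[:, None, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Neighbors within eps, weighted by mass. Each point is its own neighbor:
    # it adds sin(0) = 0 to the pull but still counts in the normalizer.
    weights = (dists <= eps) * masses[None, :]
    pull = (weights[:, :, None] * np.sin(diffs)).sum(axis=1)
    return points + pull / weights.sum(axis=1, keepdims=True)


def synchronize(points, masses, eps, n_steps=30):
    """Repeat the update; points in the same neighborhood drift to a common location."""
    points = np.asarray(points, dtype=float)
    masses = np.asarray(masses, dtype=float)
    for _ in range(n_steps):
        points = sync_step(points, masses, eps)
    return points


if __name__ == "__main__":
    # Two well-separated groups; the first point stands in for a merged object of mass 2.
    pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
    masses = np.array([2.0, 1.0, 1.0, 1.0, 1.0, 1.0])
    print(synchronize(pts, masses, eps=0.5))
```

In this sketch a heavier object pulls its neighbors more strongly, so each group settles near its mass-weighted mean; objects that end up at the same location would then be reported as one cluster.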


Pages: 203-223

Page count: 21

