ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data

Times cited: 6

Authors:

Fatehi, Kavan [1]

Rezvani, Mohsen [2]

Fateh, Mansoor [2]

Affiliations:

[1] Yazd Univ, Yazd, Iran

[2] Shahrood Univ Technol, Shahrood, Iran

Source:

PATTERN ANALYSIS AND APPLICATIONS | 2020 / Vol. 23 / No. 4

Keywords:

High-dimensional data; Subspace clustering; Cluster similarity; Density

DOI:

10.1007/s10044-020-00884-7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The curse of dimensionality is one of the major challenges in clustering high-dimensional data. A considerable body of recent literature addresses this challenge through subspace clustering, whose main objective is to discover clusters embedded in any possible combination of the attributes. Most previous methods generate redundant subspace clusters, which lowers clustering accuracy and increases running time. In this paper, a bottom-up, density-based approach is proposed for clustering high-dimensional data. We use the cluster structure as a similarity measure to generate optimal subspaces, which raises the accuracy of subspace clustering. Building on this idea, we propose an iterative algorithm that discovers similar subspaces from the similarity of their features. In each iteration, the algorithm first determines similar subspaces, then combines them into higher-dimensional subspaces, and finally re-clusters the resulting subspaces. It repeats these steps until it converges to the final clusters. Experiments on various synthetic and real datasets show that the proposed approach achieves significantly better quality and runtime than state-of-the-art methods for clustering high-dimensional data. Its accuracy is around 34% higher than that of CLIQUE and around 6% higher than that of DiSH.
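The iterative loop described in the abstract (determine similar subspaces, combine them, re-cluster, repeat) can be illustrated with a small sketch. This is not the authors' ASCRClu implementation; it is a minimal, hypothetical version under stated assumptions: a DBSCAN-style density clustering stands in for the paper's clustering step, Chebyshev (max-norm) distance is used inside each subspace, and the cluster-structure similarity between two subspaces is taken to be a pair-counting Jaccard index over their cluster labels. The function names `density_cluster`, `structure_similarity` and `combine_and_recluster`, and the parameters `eps`, `min_pts` and `threshold`, are all illustrative.

```python
import math
from itertools import combinations


def density_cluster(points, dims, eps, min_pts):
    """DBSCAN-style clustering restricted to the attributes in `dims` (assumed
    stand-in for the paper's density step). Returns one label per point; -1 = noise."""
    n = len(points)

    def dist(i, j):
        # Chebyshev distance, so one eps keeps its meaning as subspaces grow
        return max(abs(points[i][d] - points[j][d]) for d in dims)

    neighbors = [[j for j in range(n) if dist(i, j) <= eps] for i in range(n)]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbors[j])
        cluster_id += 1
    return labels


def structure_similarity(a, b):
    """Assumed similarity: of the point pairs grouped together in either
    labeling (noise excluded), the fraction grouped together in both."""
    both = either = 0
    for i, j in combinations(range(len(a)), 2):
        in_a = a[i] != -1 and a[i] == a[j]
        in_b = b[i] != -1 and b[i] == b[j]
        if in_a or in_b:
            either += 1
            if in_a and in_b:
                both += 1
    return both / either if either else 0.0


def combine_and_recluster(points, eps, min_pts, threshold):
    """Hypothetical bottom-up loop: start from 1-D subspaces, merge the most
    similar pair, re-cluster in their union, and stop when no pair of
    subspaces reaches `threshold`."""
    n_dims = len(points[0])
    subspaces = {}
    for d in range(n_dims):
        labels = density_cluster(points, [d], eps, min_pts)
        if any(lab != -1 for lab in labels):  # keep only subspaces with clusters
            subspaces[frozenset([d])] = labels
    while True:
        best = None
        for s, t in combinations(subspaces, 2):
            sim = structure_similarity(subspaces[s], subspaces[t])
            if sim >= threshold and (best is None or sim > best[0]):
                best = (sim, s, t)
        if best is None:
            return subspaces  # converged: no sufficiently similar pair remains
        _, s, t = best
        merged = s | t
        del subspaces[s], subspaces[t]
        subspaces[merged] = density_cluster(points, sorted(merged), eps, min_pts)
```

Chebyshev distance is chosen so that two points within `eps` in a merged subspace are also within `eps` in each of its component subspaces, which keeps a single density threshold meaningful as dimensionality grows. The sketch merges only the single most similar pair per iteration and treats "no pair above `threshold`" as convergence; the paper's actual similarity measure, merge rule and stopping criterion may differ.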


Pages: 1651-1663

Number of pages: 13
