GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data

Cited by: 14
Authors
Mansoori, Eghbal G. [1]
Affiliations
[1] Shiraz Univ, Sch Elect & Comp Engn, Shiraz, Iran
Keywords
Grid-based clustering; Hierarchical clustering; Feature selection; High-dimensional data
DOI
10.1007/s00500-013-1105-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes GACH, a grid-based hierarchical clustering algorithm, as an efficient and robust method for exploring clusters in high-dimensional data with no prior knowledge. It automatically discovers the initial positions of the potential clusters and then combines them hierarchically to obtain the final clusters. To overcome the curse of dimensionality, GACH first projects the data patterns onto a two-dimensional space (i.e., a plane spanned by two features). A simple and fast feature selection algorithm is proposed to choose these two informative features. Then, by meshing the plane with grid lines, GACH detects the crowded grid points. The data patterns nearest to these grid points are taken as the initial members of potential clusters. GACH then refines these clusters by returning the patterns to their original dimensions. In the merging phase, GACH combines closely adjacent clusters in a hierarchical bottom-up manner to construct the final clusters. The main features of GACH are: (1) it discovers the clusters automatically, (2) the obtained clusters are stable, (3) it is efficient for high-dimensional data sets, and (4) its merging process involves a threshold that can be obtained in advance for well-clustered data. To assess the proposed algorithm, it is applied to several benchmark data sets, and the validity of the obtained clusters is compared with the results of other clustering algorithms. This comparison shows that GACH is accurate, efficient, and feasible for discovering clusters in high-dimensional data.
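The pipeline outlined in the abstract (select two features, mesh the plane, find crowded grid points, refine in the full space, merge bottom-up) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no details of the feature selector, the grid resolution, or the merging criterion, so the variance-based selector, the parameters `bins`, `min_count` and `merge_threshold`, the nearest-centroid refinement, and the centroid-distance merge rule below are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def select_two_features(X):
    """Stand-in selector (assumption): the two features with the largest variance."""
    n, d = len(X), len(X[0])

    def variance(j):
        mean = sum(x[j] for x in X) / n
        return sum((x[j] - mean) ** 2 for x in X) / n

    ranked = sorted(range(d), key=variance, reverse=True)
    return ranked[0], ranked[1]


def crowded_grid_points(X, f1, f2, bins, min_count):
    """Snap each pattern to its nearest grid point on the (f1, f2) plane
    and keep the grid points that attract at least min_count patterns."""
    def scaler(f):
        lo = min(x[f] for x in X)
        span = (max(x[f] for x in X) - lo) or 1.0  # guard against a constant feature
        return lambda v: int(math.floor((v - lo) / span * bins + 0.5))

    s1, s2 = scaler(f1), scaler(f2)
    cells = defaultdict(list)
    for i, x in enumerate(X):
        cells[(s1(x[f1]), s2(x[f2]))].append(i)
    return [members for members in cells.values() if len(members) >= min_count]


def centroid(X, members):
    """Mean of the listed patterns, computed over all features."""
    return [sum(X[i][j] for i in members) / len(members) for j in range(len(X[0]))]


def gach_sketch(X, bins=4, min_count=3, merge_threshold=2.0):
    """Hypothetical GACH-style pipeline; all parameters are illustrative."""
    f1, f2 = select_two_features(X)
    seeds = crowded_grid_points(X, f1, f2, bins, min_count)
    if not seeds:  # no crowded grid point: fall back to a single cluster
        return [0] * len(X)
    centers = [centroid(X, s) for s in seeds]

    # Refinement in the full feature space (assumed: nearest-centroid assignment).
    clusters = [[] for _ in centers]
    for i, x in enumerate(X):
        k = min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
        clusters[k].append(i)
    clusters = [c for c in clusters if c]

    # Bottom-up merging: join the closest pair of clusters while their
    # centroids are closer than the threshold.
    while len(clusters) > 1:
        cents = [centroid(X, c) for c in clusters]
        pairs = [(math.dist(cents[a], cents[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        dist, a, b = min(pairs)
        if dist >= merge_threshold:
            break
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * len(X)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data: two well-separated groups in five dimensions.
A = [[0.1 * i, 0.1 * (i % 3), 1.0, 1.0, 1.0] for i in range(6)]
B = [[10 + 0.1 * i, 10 + 0.1 * (i % 3), 1.0, 1.0, 1.0] for i in range(6)]
labels = gach_sketch(A + B)
```

On this toy input the two groups fall on different grid points and stay separate under the default threshold; raising `merge_threshold` far enough collapses them into one cluster, which mirrors the role the abstract assigns to the merging threshold.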
Pages: 905-922
Number of pages: 18