Efficient agglomerative hierarchical clustering

被引：318

作者：

Bouguettaya, Athman ^{[1
]}

Yu, Qi ^{[2
]}

Liu, Xumin ^{[2
]}

Zhou, Xiangmin ^{[3
]}

Song, Andy ^{[1
]}

机构：

[1] RMIT Univ, Melbourne, Vic, Australia

[2] Rochester Inst Technol, Rochester, NY USA

[3] Univ Victoria, Victoria, BC V8W 2Y2, Canada

来源：

EXPERT SYSTEMS WITH APPLICATIONS | 2015年 / 42卷 / 05期

关键词：

Clustering analysis; Hybrid clustering; Data mining; Data distribution; Coefficient of correlation; ALGORITHMS;

D O I：

10.1016/j.eswa.2014.09.054

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Hierarchical clustering is of great importance in data analytics especially because of the exponential growth of real-world data. Often these data are unlabelled and there is little prior domain knowledge available. One challenge in handling these huge data collections is the computational cost. In this paper, we aim to improve the efficiency by introducing a set of methods of agglomerative hierarchical clustering. Instead of building cluster hierarchies based on raw data points, our approach builds a hierarchy based on a group of centroids. These centroids represent a group of adjacent points in the data space. By this approach, feature extraction or dimensionality reduction is not required. To evaluate our approach, we have conducted a comprehensive experimental study. We tested the approach with different clustering methods (i.e., UPGMA and SLINK), data distributions, (i.e., normal and uniform), and distance measures (i.e., Euclidean and Canberra). The experimental results indicate that, using the centroid based approach, computational cost can be significantly reduced without compromising the clustering performance. The performance of this approach is relatively consistent regardless the variation of the settings, i.e., clustering methods, data distributions, and distance measures. (C) 2014 Elsevier Ltd. All rights reserved.

引用

页码：2785 / 2797

页数：13

共 30 条

[1] Incremental cluster-based retrieval using compressed cluster-skipping inverted files [J].

Altingovde, Ismail Sengor ;

Demir, Engin ;

Can, Fazli ;

Ulusoy, Oezguer .

ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2008, 26 (03)

[2]

[Anonymous], ICDE

[3]

[Anonymous], INT J COMPUTER APPL

[4]

[Anonymous], IEEE T KNOWLEDGE DAT

[5]

[Anonymous], 2004, SIGKDD EXPLOR, DOI DOI 10.1145/1007730.1007731

[6]

[Anonymous], INT J ARTIFICIAL INT

[7]

[Anonymous], ADV NEUR INF PROC SY

[8]

[Anonymous], ENCY COMPUTER SCI TE

[9] On-line clustering [J].

Bouguettaya, A .

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1996, 8 (02) :333-339

[10]

Cheng D., 2005, ACM SIGMOD SIGACT SI, P196

← 1 2 3 →