A general grid-clustering approach

Cited: 33
Authors
Yue, Shihong [1 ]
Wei, Miaomiao [1 ]
Wang, Jeen-Shing [2 ]
Wang, Huaxiang [1 ]
Affiliations
[1] Tianjin Univ, Sch Elect Engn & Automat, Tianjin 300072, Peoples R China
[2] Natl Cheng Kung Univ, Dept Elect Engn, Tainan 701, Taiwan
Funding
National Natural Science Foundation of China;
Keywords
clustering; core grid; grid size; locality;
DOI
10.1016/j.patrec.2008.02.019
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hierarchical clustering is an important part of cluster analysis. Based on various theories, numerous hierarchical clustering algorithms have been developed, and new clustering algorithms continue to appear in the literature. Both divisive and agglomerative hierarchical clustering algorithms play a pivotal role in data-based models and have been successfully applied to clustering very large datasets. However, hierarchical clustering is parameter-sensitive: if the user does not know how to choose the input parameters, the clustering results may be undesirable. In this paper, we propose a general grid-clustering approach (GGCA) under a common assumption about hierarchical clustering. The key features of the GGCA include: (1) the combination of divisive and agglomerative clustering algorithms into a unifying framework; (2) the determination, for the first time, of the key input parameter, the optimal grid size; and (3) the application of a two-phase merging process to aggregate all data objects. Consequently, the GGCA is a non-parametric algorithm that does not require users to supply input parameters and performs well on poorly separated and shape-diverse clusters. Experimental results comparing the proposed GGCA with existing methods show the superiority of the GGCA approach. (c) 2008 Elsevier B.V. All rights reserved.
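The record does not reproduce the algorithm itself, but the two-phase idea described in the abstract (divisive grid partitioning followed by agglomerative merging of dense cells) can be illustrated with a minimal sketch. The sketch below is a generic grid-clustering illustration, not the authors' GGCA; the grid size h, the density threshold min_pts, and the function name grid_cluster are assumptions introduced purely for illustration.

# Minimal sketch of generic grid-based clustering (NOT the authors' GGCA):
# 1) partition the 2-D data space into square cells of side h (divisive phase),
# 2) keep "core" cells whose point count reaches a density threshold,
# 3) merge neighbouring core cells into clusters (agglomerative phase).
# h and min_pts are illustrative assumptions, not values from the paper.
from collections import defaultdict, deque

def grid_cluster(points, h=0.5, min_pts=3):
    # Phase 1: hash every point into its grid cell.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(int(x // h), int(y // h))].append(i)

    # Keep only cells that are dense enough to be "core" cells.
    dense = {c for c, idx in cells.items() if len(idx) >= min_pts}

    # Phase 2: flood-fill over edge/corner-adjacent dense cells.
    labels, cluster_id = {}, 0
    for cell in dense:
        if cell in labels:
            continue
        labels[cell] = cluster_id
        queue = deque([cell])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in labels:
                        labels[nb] = cluster_id
                        queue.append(nb)
        cluster_id += 1

    # Map cell labels back to points; points in sparse cells are noise (-1).
    point_labels = [-1] * len(points)
    for cell, idx in cells.items():
        if cell in labels:
            for i in idx:
                point_labels[i] = labels[cell]
    return point_labels

if __name__ == "__main__":
    data = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.15, 0.3),
            (5.0, 5.1), (5.2, 4.9), (4.8, 5.3), (5.1, 5.2)]
    print(grid_cluster(data, h=1.0, min_pts=2))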
Pages: 1372-1384
Number of pages: 13
References
24 in total (first 10 listed)
[1] Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2005, 11(01): 5-33.
[2] AGRAWAL R, 1998, P ACM SIGMOD INT C M, P73
[3] [Anonymous], 2010, UCI Machine Learning Repository
[4] Beyer Kevin, 1999, INT C DATABASE THEOR, P217, DOI 10.1007/3-540-49257-7_15
[5] BEZDEK JC, 1999, IEEE T FUZZY SYST, V7, P23
[6] CHIANG H, 2004, IEEE T FUZZY SYST, V15, P45
[7] Cohen WW, Singer Y. Context-sensitive learning methods for text categorization [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1999, 17(02): 141-173.
[8] EDEN W, 2004, PATTERN RECOGN, V37, P503
[9] Fred ALN, Leitao JMN. A new cluster isolation criterion based on dissimilarity increments [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2003, 25(08): 944-958.
[10] GRABUSTS P, 2002, INT C PAR COMP EL EN