Discovering pattern-based subspace clusters by pattern tree

Cited by: 10
Authors
Guan, Jihong [1]
Gan, Yanglan [1]
Wang, Hao [2]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Hefei Univ Technol, Dept Comp Sci & Technol, Hefei 23009, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Clustering analysis; Subspace clustering; Pattern similarity; Pattern tree
DOI
10.1016/j.knosys.2009.02.011
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional clustering models based on distance similarity are not always effective at capturing correlations among data objects, whereas pattern-based clustering can identify correlations hidden among them. However, state-of-the-art pattern-based clustering methods are inefficient and provide no metric for measuring clustering quality. This paper presents a new pattern-based subspace clustering method that addresses both problems. Observing the analogy between mining frequent itemsets and discovering subspace clusters, we apply the pattern tree, a structure used in frequent itemset mining, to determine the target subspaces with a single scan of the database, which can be done efficiently on large datasets. Furthermore, we introduce a general clustering quality evaluation model to guide the identification of meaningful clusters. The proposed method lets users flexibly set quality-control parameters to meet different needs. Experimental results on synthetic and real datasets show that our method outperforms existing methods in both efficiency and effectiveness. © 2009 Elsevier B.V. All rights reserved.
Pages: 569-579
Page count: 11