H-mine: Hyper-structure mining of frequent patterns in large databases

Cited by: 180
Authors
Pei, J [1]
Han, JW [1]
Lu, HJ [1]
Nishio, S [1]
Tang, SW [1]
Yang, DQ [1]
Affiliations
[1] Peking Univ, Beijing 100871, Peoples R China
Source
2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2001
DOI
10.1109/ICDM.2001.989550
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Methods for efficient mining of frequent patterns have been studied extensively by many researchers. However, the previously proposed methods still encounter some performance bottlenecks when mining databases with different data characteristics, such as dense vs. sparse, long vs. short patterns, memory-based vs. disk-based, etc. In this study, we propose a simple and novel hyper-linked data structure, H-struct, and a new mining algorithm, H-mine, which takes advantage of this data structure and dynamically adjusts links in the mining process. A distinct feature of this method is that it has very limited and precisely predictable space overhead and runs very fast in a memory-based setting. Moreover, it can be scaled up to very large databases by database partitioning, and when the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. Our study shows that H-mine has high performance on various kinds of data, outperforms the previously developed algorithms in different settings, and is highly scalable in mining large databases. This study also proposes a new data mining methodology, space-preserving mining, which may have a strong impact on the future development of efficient and scalable data mining methods.
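The abstract gives no implementation details, so the following is only a rough illustration of the general idea it describes: mining frequent patterns by adjusting links into transactions that are stored once, rather than copying projected databases. It is a simplified sketch, not the paper's H-struct or H-mine algorithm. The function name `h_mine_sketch`, the `(row, position)` link representation, and the sample data are all assumptions made for illustration; database partitioning and the switch to FP-trees on dense data are not modeled.

```python
from collections import defaultdict


def h_mine_sketch(transactions, min_support):
    """Return {frozenset(pattern): support} for all frequent patterns.

    Hypothetical, simplified illustration: each transaction is stored once,
    and every projected database is just a list of (row, position) links
    into that single store.
    """
    # Pass 1: count the support of each individual item.
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    frequent = {i for i, c in counts.items() if c >= min_support}

    # Store each transaction once, keeping only frequent items in a fixed order.
    rows = [tuple(sorted(set(t) & frequent)) for t in transactions]

    patterns = {}

    def mine(prefix, links):
        # A link (r, pos) stands for the suffix rows[r][pos:]; no data is copied.
        queues = defaultdict(list)
        for r, pos in links:
            row = rows[r]
            for j in range(pos, len(row)):
                # Link this transaction into the queue of every later item.
                queues[row[j]].append((r, j + 1))
        for item in sorted(queues):
            queue = queues[item]
            if len(queue) >= min_support:
                pattern = prefix + (item,)
                patterns[frozenset(pattern)] = len(queue)
                mine(pattern, queue)

    mine((), [(r, 0) for r in range(len(rows))])
    return patterns


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
    for pattern, support in sorted(
        h_mine_sketch(sample, 3).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(pattern), support)
```

In this sketch the extra memory beyond the stored transactions consists only of link lists, which loosely mirrors the abstract's point about limited, predictable space overhead; the paper's own structure should be consulted for the actual bounds.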
Pages: 441-448
Number of pages: 8