A data mining proxy approach for efficient frequent itemset mining

Cited by: 2
Authors
Yu, Jeffrey Xu [1]
Li, Zhiheng [1]
Liu, Guimei [2]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
[2] Natl Univ Singapore, Singapore 117548, Singapore
Keywords
Data Mining; Association Rule; Frequent Pattern; Minimum Support; Frequent Itemset
DOI
10.1007/s00778-007-0047-0
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data mining has attracted considerable research effort over the past decade. However, little work has been reported on efficiently supporting a large number of users who issue different data mining queries periodically, as new needs arise and as data is updated. Our work is motivated by the fact that the pattern-growth method, which constructs an initial tree and mines frequent patterns on top of it, is one of the most efficient methods for frequent pattern mining. In this paper, we present a data mining proxy approach that reduces the I/O cost of constructing an initial tree by utilizing trees that are already resident in memory. The tree we construct is the smallest for a given data mining query. Our proxy approach also reduces the CPU cost of mining patterns, because the cost of mining depends on the sizes of the trees. The focus of the work is on constructing an initial tree efficiently. We propose three tree operations for constructing a tree. With a unique coding scheme, we can efficiently project subtrees from on-disk or in-memory trees. Our performance study indicates that the data mining proxy significantly reduces both the I/O cost of constructing trees and the CPU cost of mining patterns over the constructed trees.
Pages: 947-970
Number of pages: 24