Batch incremental processing for FP-tree construction using FP-Growth algorithm

被引：0

作者：

Shashikumar G. Totad

R. B. Geeta

P. V. G. D. Prasad Reddy

机构：

[1] GMR Institute of Technology,

[2] Andhra University,undefined

来源：

Knowledge and Information Systems | 2012年 / 33卷

关键词：

Incremental mining; Batch incremental processing; Sequential incremental processing; Frequent patterns; Data mining; FP-tree; FP-Growth;

D O I：

暂无

中图分类号：

学科分类号：

摘要：

In the present scenario of global economy and World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining process, such as mining of association rules, correlations. FP-tree is a very versatile data structure used for mining of frequent patterns in knowledge discovery and data mining process. FP-tree is a compact representation of transaction database that contains frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of database at a time and updating it to the FP-tree of initial (original) database. Here, in this paper, we propose a novel method that takes advantage of FP-tree representation of incremental transaction database for incremental mining. We propose a batch incremental processing algorithm BIT_FPGrowth that restructures and merges two small consecutive duration FP-trees to obtain a FP-tree of the FP-Growth algorithm. Our BIT_FPGrowth uses FP-tree as preprocessed data repository to get transactions (i.e., item-sets), unlike other sequential incremental algorithms that read transactions from database. BIT_FPGrowth algorithm takes less time for constructing FP-tree. Our experimental results show that, as the size of the database increases, increase in runtime of BIT_FPGrowth is much less and is least of all the other algorithms.

引用

页码：475 / 490

页数：15

共 30 条

[1]

Aouad LM(2010)Performance study of distributed Apriori-like frequent itemsets mining Knowl Inf Syst 23 55-72

[2]

Le-Khac NA(2008)A survey on algorithms for mining frequent itemsets over data streams Knowl Inf Syst 16 1-2

[3]

Kechadi TM(2008)Efficient mining of maximal frequent itemsets from databases on a cluster of workstations Knowl Inf Syst 16 359-391

[4]

Cheng J(2012)Scaling up data mining algorithms: review and taxonomy Prog Artif Intell 1 71-87

[5]

Ke Y(2004)Mining frequent patterns without candidate generation: A frequent-pattern tree approach Data Min Knowl Discov 8 53-87

[6]

Ng W(2006)CanTree: a canonical-order tree for incremental frequent-pattern mining Knowl Inf Syst 11 287-311

[7]

Chung SM(2010)Using the structure of prelarge trees to incrementally mine frequent itemset New Gener Comput 28 5-20

[8]

Luo C(2005)Slid-ing window filtering: an efficient method for incremental mining on a time-variant database ELSEVIER-Inf Syst 30 227-244

[9]

García-Pedrajas N(2007)CanTree: a canonical-order tree for incremental frequent-pattern mining Knowl Inf Syst 11 287-311

[10]

Haro-Garcí A(2010)Batch processing for incremental FP-tree construction Int J Comput Appl IJCA 5 28-32

← 1 2 3 →