CC-IFIM: an efficient approach for incremental frequent itemset mining based on closed candidates

Cited by: 6

Authors:

Magdy, Maged [1]

Ghaleb, Fayed F. M. [1]

Mohamed, Dawlat A. El A. [1]

Zakaria, Wael [1]

Affiliations:

[1] Ain Shams Univ, Fac Sci, Dept Math, Cairo, Egypt

Source:

JOURNAL OF SUPERCOMPUTING | 2023 / Vol. 79 / Issue 07

Keywords:

Frequent itemsets; Incremental frequent itemsets; Similarity measurements; Closed frequent itemsets;

DOI:

10.1007/s11227-022-04976-5

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Frequent itemset mining (FIM) is a crucial task in association rule mining: it finds all frequent k-itemsets in a transaction dataset, from which all association rules are extracted. In the big-data era, datasets are huge and rapidly expanding, so adding new transactions over time periodically changes the correlations and frequent itemsets present in the dataset. Re-mining the updated dataset from scratch is impractical and costly. Incremental frequent itemset mining addresses this problem. Many researchers treat the new transactions as a distinct dataset (partition) that can be mined to obtain all of its frequent itemsets. The extracted local frequent itemsets are then combined into a collection of global candidates, whose support counts can be estimated to avoid re-scanning the dataset. However, these works are hampered by the generation of a huge number of candidates, and the support count estimation remains imprecise. This paper proposes the Closed Candidates-based Incremental Frequent Itemset Mining approach (CC-IFIM) to reduce candidate generation and improve the accuracy of the retrieved global frequent itemsets. The proposed approach prunes several of the generated candidates in early steps, before any further computation is performed. To improve the accuracy of the support counts computed for the generated candidates, the similarity between partitions is evaluated using only the local closed candidates rather than all candidates. The experimental results show that CC-IFIM outperforms its competitors in both efficiency and accuracy.
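
To make the terms in the abstract concrete, the following is a minimal Python sketch of what "frequent" and "closed frequent" itemsets mean on a single small partition. It is not the CC-IFIM algorithm: the brute-force enumeration, the function names `frequent_itemsets` and `closed_itemsets`, and the toy transactions are all assumptions chosen for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force sketch: map every itemset with support >= min_count to its count."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            # Support count = number of transactions containing the candidate.
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count:
                result[candidate] = count
                found = True
        if not found:
            # Anti-monotonicity: if no k-itemset is frequent, no larger one is.
            break
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with the same support."""
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < other and c == oc for other, oc in frequent.items())
    }


# Hypothetical toy partition with four transactions.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"a", "b", "c"}),
]
frequent = frequent_itemsets(transactions, min_count=2)
closed = closed_itemsets(frequent)
```

In this toy partition all seven non-empty itemsets over {a, b, c} are frequent, but only four are closed. This illustrates why working with closed itemsets alone, as the abstract describes for the similarity computation, can involve fewer candidates than working with all of them.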


Pages: 7877-7899

Page count: 23

References (20 in total):

[1] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072

[2] Agrawal R., 1994, P 20 INT C VER LARG, P487, DOI 10.5555/645920.672836

[3] A Fast Approach for Up-Scaling Frequent Itemsets [J]. Chen, Runzi; Zhao, Shuliang; Liu, Mengmeng. IEEE ACCESS, 2020, 8: 97141-97151

[4] Cheung D. W., 1997, Database Systems for Advanced Applications '97. Proceedings of the Fifth International Conference, P185, DOI 10.1142/9789812819536_0020

[5] Maintenance of discovered association rules in large databases: An incremental updating technique [J].