Probabilistic Support Prediction: Fast Frequent Itemset Mining in Dense Data

Cited by: 2

Authors:

Sadeequllah, Muhammad [1]

Rauf, Azhar [1]

Rehman, Saif Ur [1]

Alnazzawi, Noha [2]

Affiliations:

[1] Univ Peshawar, Dept Comp Sci, Peshawar 25120, Pakistan

[2] Yanbu Ind Coll, Comp Sci & Engn Dept, Yanbu 46452, Saudi Arabia

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Association rule; data mining; frequent itemsets mining; support-count approximation; approximate algorithms; transaction databases; EFFICIENT ALGORITHM; PATTERNS

DOI:

10.1109/ACCESS.2024.3376477

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Frequent itemset mining (FIM) is a highly resource-demanding data-mining task fundamental to numerous data-mining applications. Support calculation is a frequently performed, computation-intensive operation of FIM algorithms, whereas storing transactional data is memory-intensive. FIM is even more resource-hungry for dense data than for sparse data. The rapidly growing size of datasets further exacerbates this situation and necessitates the design of unconventional, highly efficient solutions. This paper proposes a novel approach to frequent itemset mining for dense datasets. After the initial stage, this approach does not use transactional data, which makes it memory-efficient. It also replaces processing-intensive support calculations with efficient support predictions, which are probabilistic and need no transactional data. To predict the support of an itemset, it needs only the supports of its subsets. However, this technique works only for itemsets of size three or higher. We also propose an FIM algorithm, ProbBF, which incorporates this technique. ProbBF discards the transactional data after using it to calculate the frequent itemsets of sizes one and two. For itemsets of size k, where k >= 3, ProbBF uses the proposed probabilistic technique to predict their support. An itemset is considered frequent if its predicted support is greater than a given threshold. Our experiments show that ProbBF is efficient in both time and space compared with state-of-the-art FIM algorithms that use transactional data. The experiments also show that ProbBF can successfully generate the majority of the frequent itemsets on real-world datasets. Since ProbBF is probabilistic, some loss in quality is inevitable.
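The abstract does not state the prediction formula, so the following is only an illustrative sketch of the general idea, not the ProbBF estimator. It assumes a conditional-independence estimate, sup(P ∪ {a, b}) ≈ sup(P ∪ {a}) · sup(P ∪ {b}) / sup(P), and a breadth-first candidate join. The function names (`exact_small_supports`, `predict_support`, `mine`) and the toy data are hypothetical.

```python
from itertools import combinations


def exact_small_supports(transactions, min_support):
    """Count 1- and 2-itemsets exactly; this is the only pass over the data."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for item in items:
            counts[(item,)] = counts.get((item,), 0) + 1
        for pair in combinations(items, 2):
            counts[pair] = counts.get(pair, 0) + 1
    # Assumption: exact counts are kept when they reach the threshold.
    return {s: c for s, c in counts.items() if c >= min_support}


def predict_support(itemset, support):
    """Estimate the support of a sorted tuple of size >= 3 from its subsets.

    ASSUMPTION (not taken from the paper): the last two items are treated
    as conditionally independent given the shared prefix.
    """
    prefix, a, b = itemset[:-2], itemset[-2], itemset[-1]
    left = support.get(prefix + (a,))
    right = support.get(prefix + (b,))
    base = support.get(prefix)
    if not left or not right or not base:
        return 0.0
    return left * right / base


def mine(transactions, min_support):
    """Level-wise mining that never touches the transactions for k >= 3."""
    support = exact_small_supports(transactions, min_support)
    level = sorted(s for s in support if len(s) == 2)
    while level:
        next_level = []
        # Join two k-itemsets that share their first k-1 items.
        for x, y in combinations(level, 2):
            if x[:-1] != y[:-1]:
                continue
            candidate = x + (y[-1],)
            estimate = predict_support(candidate, support)
            # "Greater than a given threshold", as worded in the abstract.
            if estimate > min_support:
                support[candidate] = estimate
                next_level.append(candidate)
        level = sorted(next_level)
    return support


if __name__ == "__main__":
    toy = ([["a", "b", "c"]] * 4 + [["a", "b"]] * 2
           + [["a", "c"]] * 2 + [["b", "c"]] * 2)
    for itemset, value in sorted(mine(toy, 3).items()):
        print(itemset, value)
```

On this toy data the sketch estimates the support of (a, b, c) as 6 · 6 / 8 = 4.5, while the true count is 4. The gap is a small instance of the quality loss that any probabilistic prediction accepts in exchange for discarding the transactions.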


Pages: 39330 - 39350

Page count: 21

50 items in total

[31] Probabilistic maximal frequent itemset mining methods over uncertain databases
Li, Haifeng
Hai, Mo
Zhang, Ning
Zhu, Jianming
Wang, Yue
Cao, Huaihu
INTELLIGENT DATA ANALYSIS, 2019, 23 (06) : 1219 - 1241
[32] A novel algorithm for frequent itemset mining in data warehouses
徐利军
谢康林
Journal of Zhejiang University Science A(Science in Engineering), 2006, (02) : 216 - 224
[33] Recommendation using Frequent Itemset Mining in Big Data
Kunjachan, Honeytta
Hareesh, M. J.
Sreedevi, K. M.
PROCEEDINGS OF THE 2018 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS), 2018, : 561 - 566
[34] Parallel Incremental Frequent Itemset Mining for Large Data
Yu-Geng Song
Hui-Min Cui
Xiao-Bing Feng
Journal of Computer Science and Technology, 2017, 32 : 368 - 385
[35] Improvement of Eclat Algorithm Based on Support in Frequent Itemset Mining
Yu, Xiaomei
Wang, Hong
JOURNAL OF COMPUTERS, 2014, 9 (09) : 2116 - 2123
[36] Inverted Index Automata Frequent Itemset Mining for Large Dataset Frequent Itemset Mining
Dai, Xin
Hamed, Haza Nuzly Abdull
Su, Qichen
Hao, Xue
IEEE ACCESS, 2024, 12 : 195111 - 195130
[37] Frequent Itemset Mining on Hadoop
Ferenc Kovacs
Illes, Janos
IEEE 9TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL CYBERNETICS (ICCC 2013), 2013, : 241 - 245
[38] On A Visual Frequent Itemset Mining
Lim, SeungJin
2009 FOURTH INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT, 2009, : 25 - 30
[39] A Support-Ordered Trie for fast frequent itemset discovery
Woon, YK
Ng, WK
Lim, EP
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (07) : 875 - 879
[40] Fast algorithms for frequent itemset mining using FP-trees
Grahne, G
Zhu, JF
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (10) : 1347 - 1362
