Probabilistic Support Prediction: Fast Frequent Itemset Mining in Dense Data

Cited by: 2
Authors
Sadeequllah, Muhammad [1 ]
Rauf, Azhar [1 ]
Rehman, Saif Ur [1 ]
Alnazzawi, Noha [2 ]
Affiliations
[1] Univ Peshawar, Dept Comp Sci, Peshawar 25120, Pakistan
[2] Yanbu Ind Coll, Comp Sci & Engn Dept, Yanbu 46452, Saudi Arabia
Keywords
Association rule; data mining; frequent itemsets mining; support-count approximation; approximate algorithms; transaction databases; EFFICIENT ALGORITHM; PATTERNS
DOI
10.1109/ACCESS.2024.3376477
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Frequent itemset mining (FIM) is a resource-intensive task that underlies many data-mining applications. Support calculation is a frequently performed, computation-intensive operation in FIM algorithms, and storing transactional data is memory-intensive. FIM is even more resource-hungry for dense data than for sparse data. The rapid growth of datasets exacerbates the problem and calls for unconventional, highly efficient solutions. This paper proposes a novel approach to frequent itemset mining for dense datasets. After the initial stage, the approach does not use transactional data, which makes it memory-efficient. It also replaces processing-intensive support calculations with efficient support predictions, which are probabilistic and need no transactional data. To predict the support of an itemset, it needs only the supports of its subsets. However, the technique works only for itemsets of size three or higher. We also propose an FIM algorithm, ProbBF, that incorporates this technique. ProbBF discards the transactional data after using it to compute the frequent itemsets of sizes one and two. For itemsets of size k, where k >= 3, ProbBF uses the proposed probabilistic technique to predict their support. An itemset is considered frequent if its predicted support is greater than a given threshold. Our experiments show that ProbBF is efficient in both time and space compared with state-of-the-art FIM algorithms that use transactional data. The experiments also show that ProbBF can successfully generate the majority of the frequent itemsets on real-world datasets. Since ProbBF is probabilistic, some loss in quality is inevitable.
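The workflow described in the abstract (exact counting for itemsets of sizes one and two, then prediction from subset supports for k >= 3) can be illustrated with a minimal Python sketch. The abstract does not give the paper's prediction formula, so `predict_support` below uses a stand-in estimator that assumes the last two items are conditionally independent given the shared prefix; it is not the ProbBF formula. The function names, the inclusive threshold, and the breadth-first candidate join are likewise assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def exact_small_supports(transactions, min_support):
    """Count 1- and 2-itemsets exactly. This is the only pass over the data."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for item in items:
            counts[(item,)] += 1
        for pair in combinations(items, 2):
            counts[pair] += 1
    # Inclusive threshold is an assumption of this sketch.
    return {s: c for s, c in counts.items() if c >= min_support}


def predict_support(itemset, support):
    """Stand-in estimator (assumption, not the paper's formula):
    sup(P + a + b) ~ sup(P + a) * sup(P + b) / sup(P),
    where P is the itemset without its last two items."""
    prefix = itemset[:-2]
    a, b = itemset[-2], itemset[-1]
    return support[prefix + (a,)] * support[prefix + (b,)] / support[prefix]


def mine(transactions, min_support):
    """Breadth-first mining that never touches the transactions for k >= 3."""
    support = exact_small_supports(transactions, min_support)
    level = sorted(s for s in support if len(s) == 2)
    while level:
        next_level = []
        for i, x in enumerate(level):
            for y in level[i + 1:]:
                if x[:-1] != y[:-1]:
                    break  # sorted order keeps equal prefixes contiguous
                cand = x + (y[-1],)
                # Apriori-style pruning: every (k-1)-subset must be frequent.
                subsets = combinations(cand, len(cand) - 1)
                if any(sub not in support for sub in subsets):
                    continue
                estimate = predict_support(cand, support)
                if estimate >= min_support:
                    support[cand] = estimate
                    next_level.append(cand)
        level = sorted(next_level)
    return support
```

Calling `mine(transactions, min_support)` on a list of item lists returns a dictionary mapping sorted item tuples to supports: exact counts for sizes one and two, and estimates for larger itemsets. Because the larger supports are estimates, the result can miss or wrongly include itemsets, which mirrors the loss in quality the abstract mentions.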
Pages: 39330 - 39350
Page count: 21
Related papers
50 in total
  • [31] Probabilistic maximal frequent itemset mining methods over uncertain databases
    Li, Haifeng
    Hai, Mo
    Zhang, Ning
    Zhu, Jianming
    Wang, Yue
    Cao, Huaihu
    INTELLIGENT DATA ANALYSIS, 2019, 23 (06) : 1219 - 1241
  • [32] A novel algorithm for frequent itemset mining in data warehouses
    徐利军
    谢康林
    Journal of Zhejiang University Science A(Science in Engineering), 2006, (02) : 216 - 224
  • [33] Recommendation using Frequent Itemset Mining in Big Data
    Kunjachan, Honeytta
    Hareesh, M. J.
    Sreedevi, K. M.
    PROCEEDINGS OF THE 2018 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS), 2018, : 561 - 566
  • [34] Parallel Incremental Frequent Itemset Mining for Large Data
    Yu-Geng Song
    Hui-Min Cui
    Xiao-Bing Feng
    Journal of Computer Science and Technology, 2017, 32 : 368 - 385
  • [35] Improvement of Eclat Algorithm Based on Support in Frequent Itemset Mining
    Yu, Xiaomei
    Wang, Hong
    JOURNAL OF COMPUTERS, 2014, 9 (09) : 2116 - 2123
  • [36] Inverted Index Automata Frequent Itemset Mining for Large Dataset Frequent Itemset Mining
    Dai, Xin
    Hamed, Haza Nuzly Abdull
    Su, Qichen
    Hao, Xue
    IEEE ACCESS, 2024, 12 : 195111 - 195130
  • [37] Frequent Itemset Mining on Hadoop
    Ferenc Kovacs
    Illes, Janos
    IEEE 9TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL CYBERNETICS (ICCC 2013), 2013, : 241 - 245
  • [38] On A Visual Frequent Itemset Mining
    Lim, SeungJin
    2009 FOURTH INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT, 2009, : 25 - 30
  • [39] A Support-Ordered Trie for fast frequent itemset discovery
    Woon, YK
    Ng, WK
    Lim, EP
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (07) : 875 - 879
  • [40] Fast algorithms for frequent itemset mining using FP-trees
    Grahne, G
    Zhu, JF
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (10) : 1347 - 1362