Probabilistic Support Prediction: Fast Frequent Itemset Mining in Dense Data

Cited by: 2
Authors
Sadeequllah, Muhammad [1]
Rauf, Azhar [1]
Rehman, Saif Ur [1]
Alnazzawi, Noha [2]
Affiliations
[1] Univ Peshawar, Dept Comp Sci, Peshawar 25120, Pakistan
[2] Yanbu Ind Coll, Comp Sci & Engn Dept, Yanbu 46452, Saudi Arabia
Keywords
Association rule; data mining; frequent itemsets mining; support-count approximation; approximate algorithms; transaction databases; efficient algorithm; patterns
DOI
10.1109/ACCESS.2024.3376477
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Frequent itemset mining (FIM) is a highly resource-demanding data-mining task that is fundamental to numerous data-mining applications. Support calculation is a frequently performed, computation-intensive operation in FIM algorithms, whereas storing transactional data is memory-intensive. FIM is even more resource-hungry for dense data than for sparse data. The rapidly growing size of datasets further exacerbates this situation and calls for unconventional, highly efficient solutions. This paper proposes a novel approach to frequent itemset mining for dense datasets. After its initial stage, this approach does not use transactional data, which makes it memory efficient. It also replaces processing-intensive support calculations with efficient support predictions. These predictions are probabilistic and require no transactional data. To predict the support of an itemset, the approach needs only the supports of its subsets. However, this technique works only for itemsets of size three or larger. We also propose an FIM algorithm, ProbBF, which incorporates this technique. ProbBF uses the transactional data to find the frequent itemsets of sizes one and two, and then discards it. For itemsets of size k, where k ≥ 3, ProbBF uses the proposed probabilistic technique to predict their support. An itemset is considered frequent if its predicted support is greater than a given threshold. Our experiments show that ProbBF is efficient in both time and space compared with state-of-the-art FIM algorithms that use transactional data. The experiments also show that ProbBF successfully generates the majority of the frequent itemsets on real-world datasets. Because ProbBF is probabilistic, some loss in quality is inevitable.
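The abstract does not state ProbBF's actual prediction formula. The Python sketch below illustrates only the workflow it describes:

- Count supports exactly for itemsets of sizes one and two.
- Stop consulting the transactions.
- Predict supports for k ≥ 3 from subset supports alone.

The estimator in `predict_support` is a swapped-in assumption, not the paper's method. It treats the last two items of an itemset as conditionally independent given the remaining items. Treating the threshold as inclusive is also an assumption. All function names (`exact_support`, `predict_support`, `candidates`, `grow`, `mine`) are hypothetical.

```python
from itertools import combinations

# Hypothetical sketch of a ProbBF-style level-wise miner. The support
# estimator is an ASSUMED conditional-independence formula, not the paper's.


def exact_support(db, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    s = set(itemset)
    return sum(1 for t in db if s <= t) / len(db)


def predict_support(itemset, known):
    """Estimate s(X) using only supports of subsets of X (assumed estimator).

    With Y = X minus its last two items a and b, assume a and b are
    independent given Y:  s(X) ~= s(Y + a) * s(Y + b) / s(Y).
    """
    base, a, b = itemset[:-2], itemset[-2], itemset[-1]
    s_y = known[base] if base else 1.0
    if s_y == 0:
        return 0.0
    return known[base + (a,)] * known[base + (b,)] / s_y


def candidates(level):
    """Apriori join of sorted (k-1)-itemsets sharing a (k-2)-prefix,
    pruned so that every (k-1)-subset is itself frequent."""
    prev = set(level)
    out = []
    for x, y in combinations(sorted(level), 2):
        if x[:-1] == y[:-1]:
            c = x + (y[-1],)
            if all(sub in prev for sub in combinations(c, len(c) - 1)):
                out.append(c)
    return out


def grow(level, support_of, known, min_sup):
    """Score each candidate, record its support, keep the frequent ones."""
    nxt = []
    for c in candidates(level):
        known[c] = support_of(c)
        if known[c] >= min_sup:  # inclusive threshold (assumption)
            nxt.append(c)
    return nxt


def mine(transactions, min_sup):
    """Return {itemset: support} for all itemsets judged frequent."""
    db = [frozenset(t) for t in transactions]
    known = {}  # support of every itemset scored so far (exact or predicted)

    # Sizes 1 and 2: exact supports counted from the transactions.
    level = []
    for item in sorted({i for t in db for i in t}):
        known[(item,)] = exact_support(db, (item,))
        if known[(item,)] >= min_sup:
            level.append((item,))
    frequent = list(level)
    level = grow(level, lambda c: exact_support(db, c), known, min_sup)
    frequent += level

    # Sizes k >= 3: predicted supports only; `db` is not consulted again.
    while level:
        level = grow(level, lambda c: predict_support(c, known), known, min_sup)
        frequent += level
    return {c: known[c] for c in frequent}


if __name__ == "__main__":
    toy = [set(t) for t in ["abcd", "abc", "abd", "acd", "abcd", "bcd"]]
    result = mine(toy, min_sup=0.5)
    for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
        print("".join(itemset), round(s, 3))
```

On the six toy transactions, the sketch predicts a support of 8/15 ≈ 0.533 for each 3-itemset, while each true support is 3/6 = 0.5. For {a, b, c, d} it predicts 32/75 ≈ 0.427, against a true support of 2/6 ≈ 0.333.

At `min_sup = 0.5` the reported frequent itemsets coincide with exact mining, even though the supports are overestimated. With a threshold between the true and predicted values, such as 0.4, the prediction error would add {a, b, c, d} as a false positive. This is the kind of quality loss the abstract acknowledges.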
Pages: 39330-39350
Number of pages: 21