Probabilistic Support Prediction: Fast Frequent Itemset Mining in Dense Data

Cited by: 2
Authors
Sadeequllah, Muhammad [1]
Rauf, Azhar [1]
Rehman, Saif Ur [1]
Alnazzawi, Noha [2]
Affiliations
[1] Univ Peshawar, Dept Comp Sci, Peshawar 25120, Pakistan
[2] Yanbu Ind Coll, Comp Sci & Engn Dept, Yanbu 46452, Saudi Arabia
Keywords
Association rule; data mining; frequent itemsets mining; support-count approximation; approximate algorithms; transaction databases; EFFICIENT ALGORITHM; PATTERNS
DOI
10.1109/ACCESS.2024.3376477
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Frequent itemset mining (FIM) is a highly resource-demanding task that is fundamental to numerous data-mining applications. Support calculation is a frequently performed, computation-intensive operation in FIM algorithms, whereas storing transactional data is memory-intensive. FIM is even more resource-hungry for dense data than for sparse data. The rapidly growing size of datasets further exacerbates this situation and calls for highly efficient, unconventional solutions. This paper proposes a novel approach to frequent itemset mining for dense datasets. After the initial stage, the approach does not use transactional data, which makes it memory-efficient. It also replaces processing-intensive support calculations with efficient support predictions, which are probabilistic and need no transactional data. To predict the support of an itemset, it needs only the support of its subsets. However, this technique works only for itemsets of size three or higher. We also propose an FIM algorithm, ProbBF, which incorporates this technique. ProbBF discards the transactional data after using it to calculate the frequent itemsets of size one and two. For itemsets of size k, where k >= 3, ProbBF uses the proposed probabilistic technique to predict their support. An itemset is considered frequent if its predicted support is greater than a given threshold. Our experiments show that ProbBF is efficient in both time and space compared with state-of-the-art FIM algorithms that use transactional data. The experiments also show that ProbBF can successfully generate the majority of the frequent itemsets on real-world datasets. Since ProbBF is probabilistic, some loss in quality is inevitable.
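The workflow outlined in the abstract (one exact pass for itemsets of size one and two, then prediction from subset supports for size three and up) can be illustrated with a minimal Python sketch. The abstract does not give the paper's prediction formula, so the estimator below is a stand-in chosen for illustration only: it assumes the last two items of a candidate are conditionally independent given the shared prefix. The function names `exact_supports`, `predict_support`, and `mine`, the `max_size` parameter, and the `>=` threshold comparison are all hypothetical and are not taken from ProbBF.

```python
from itertools import combinations


def exact_supports(transactions, min_support):
    """Single pass over the data: exact counts for 1- and 2-itemsets only."""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in (1, 2):
            for combo in combinations(items, size):
                counts[combo] = counts.get(combo, 0) + 1
    # Keep only the frequent itemsets; the raw transactions are not needed again.
    return {k: v for k, v in counts.items() if v >= min_support}


def predict_support(itemset, supports):
    """Illustrative estimator (NOT the paper's formula).

    Assumes the last two items are conditionally independent given the prefix P:
        sup(P + {a, b}) ~= sup(P + {a}) * sup(P + {b}) / sup(P)
    Returns 0.0 if any required subset is missing (i.e. was not frequent).
    """
    prefix = itemset[:-2]
    a, b = itemset[-2], itemset[-1]
    left = supports.get(prefix + (a,))
    right = supports.get(prefix + (b,))
    base = supports.get(prefix)
    if not left or not right or not base:
        return 0.0
    return left * right / base


def mine(transactions, min_support, max_size=4):
    """Level-wise search: exact supports for sizes 1-2, predicted for sizes >= 3."""
    supports = exact_supports(transactions, min_support)
    level = sorted(k for k in supports if len(k) == 2)
    size = 2
    while level and size < max_size:
        next_level = []
        for i, left in enumerate(level):
            for right in level[i + 1:]:
                if left[:-1] != right[:-1]:
                    break  # level is sorted, so the shared-prefix run has ended
                candidate = left + (right[-1],)
                estimate = predict_support(candidate, supports)
                # Threshold comparison is an assumption of this sketch.
                if estimate >= min_support:
                    supports[candidate] = estimate
                    next_level.append(candidate)
        level = next_level
        size += 1
    return supports
```

For example, with the five transactions `abc, abc, ab, ac, bc` and `min_support=2`, each pair has exact support 3 and each single item has support 4, so this stand-in estimator predicts 3 * 3 / 4 = 2.25 for `{a, b, c}`, whose true support is 2. The gap shows the kind of approximation error that the abstract calls an inevitable loss in quality.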
Pages: 39330-39350
Page count: 21
Related papers
50 in total
  • [41] Iterative sampling based frequent itemset mining for big data
    Wu, Xian
    Fan, Wei
    Peng, Jing
    Zhang, Kun
    Yu, Yong
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2015, 6 (06) : 875 - 882
  • [42] AnyFI: An Anytime Frequent Itemset Mining Algorithm for Data Streams
    Goyal, Poonam
    Challa, Jagat Sesh
    Shrivastava, Shivin
    Goyal, Navneet
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 942 - 947
  • [43] Constrained Frequent Itemset Mining from Uncertain Data Streams
    Leung, Carson Kai-Sang
    Hao, Boyu
    Jiang, Fan
    2010 IEEE 26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDE 2010), 2010, : 120 - 127
  • [44] MrFIM: A MapReduce Approach for Frequent Itemset Mining in Big Data
    Rahman, Abdul
    Manjaramkar, Arati
    2018 4TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2018
  • [45] An algorithm for in-core frequent itemset mining on streaming data
    Jin, RM
    Agrawal, G
    FIFTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2005, : 210 - 217
  • [46] A Review on Frequent Itemset Mining Algorithms in Social Network Data
    Dharsandiya, Ankit N.
    Patel, Mihir R.
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2016, : 1046 - 1048
  • [47] Frequent Weighted Itemset Mining from Gene Expression Data
    Baralis, Elena
    Cagliero, Luca
    Cerquitelli, Tania
    Chiusano, Silvia
    Garza, Paolo
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2013
  • [48] Spatio-Temporal Frequent Itemset Mining on Web Data
    Aggarwal, Apeksha
    Toshniwal, Durga
    2018 18TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2018, : 1160 - 1165
  • [49] MapReduce Based Frequent Itemset Mining Algorithm on Stream Data
    Chaudhary, Hemant
    Yadav, Deepak Kumar
    Bhatnagar, Rajat
    Chandrasekhar, Uddagiri
    2015 GLOBAL CONFERENCE ON COMMUNICATION TECHNOLOGIES (GCCT), 2015, : 586 - 591
  • [50] An algorithm for mining constrained maximal frequent itemset in uncertain data
    Du, Haizhou
    Journal of Information and Computational Science, 2012, 9 (15) : 4509 - 4515