Probabilistic Support Prediction: Fast Frequent Itemset Mining in Dense Data

Cited by: 2
Authors
Sadeequllah, Muhammad [1]
Rauf, Azhar [1]
Rehman, Saif Ur [1]
Alnazzawi, Noha [2]
Affiliations
[1] Univ Peshawar, Dept Comp Sci, Peshawar 25120, Pakistan
[2] Yanbu Ind Coll, Comp Sci & Engn Dept, Yanbu 46452, Saudi Arabia
Keywords
Association rule; data mining; frequent itemsets mining; support-count approximation; approximate algorithms; transaction databases; EFFICIENT ALGORITHM; PATTERNS
DOI
10.1109/ACCESS.2024.3376477
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Frequent itemset mining (FIM) is a highly resource-demanding data-mining task fundamental to numerous data-mining applications. Support calculation is a frequently performed, computation-intensive operation in FIM algorithms, whereas storing transactional data is memory-intensive. FIM is even more resource-hungry for dense data than for sparse data. The rapidly growing size of datasets further exacerbates this situation and necessitates the design of out-of-the-box, highly efficient solutions. This paper proposes a novel approach to frequent itemset mining for dense datasets. After the initial stage, this approach does not use transactional data, which makes it memory-efficient. It also replaces processing-intensive support calculations with efficient support predictions, which are probabilistic and need no transactional data. To predict the support of an itemset, it needs only the supports of its subsets. However, this technique works only for itemsets of size three or higher. We also propose an FIM algorithm, ProbBF, which incorporates this technique. ProbBF discards the transactional data after using it to calculate the frequent itemsets of size one and two. For itemsets of size k, where k >= 3, ProbBF uses the proposed probabilistic technique to predict their support. An itemset is considered frequent if its predicted support is greater than a given threshold. Our experiments show that ProbBF is efficient in both time and space compared with state-of-the-art FIM algorithms that use transactional data. The experiments also show that ProbBF can successfully generate the majority of the frequent itemsets on real-world datasets. Since ProbBF is probabilistic, some loss in quality is inevitable.
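The abstract does not state the prediction formula, so the following Python sketch only illustrates the general idea of mining without transactional data after the second level. It is not the ProbBF algorithm itself. The estimator sup(P+a+b) ≈ sup(P+a) · sup(P+b) / sup(P), a conditional-independence assumption, and the function names `exact_small_supports`, `predict_support`, and `mine` are hypothetical choices made for this example, as is the use of `>=` for the threshold test.

```python
from itertools import combinations


def exact_small_supports(transactions, min_support):
    """Count exact supports of 1- and 2-itemsets in a single pass over the data."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for item in items:
            counts[(item,)] = counts.get((item,), 0) + 1
        for pair in combinations(items, 2):
            counts[pair] = counts.get(pair, 0) + 1
    # Keep only the frequent itemsets.
    return {s: c for s, c in counts.items() if c >= min_support}


def predict_support(itemset, supports):
    """Assumed estimator (not from the paper): sup(P+a) * sup(P+b) / sup(P).

    `itemset` is a sorted tuple of length >= 3; P is its first k-2 items.
    Returns None if a required subset support is unknown.
    """
    prefix, a, b = itemset[:-2], itemset[-2], itemset[-1]
    left, right = prefix + (a,), prefix + (b,)
    if prefix not in supports or left not in supports or right not in supports:
        return None
    return supports[left] * supports[right] / supports[prefix]


def mine(transactions, min_support):
    """Breadth-first mining: exact counts for sizes 1 and 2, predictions for k >= 3."""
    supports = exact_small_supports(transactions, min_support)
    # From here on, the transactions are never read again.
    level = sorted(s for s in supports if len(s) == 2)
    while level:
        next_level = []
        for i, x in enumerate(level):
            for y in level[i + 1:]:
                if x[:-1] != y[:-1]:
                    break  # level is sorted, so shared prefixes are contiguous
                candidate = x + (y[-1],)
                # Apriori-style pruning: every (k-1)-subset must be frequent.
                k = len(candidate)
                if any(sub not in supports for sub in combinations(candidate, k - 1)):
                    continue
                estimate = predict_support(candidate, supports)
                if estimate is not None and estimate >= min_support:
                    supports[candidate] = estimate
                    next_level.append(candidate)
        level = sorted(next_level)
    return supports
```

For example, with the six transactions abc, abc, ab, ac, bc, abcd and a minimum support of 3, the sketch predicts a support of 4 · 4 / 5 = 3.2 for {a, b, c}, whose true support is 3. The itemset is reported as frequent without a second scan, and the gap between 3.2 and 3 illustrates the loss in quality that the abstract mentions.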
Pages: 39330-39350
Page count: 21