Mining top-k high average-utility itemsets based on breadth-first search

Times cited: 0
Authors
Liu, Xuan [1]
Chen, Genlang [1]
Wu, Fangyu [2]
Wen, Shiting [1]
Zuo, Wanli [3]
Affiliations
[1] NingboTech Univ, Sch Comp & Data Engn, Ningbo 315100, Zhejiang, Peoples R China
[2] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou 215000, Jiangsu, Peoples R China
[3] Ningbo Univ, Sch Mech Engn & Mech, Ningbo 315211, Zhejiang, Peoples R China
Keywords
Top-k high average-utility itemsets; Breadth-first search; High average-utility itemset; Data mining; Efficient algorithm; Patterns; Stream
DOI
10.1007/s10489-023-05076-4
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High average-utility itemset mining is a subfield of data mining that has extensive practical applications. However, it is difficult for users to determine a proper minimum threshold because they cannot accurately predict the number of patterns mined at a given threshold. To address this issue, top-k high average-utility itemset mining has been proposed, where k is the number of high average-utility itemsets to be mined. In this paper, we design an effective algorithm (named ETAUIM) for finding top-k high average-utility itemsets. ETAUIM employs a breadth-first search strategy to efficiently explore the search space, and it utilizes a tighter upper bound instead of the average-utility upper bound to limit the search space. Additionally, ETAUIM removes irrelevant items during the mining process and utilizes an early abandoning strategy to terminate unnecessary join operations in advance. To evaluate the proposed algorithm, extensive experiments were conducted on six sparse datasets and two dense datasets. Four state-of-the-art algorithms were used for comparison. The experimental results show that ETAUIM has excellent performance and scalability. Moreover, ETAUIM always performs better for sparse datasets.
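The task described in the abstract can be illustrated with a small sketch. This is not ETAUIM: it is a simplified level-wise (breadth-first) miner that prunes with the classic average-utility upper bound (the sum of the maximum item utility over the transactions containing an itemset), which is the looser bound the abstract says ETAUIM replaces. It also omits irrelevant-item removal and early abandoning. The function name `top_k_hauis` and the transaction format (a dict mapping each item to its positive utility in that transaction) are assumptions made for illustration only.

```python
import heapq
from itertools import combinations


def top_k_hauis(transactions, k):
    """Return the k itemsets with the highest average utility.

    transactions: list of dicts {item: utility}, utilities assumed positive.
    Result: list of (average_utility, itemset) sorted from best to worst.
    """
    heap = []          # min-heap holding the current best k (au, itemset) pairs
    threshold = 0.0    # internal minimum threshold, raised as the heap fills
    level = sorted({(item,) for t in transactions for item in t})

    while level:
        survivors = []
        for itemset in level:
            support = [t for t in transactions
                       if all(item in t for item in itemset)]
            if not support:
                continue
            # Classic average-utility upper bound: it never increases when
            # the itemset grows, so pruning here also rules out supersets.
            bound = sum(max(t.values()) for t in support)
            if bound < threshold:
                continue
            # Average utility: total utility of the itemset's items in the
            # supporting transactions, divided by the itemset length.
            au = sum(sum(t[item] for item in itemset)
                     for t in support) / len(itemset)
            if len(heap) < k:
                heapq.heappush(heap, (au, itemset))
            elif au > heap[0][0]:
                heapq.heapreplace(heap, (au, itemset))
            if len(heap) == k:
                threshold = heap[0][0]
            survivors.append(itemset)

        # Apriori-style join: merge survivors that share all but the last item.
        level = sorted({a + (b[-1],)
                        for a, b in combinations(survivors, 2)
                        if a[:-1] == b[:-1]})

    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    db = [
        {"a": 5, "b": 2, "c": 1},
        {"a": 4, "c": 3},
        {"b": 6, "c": 2},
    ]
    for au, itemset in top_k_hauis(db, 3):
        print(itemset, au)
```

On this toy database the user supplies only k. The threshold starts at 0 and rises to the k-th best average utility found so far, which is what removes the need to guess a minimum threshold in advance.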
Pages: 29319-29337
Page count: 19