Mining top-k high average-utility itemsets based on breadth-first search

Cited by: 0

Authors:

Liu, Xuan [1]

Chen, Genlang [1]

Wu, Fangyu [2]

Wen, Shiting [1]

Zuo, Wanli [3]

Affiliations:

[1] NingboTech Univ, Sch Comp & Data Engn, Ningbo 315100, Zhejiang, Peoples R China

[2] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou 215000, Jiangsu, Peoples R China

[3] Ningbo Univ, Sch Mech Engn & Mech, Ningbo 315211, Zhejiang, Peoples R China

Source:

APPLIED INTELLIGENCE | 2023 / Vol. 53 / Issue 23

Keywords:

Top-k high average-utility itemsets; Breadth-first search; High average-utility itemset; Data mining; EFFICIENT ALGORITHM; PATTERNS; STREAM;

DOI:

10.1007/s10489-023-05076-4

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

High average-utility itemset mining is a subfield of data mining that has extensive practical applications. However, it is difficult for users to determine a proper minimum threshold because they cannot accurately predict the number of patterns mined at a given threshold. To address this issue, top-k high average-utility itemset mining has been proposed, where k is the number of high average-utility itemsets to be mined. In this paper, we design an effective algorithm (named ETAUIM) for finding top-k high average-utility itemsets. ETAUIM employs a breadth-first search strategy to efficiently explore the search space, and it utilizes a tighter upper bound instead of the average-utility upper bound to limit the search space. Additionally, ETAUIM removes irrelevant items during the mining process and utilizes an early abandoning strategy to terminate unnecessary join operations in advance. To evaluate the proposed algorithm, extensive experiments were conducted on six sparse datasets and two dense datasets. Four state-of-the-art algorithms were used for comparison. The experimental results show that ETAUIM has excellent performance and scalability. Moreover, ETAUIM always performs better for sparse datasets.
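To make the problem statement in the abstract concrete, the following is a minimal brute-force sketch of top-k high average-utility itemset mining. It is not ETAUIM: it has no upper-bound pruning, no removal of irrelevant items, and no early abandoning. It assumes the definition of average utility commonly used in this literature (the utility of an itemset summed over the transactions that contain it, divided by the number of items in the itemset), which the abstract itself does not spell out. The function names, the dictionary-based transaction layout, and the toy data are all hypothetical and chosen only for illustration.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (e.g. quantity times unit profit, already multiplied).
TRANSACTIONS = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3},
    {"a": 5, "b": 2},
]


def average_utility(itemset, transactions):
    """Assumed definition: utility of `itemset` summed over the transactions
    that contain all of its items, divided by the itemset length."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] for item in itemset)
    return total / len(itemset)


def top_k_hauis(transactions, k):
    """Exhaustively enumerate itemsets level by level (size 1, 2, ...) and
    keep the k with the highest average utility. No pruning is applied."""
    items = sorted({item for t in transactions for item in t})
    heap = []  # min-heap of (average utility, itemset); heap[0] is the k-th best so far
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            au = average_utility(itemset, transactions)
            if au <= 0:
                continue  # itemset never occurs in the database
            if len(heap) < k:
                heapq.heappush(heap, (au, itemset))
            elif au > heap[0][0]:
                # heap[0][0] acts as an internal threshold that only rises
                heapq.heapreplace(heap, (au, itemset))
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    for au, itemset in top_k_hauis(TRANSACTIONS, 3):
        print(itemset, au)
```

The smallest value in the heap plays the role of the minimum threshold that the user would otherwise have to guess: it starts undefined and rises as better itemsets are found. A practical algorithm would use that rising threshold together with an upper bound to skip most candidates, which this sketch does not attempt.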


Pages: 29319-29337

Page count: 19
