Mining Top-k High On-shelf Utility Itemsets Using Novel Threshold Raising Strategies

被引：0

作者：

Singh, Kuldeep ^{[1
]}

Biswas, Bhaskar ^{[2
]}

机构：

[1] Univ Delhi, Dept Comp Sci, Delhi 110007, India

[2] Indian Inst Tech BHU, Dept Comp Sci & Engn, Varanasi 221005, Uttar Pradesh, India

来源：

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA | 2024年 / 18卷 / 05期

关键词：

High on-shelf utility itemsets; top-k itemsets; utility mining; on-shelf time periods; negative utility; branch & bound; EFFICIENT ALGORITHM; PATTERNS;

D O I：

10.1145/3645115

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

High utility itemsets (HUIs) mining is an emerging area of data mining which discovers sets of items generating a high profit from transactional datasets. In recent years, several algorithms have been proposed for this task. However, most of them do not consider the on-shelf time period of items and negative utility of items. High on-shelf utility itemset (HOUIs) mining is more difficult than traditional HUIs mining because it deals with on-shelf-based time period and negative utility of items. Moreover, most algorithms need minimum utility threshold (min_util) to find rules. However, specifying the appropriate min_util threshold is a difficult problem for users. A smaller min_util threshold may generate too many rules and a higher one may generate a few rules, which can degrade performance. To address these issues, a novel top-k HOUIs mining algorithm named TKOS (Top-K high On-Shelf utility itemsets miner) is proposed which considers on-shelf time period and negative utility. TKOS presents a novel branch and bound-based strategy to raise the internal min_util threshold efficiently. It also presents two pruning strategies to speed up the mining process. In order to reduce the dataset scanning cost, we utilize transaction merging and dataset projection techniques. Extensive experiments have been conducted on real and synthetic datasets having various characteristics. Experimental results show that the proposed algorithm outperforms the state-of-the-art algorithms. The proposed algorithm is up to 42 times faster and uses up-to 19 times less memory compared to the state-of-the-art KOSHU. Moreover, the proposed algorithm has excellent scalability in terms of time periods and the number of transactions.

引用

页数：23

共 38 条

[1] Agrawal Rakesh, 1994, PROC 20 INT C VERY L, P487, DOI DOI 10.5555/645920.672836
[2] HUC-Prune: an efficient candidate pruning technique to mine high utility patterns
Ahmed, Chowdhury Farhan
Tanbeer, Syed Khairuzzaman
Jeong, Byeong-Soo
Lee, Young-Koo
[J]. APPLIED INTELLIGENCE, 2011, 34 (02) : 181 - 198
[3] Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases
Ahmed, Chowdhury Farhan
Tanbeer, Syed Khairuzzaman
Jeong, Byeong-Soo
Lee, Young-Koo
[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) : 1708 - 1721
[4] Chan R, 2003, THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, P19
[5] An efficient algorithm for mining high utility itemsets with negative item values in large databases
Chu, Chun-Jung
Tseng, Vincent S.
Liang, Tyne
[J]. APPLIED MATHEMATICS AND COMPUTATION, 2009, 215 (02) : 767 - 778
[6] The SPMF Open-Source Data Mining Library Version 2
Fournier-Viger, Philippe
Lin, Jerry Chun-Wei
Gomariz, Antonio
Gueniche, Ted
Soltani, Azadeh
Deng, Zhihong
Hoang Thanh Lam
[J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2016, PT III, 2016, 9853 : 36 - 40
[7] FOSHU: Faster On-Shelf High Utility Itemset Mining - with or without Negative Unit Profit
Fournier-Viger, Philippe
Zida, Souleymane
[J]. 30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 857 - 864
[8] Fournier-Viger Philippe, 2014, FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-occurrence Pruning
[9] Han JW, 2000, SIGMOD RECORD, V29, P1
[10] Efficiently mining high utility itemsets with negative unit profits
Krishnamoorthy, Srikumar
[J]. KNOWLEDGE-BASED SYSTEMS, 2018, 145 : 1 - 14

← 1 2 3 4 →