Mining Top-k High On-shelf Utility Itemsets Using Novel Threshold Raising Strategies

被引:0
作者
Singh, Kuldeep [1 ]
Biswas, Bhaskar [2 ]
机构
[1] Univ Delhi, Dept Comp Sci, Delhi 110007, India
[2] Indian Inst Tech BHU, Dept Comp Sci & Engn, Varanasi 221005, Uttar Pradesh, India
关键词
High on-shelf utility itemsets; top-k itemsets; utility mining; on-shelf time periods; negative utility; branch & bound; EFFICIENT ALGORITHM; PATTERNS;
D O I
10.1145/3645115
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
High utility itemsets (HUIs) mining is an emerging area of data mining which discovers sets of items generating a high profit from transactional datasets. In recent years, several algorithms have been proposed for this task. However, most of them do not consider the on-shelf time period of items and negative utility of items. High on-shelf utility itemset (HOUIs) mining is more difficult than traditional HUIs mining because it deals with on-shelf-based time period and negative utility of items. Moreover, most algorithms need minimum utility threshold (min_util) to find rules. However, specifying the appropriate min_util threshold is a difficult problem for users. A smaller min_util threshold may generate too many rules and a higher one may generate a few rules, which can degrade performance. To address these issues, a novel top-k HOUIs mining algorithm named TKOS (Top-K high On-Shelf utility itemsets miner) is proposed which considers on-shelf time period and negative utility. TKOS presents a novel branch and bound-based strategy to raise the internal min_util threshold efficiently. It also presents two pruning strategies to speed up the mining process. In order to reduce the dataset scanning cost, we utilize transaction merging and dataset projection techniques. Extensive experiments have been conducted on real and synthetic datasets having various characteristics. Experimental results show that the proposed algorithm outperforms the state-of-the-art algorithms. The proposed algorithm is up to 42 times faster and uses up-to 19 times less memory compared to the state-of-the-art KOSHU. Moreover, the proposed algorithm has excellent scalability in terms of time periods and the number of transactions.
引用
收藏
页数:23
相关论文
共 38 条
  • [1] Agrawal Rakesh, 1994, PROC 20 INT C VERY L, P487, DOI DOI 10.5555/645920.672836
  • [2] HUC-Prune: an efficient candidate pruning technique to mine high utility patterns
    Ahmed, Chowdhury Farhan
    Tanbeer, Syed Khairuzzaman
    Jeong, Byeong-Soo
    Lee, Young-Koo
    [J]. APPLIED INTELLIGENCE, 2011, 34 (02) : 181 - 198
  • [3] Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases
    Ahmed, Chowdhury Farhan
    Tanbeer, Syed Khairuzzaman
    Jeong, Byeong-Soo
    Lee, Young-Koo
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) : 1708 - 1721
  • [4] Chan R, 2003, THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, P19
  • [5] An efficient algorithm for mining high utility itemsets with negative item values in large databases
    Chu, Chun-Jung
    Tseng, Vincent S.
    Liang, Tyne
    [J]. APPLIED MATHEMATICS AND COMPUTATION, 2009, 215 (02) : 767 - 778
  • [6] The SPMF Open-Source Data Mining Library Version 2
    Fournier-Viger, Philippe
    Lin, Jerry Chun-Wei
    Gomariz, Antonio
    Gueniche, Ted
    Soltani, Azadeh
    Deng, Zhihong
    Hoang Thanh Lam
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2016, PT III, 2016, 9853 : 36 - 40
  • [7] FOSHU: Faster On-Shelf High Utility Itemset Mining - with or without Negative Unit Profit
    Fournier-Viger, Philippe
    Zida, Souleymane
    [J]. 30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 857 - 864
  • [8] Fournier-Viger Philippe, 2014, FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-occurrence Pruning
  • [9] Han JW, 2000, SIGMOD RECORD, V29, P1
  • [10] Efficiently mining high utility itemsets with negative unit profits
    Krishnamoorthy, Srikumar
    [J]. KNOWLEDGE-BASED SYSTEMS, 2018, 145 : 1 - 14