Closed High Utility Itemsets Mining over Data Stream Based on Sliding Window Model

Cited by: 0
Authors
Cheng H. [1]
Han M. [1]
Zhang N. [1]
Li X. [1]
Wang L. [1]
Affiliation
[1] College of Computer Science and Engineering, North Minzu University, Yinchuan
Source
Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2021, Vol. 58, No. 11
Funding
National Natural Science Foundation of China
Keywords
Closed high utility itemsets; Data stream mining; Pattern mining; Sliding window; Utility list
DOI
10.7544/issn1000-1239.2021.20200554
Abstract
Mining high utility itemsets from a data stream is challenging because the incoming data must be processed in real time under time and memory constraints. Data stream mining also tends to generate a large number of redundant itemsets. Mining closed itemsets reduces this redundancy while preserving a lossless, compressed representation of the complete set of high utility itemsets; the set of closed itemsets can be several orders of magnitude smaller than the complete set. To this end, an algorithm named CHUI_DS (sliding-window-model-based closed high utility itemsets mining on data stream) is proposed for mining closed high utility itemsets over a data stream. CHUI_DS uses a new utility-list structure designed to speed up batch insertion and deletion. In addition, effective pruning strategies are applied to improve the closed itemset mining process and eliminate potential low-utility candidates. Extensive experiments on real and synthetic datasets demonstrate the efficiency and feasibility of the algorithm. It runs faster than previously proposed algorithms, which mainly operate in batch mode, adapts to sliding windows of different sizes, and scales well with the number of transactions. © 2021, Science Press. All rights reserved.
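To make the terms in the abstract concrete, the following is a minimal Python sketch of what "closed high utility itemsets in a sliding window" means. It is a brute-force illustration of the definitions only, not the CHUI_DS algorithm: it uses no utility lists and no pruning. The function names, the dictionary-based transaction format (item to quantity), the `profit` table of unit utilities, and measuring the window in batches are all assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def closed_high_utility_itemsets(window, profit, min_util):
    """Brute-force CHUIs of one window, as {itemset: (support, utility)}.

    window   -- list of transactions, each a dict {item: quantity}
    profit   -- dict {item: unit utility} (hypothetical external utilities)
    min_util -- minimum utility threshold
    """
    items = sorted({i for t in window for i in t})
    stats = {}
    # Enumerate every itemset that occurs in at least one transaction.
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            rows = [t for t in window if all(i in t for i in combo)]
            if rows:
                util = sum(t[i] * profit[i] for t in rows for i in combo)
                stats[frozenset(combo)] = (len(rows), util)
    result = {}
    for x, (sup, util) in stats.items():
        # Closed: no proper superset occurs in exactly the same transactions,
        # i.e. no proper superset has the same support.
        closed = not any(x < y and sup == s for y, (s, _) in stats.items())
        if closed and util >= min_util:
            result[x] = (sup, util)
    return result


def mine_stream(batches, profit, min_util, window_batches):
    """Yield the CHUIs of each full window as batches slide in and out."""
    window = deque(maxlen=window_batches)  # oldest batch is dropped automatically
    for batch in batches:
        window.append(batch)
        if len(window) == window_batches:
            transactions = [t for b in window for t in b]
            yield closed_high_utility_itemsets(transactions, profit, min_util)
```

For instance, with hypothetical unit profits a = 5 and b = 2 and the two transactions {a: 1, b: 1} and {a: 2, b: 3}, a threshold of 10 yields only {a, b} (support 2, utility 23). The itemset {a} reaches utility 15 but is not closed, because {a, b} has the same support. This sketch recomputes each window from scratch; the abstract indicates that CHUI_DS instead relies on a utility-list structure to speed up batch insertion and deletion.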
Pages: 2500-2514
Page count: 14