Mining high utility itemsets based on the time decaying model

被引：30

作者：

Kim, Donggyu ^{[1
]}

Yun, Unil ^{[1
]}

机构：

[1] Sejong Univ, Dept Comp Engn, Seoul, South Korea

来源：

INTELLIGENT DATA ANALYSIS | 2016年 / 20卷 / 05期

基金：

新加坡国家研究基金会;

关键词：

Data mining; high utility itemset mining; stream data mining; time decaying model; TRANSACTIONAL DATA STREAMS; FREQUENT ITEMSETS; SLIDING-WINDOW; EFFICIENT ALGORITHMS; PATTERNS;

D O I：

10.3233/IDA-160861

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Various association rule mining techniques, such as frequent itemset mining, sequence itemset mining, and high utility itemset mining, have been studied to reveal valuable knowledge hidden from large databases. Among these techniques, high utility itemset mining has been researched actively by many researchers because of its characteristics that can find more meaningful itemsets compared to those of other approaches by considering the utility of each item in a given database. In recent years, mining high utility itemsets over data streams has emerged as an interesting topic because many users want to obtain valuable information from stream data, which are continually generated at rapid rates. However, in these environments, most of the previous high utility itemset mining methods cannot efficiently work in terms of both runtime and memory usage. In addition, since they conduct their mining processes without any consideration of transactions' arrival-time, it is hard for these methods to sufficiently fulfill the needs of users when they want to obtain only up to date, relevant information over data streams. In this paper, we propose a new tree-based algorithm that mines recent high utility itemsets over data streams. On the basis of the time decaying model, our algorithm diminishes the utilities of transactions according to their arrival-time in order to assign larger weights to recent data compared to those of older ones. Moreover, the algorithm regularly updates the utility information in its tree data structure and prunes the nodes with the utility values less than a user-specified minimum value. Thereby, the algorithm can maintain a reasonable memory usage bound by avoiding memory use that is unessential. Experimental results demonstrate that our algorithm can mine recent high utility itemsets from varying stream data while consuming smaller computational resources than those of the existing algorithms.

引用

页码：1157 / 1180

页数：24

共 34 条

[1]

Agrawal R., P 20 INT C VERY LARG

[2] Interactive mining of high utility patterns over data streams [J].