Mining Top-Rank-k Erasable Itemsets by PID_lists

Cited by: 29
Authors
Deng, Zhihong [1 ]
Affiliations
[1] Peking Univ, Key Lab Machine Percept, Minist Educ, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
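Funding
National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program);
Keywords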
Compendex;
DOI
10.1002/int.21580
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mining erasable itemsets is one of the emerging data mining tasks. In this paper, we present a new data representation called a PID_list, which keeps track of the id_nums (identification numbers) of the products that include an itemset. Based on the PID_list, we propose a new algorithm called VM for mining top-rank-k erasable itemsets efficiently. The VM algorithm avoids both the time-consuming process of calculating the gain of candidate itemsets and repeated database scans, and therefore greatly accelerates mining. To evaluate the VM algorithm, we conducted experiments on six synthetic product databases. Our performance study shows that the VM algorithm is efficient and much faster than the MIKE algorithm, the first algorithm for mining top-rank-k erasable itemsets.
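The PID_list idea in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions, not taken from the abstract: the toy `PRODUCTS` table, the function names, and the definition of gain as the total profit of the products that contain at least one item of the itemset (the usual definition in the erasable-itemset literature). The brute-force enumeration in `top_rank_k` is only there to exercise the data structure; it is not the VM algorithm.

```python
from itertools import combinations

# Hypothetical toy product database: product id -> (component items, profit).
PRODUCTS = {
    1: ({"a", "b"}, 50),
    2: ({"b", "c"}, 20),
    3: ({"c"}, 30),
    4: ({"a", "d"}, 100),
}


def build_pid_lists(products):
    """Scan the database once; map each item to {pid: profit} of products using it."""
    pid_lists = {}
    for pid, (items, profit) in products.items():
        for item in items:
            pid_lists.setdefault(item, {})[pid] = profit
    return pid_lists


def merge(pid_list_a, pid_list_b):
    """PID_list of the union of two itemsets (assumed: union of their PID_lists)."""
    return {**pid_list_a, **pid_list_b}


def gain(pid_list):
    """Assumed gain: total profit of the products recorded in the PID_list."""
    return sum(pid_list.values())


def top_rank_k(products, k, max_size=2):
    """Brute-force illustration: itemsets whose gain is among the k smallest
    distinct gain values. The database is read only in build_pid_lists."""
    singles = build_pid_lists(products)
    gains = {}
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(singles), size):
            merged = {}
            for item in combo:
                merged = merge(merged, singles[item])
            gains[combo] = gain(merged)
    kept = sorted(set(gains.values()))[:k]
    return {itemset: g for itemset, g in gains.items() if g in kept}


if __name__ == "__main__":
    print(top_rank_k(PRODUCTS, k=2))
```

Under these assumptions, the gain of any larger itemset comes from merging PID_lists already in memory, which is the kind of saving over rescanning the database that the abstract attributes to the representation.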
Pages: 366-379
Number of pages: 14