Computational aspects of mining maximal frequent patterns

被引:19
作者
Yang, Guizhen [1 ]
机构
[1] SRI Int, Ctr Artificial Intelligence, Menlo Pk, CA 94025 USA
关键词
data mining; complexity; maximal frequent patterns; #P-complete;
D O I
10.1016/j.tcs.2006.05.029
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In this paper we study the complexity-theoretic aspects of mining maximal frequent patterns, from the perspective of counting the number of all distinct solutions. We present the first formal proof that the problem of counting the number of maximal frequent itemsets in a database of transactions, given an arbitrary support threshold, is #P-complete, thereby providing theoretical evidence that the problem of mining maximal frequent itemsets is NP-hard. We also extend our complexity analysis to other similar data mining problems that deal with complex data structures, such as sequences, trees, and graphs. We investigate several variants of these mining problems in which the patterns of interest are subsequences, subtrees, or subgraphs, and show that the associated problems of counting the number of maximal frequent patterns are all either #P-complete or #P-hard. (C) 2006 Elsevier B.V. All rights reserved.
引用
收藏
页码:63 / 85
页数:23
相关论文
共 50 条
[1]  
AGARWAL RC, 2000, ACM SIGKDD INT C KNO
[2]  
AGRAW R, 1995, INT C DAT ENG ICDE
[3]  
Agrawal R, 1994, P 20 INT C VER LARG, V1215, P487
[4]  
[Anonymous], 1994, COMPUTATIONAL COMPLE
[5]  
ASAI T, 2002, SIAM INT C DAT MIN S
[6]  
BAYARDO RJ, 1998, ACM INT C MAN DAT SI
[7]  
BOROS E, 2002, INT S THEOR ASP COMP
[8]  
BURDICK D, 2001, INT C DAT ENG ICDE
[9]  
Cormen T. H., 2001, Introduction to Algorithms, V2nd
[10]  
ETZIONI O, 2003, ACM SIGKDD INT C KNO