Mining frequent itemsets with convertible constraints

Cited by: 127
Authors
Pei, J [1]
Han, JW [1]
Lakshmanan, LVS [1]
Affiliation
[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada
Source
17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 2001
DOI
10.1109/ICDE.2001.914856
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Recent work has highlighted the importance of the constraint-based mining paradigm in the context of frequent itemsets, associations, correlations, sequential patterns, and many other interesting patterns in large databases. In this paper, we study constraints that cannot be handled with existing theory and techniques. For example, avg(S) θ v, median(S) θ v, and sum(S) θ v (where S can contain items of arbitrary values and θ ∈ {≥, ≤}) are customarily regarded as "tough" constraints in that they cannot be pushed inside an algorithm such as Apriori. We develop a notion of convertible constraints and systematically analyze, classify, and characterize this class. We also develop techniques that enable them to be readily pushed deep inside the recently developed FP-growth algorithm for frequent itemset mining. Results from our detailed experiments show the effectiveness of the techniques developed.
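To make the idea of a convertible constraint concrete, here is a minimal Python sketch, assuming item values are given in a dict and transactions are lists of item names. It explores items in descending value order. Under that order, a prefix that violates avg(S) ≥ v cannot be rescued by appending later items, because each appended item is no larger than any item already in S, so the average can only fall. The sketch uses a naive depth-first enumeration with brute-force support counting, not the FP-growth-based method described in the paper. The names `mine_avg_ge`, `value`, `min_sup`, and the sample data are hypothetical.

```python
def mine_avg_ge(transactions, value, min_sup, v):
    """Return frequent itemsets S (support >= min_sup) with avg value >= v.

    Hypothetical helper for illustration. Items are explored in descending
    value order; in that order, avg(S) >= v behaves anti-monotonically on
    prefixes: every item appended later is no larger than any item already
    in S, so the average can only fall.
    """
    order = sorted(value, key=lambda item: -value[item])
    tx_sets = [set(t) for t in transactions]
    results = []

    def support(itemset):
        # Brute-force count of transactions containing the itemset.
        return sum(1 for t in tx_sets if itemset <= t)

    def grow(prefix, total, start):
        for k in range(start, len(order)):
            item = order[k]
            candidate = prefix | {item}
            cand_total = total + value[item]
            if cand_total / len(candidate) < v:
                # Later siblings have values no larger than this item, and
                # extensions of this candidate only add such values, so none
                # can satisfy the constraint: stop scanning this level.
                break
            if support(candidate) < min_sup:
                # Support is anti-monotone: no superset can be frequent.
                continue
            results.append(candidate)
            grow(candidate, cand_total, k + 1)

    grow(frozenset(), 0, 0)
    return results


# Hypothetical sample data.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "c", "d"]]
value = {"a": 40, "b": 30, "c": 20, "d": 10}
found = mine_avg_ge(transactions, value, min_sup=2, v=25)
for s in found:
    print(sorted(s))
# prints ['a'], ['a', 'b'], ['a', 'c'], ['b'], ['b', 'c'] on separate lines
```

Reversing the order (ascending values) makes the same constraint behave monotonically instead: once a prefix satisfies avg(S) ≥ v, appending larger-valued items keeps it satisfied, so that order could be used to skip rechecks rather than to prune.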
Pages: 433–442
Page count: 10