Extracting share frequent itemsets with infrequent subsets

Cited by: 51
Authors
Barber, B [1]
Hamilton, HJ [1]
Affiliation
[1] Univ Regina, Dept Comp Sci, Regina, SK S4S 0A2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
frequent itemsets; share measure; share frequent itemsets; heuristic data mining; quantitative itemsets; association rules;
DOI
10.1023/A:1022419032620
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Itemset share has been proposed as an additional measure of the importance of itemsets in association rule mining (Carter et al., 1997). We compare the share and support measures to illustrate that the share measure can provide useful information about numerical values that are typically associated with transaction items, which the support measure cannot. We define the problem of finding share frequent itemsets, and show that share frequency does not have the property of downward closure when it is defined in terms of the itemset as a whole. We present algorithms that do not rely on the property of downward closure, and thus are able to find share frequent itemsets that have infrequent subsets. The algorithms use heuristic methods to generate candidate itemsets. They supplement the information contained in the set of frequent itemsets from a previous pass, with other information that is available at no additional processing cost. They count only those generated itemsets that are predicted to be frequent. The algorithms are applied to a large commercial database and their effectiveness is examined using principles of classifier evaluation from machine learning.
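The contrast between support and share, and the failure of downward closure, can be illustrated with a small sketch. It assumes the usual definition of itemset share (the summed measure values of the itemset's items in the transactions that contain the whole itemset, divided by the total measure value of the database). The toy database, the function names, and the 0.3 threshold are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List

# A transaction maps each item to a numeric measure value (e.g. quantity sold).
Transaction = Dict[str, int]


def contains(t: Transaction, itemset: Iterable[str]) -> bool:
    """True if the transaction contains every item of the itemset."""
    return all(item in t for item in itemset)


def support(itemset: Iterable[str], db: List[Transaction]) -> float:
    """Fraction of transactions that contain the whole itemset."""
    items = list(itemset)
    return sum(1 for t in db if contains(t, items)) / len(db)


def share(itemset: Iterable[str], db: List[Transaction]) -> float:
    """Assumed definition: measure value of the itemset's items, summed over
    the transactions containing the whole itemset, over the database total."""
    items = list(itemset)
    total = sum(sum(t.values()) for t in db)
    local = sum(t[item] for t in db if contains(t, items) for item in items)
    return local / total


# Hypothetical toy database; the total measure value is 1 + 10 + 1 + 8 = 20.
db = [
    {"A": 1, "B": 10},
    {"A": 1},
    {"C": 8},
]

min_share = 0.3  # hypothetical threshold

share_a = share({"A"}, db)        # (1 + 1) / 20
share_ab = share({"A", "B"}, db)  # (1 + 10) / 20

# {A, B} meets the threshold although its subset {A} does not,
# so share frequency is not downward closed in this example.
print(share_a >= min_share, share_ab >= min_share)

# Support, in contrast, never increases when an itemset grows.
print(support({"A"}, db) >= support({"A", "B"}, db))
```

In this toy database, {A} has the higher support but the lower share, because the quantity of B dominates the one transaction that contains both items. An Apriori-style pruning step that discards every superset of {A} would therefore miss {A, B}.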
Pages: 153-185
Number of pages: 33
Related papers
36 in total
[1]  
AGRAWAL A, 1994, P 20 INT C VER LARG, P487
[2]   Parallel mining of association rules [J].
Agrawal, R ;
Shafer, JC .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1996, 8 (06) :962-969
[3]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[4]  
Agrawal R., 1996, Advances in Knowledge Discovery and Data Mining, P307
[5]  
Ali K., 1997, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, P115
[6]  
[Anonymous], 1994, SIGIR
[7]   Parametric algorithms for mining share frequent itemsets [J].
Barber, B ;
Hamilton, HJ .
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2001, 16 (03) :277-293
[8]  
Barber B, 2000, LECT NOTES COMPUT, V1910, P316
[9]   Constraint-based rule mining in large, dense databases [J].
Bayardo, RJ ;
Agrawal, R ;
Gunopulos, D .
15TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1999, :188-197
[10]  
Brin S., 1997, SIGMOD Record, V26, P255, DOI [10.1145/253262.253327, 10.1145/253262.253325]