Mining Flipping Correlations from Large Datasets with Taxonomies

Cited by: 16
Authors
Barsky, Marina [1]
Kim, Sangkyum [2]
Weninger, Tim [2]
Han, Jiawei [2]
Affiliations
[1] Univ Victoria, Victoria, BC, Canada
[2] Univ Illinois, Champaign, IL USA
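Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2011 / Vol. 5 / Issue 04
Funding
Natural Sciences and Engineering Research Council of Canada; National Science Foundation (USA)
Keywords
Flipping correlation; Itemset mining
DOI
10.14778/2095686.2095695
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract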
In this paper we introduce a new type of pattern: a flipping correlation pattern. Flipping patterns are obtained by contrasting the correlations between items at different levels of abstraction. They represent surprising correlations, both positive and negative, which are specific to a given abstraction level, and which "flip" from positive to negative and vice versa when items are generalized to a higher level of abstraction. We design an efficient algorithm for finding flipping correlations, the FLIPPER algorithm, which outperforms naïve pattern mining methods by several orders of magnitude. We apply FLIPPER to real-life datasets and show that the discovered patterns are non-redundant, surprising and actionable. FLIPPER finds strong contrasting correlations in itemsets with low-to-medium support, while existing techniques cannot handle pattern discovery in this frequency range.
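The notion of a correlation that "flips" under generalization can be illustrated with a toy example. The sketch below is not the FLIPPER algorithm. It is a brute-force check on a single item pair, and it rests on assumptions that the abstract does not state: the correlation measure (Kulczynski), the thresholds `gamma` and `epsilon`, the two-level taxonomy, and the transactions are all hypothetical choices made for illustration.

```python
# Toy illustration of a flipping correlation (NOT the FLIPPER algorithm).
# Assumptions: Kulczynski as the correlation measure, the thresholds
# gamma/epsilon, the taxonomy, and the data are all made up for this sketch.

def support(transactions, itemset):
    """Count the transactions that contain every element of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def kulczynski(transactions, a, b):
    """Average of P(a|b) and P(b|a), estimated from support counts."""
    sup_a = support(transactions, {a})
    sup_b = support(transactions, {b})
    if sup_a == 0 or sup_b == 0:
        return 0.0
    sup_ab = support(transactions, {a, b})
    return 0.5 * sup_ab * (1 / sup_a + 1 / sup_b)

def label(value, gamma=0.6, epsilon=0.2):
    """Map a correlation value to a label using hypothetical thresholds."""
    if value >= gamma:
        return "positive"
    if value <= epsilon:
        return "negative"
    return "neutral"

def generalize(transactions, taxonomy):
    """Replace each item by its parent category in the taxonomy."""
    return [{taxonomy[item] for item in t} for t in transactions]

def flips(transactions, taxonomy, a, b):
    """True if the pair's label changes sign between the two levels."""
    low = label(kulczynski(transactions, a, b))
    general = generalize(transactions, taxonomy)
    high = label(kulczynski(general, taxonomy[a], taxonomy[b]))
    return {low, high} == {"positive", "negative"}

# Hypothetical two-level taxonomy: item -> category.
taxonomy = {"cola": "drinks", "juice": "drinks",
            "chips": "snacks", "nuts": "snacks"}

# Hypothetical transactions.
transactions = [
    {"cola", "chips"}, {"cola", "chips"}, {"cola", "chips"},
    {"juice", "nuts"}, {"juice", "nuts"}, {"juice", "nuts"},
    {"cola"}, {"nuts"},
]

print(flips(transactions, taxonomy, "cola", "nuts"))   # item pair never co-occurs
print(flips(transactions, taxonomy, "cola", "chips"))  # positive at both levels
```

In this toy data, cola and nuts never appear together, yet their categories (drinks and snacks) co-occur in most transactions, so the pair is negative at the item level and positive at the category level. Checking every pair this way does not scale, which is the cost the abstract attributes to naïve methods.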
Pages: 370-381
Page count: 12