Expressive generalized itemsets

Cited by: 13
Authors
Baralis, Elena [1]
Cagliero, Luca [1]
Cerquitelli, Tania [1]
D'Elia, Vincenzo [1]
Garza, Paolo [1]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, I-10129 Turin, Italy
Keywords
Generalized itemset mining; Data mining; Expressiveness of generalized itemset; Association rules; Frequent patterns
DOI
10.1016/j.ins.2014.03.056
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Generalized itemset mining is a powerful tool to discover multiple-level correlations among the analyzed data. A taxonomy is used to aggregate data items into higher-level concepts and to discover frequent recurrences among data items at different granularity levels. However, since traditional high-level itemsets may also represent the knowledge covered by their lower-level frequent descendant itemsets, the expressiveness of high-level itemsets can be rather limited. To overcome this issue, this article proposes two novel itemset types, called Expressive Generalized Itemset (EGI) and Maximal Expressive Generalized Itemset (Max-EGI), in which the frequency of occurrence of a high-level itemset is evaluated only on the portion of data not yet covered by any of its frequent descendants. Specifically, EGIs represent, at a high level of abstraction, the knowledge associated with sets of infrequent itemsets, while Max-EGIs compactly represent all the infrequent descendants of a generalized itemset. We also propose an algorithm to discover Max-EGIs on top of the traditionally mined itemsets. Experiments, performed on both real and synthetic datasets, demonstrate the effectiveness, efficiency, and scalability of the proposed approach. © 2014 Elsevier Inc. All rights reserved.
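The idea of evaluating a high-level itemset only on the data not covered by its frequent descendants can be illustrated with a small sketch. This is a loose illustration of the abstract's wording, not the algorithm proposed in the article: the toy taxonomy, the transactions, the function names (`support`, `descendants`, `residual_support`), and the assumption that a descendant is any itemset obtained by specializing at least one item are all hypothetical.

```python
from itertools import product

# Hypothetical one-level taxonomy: leaf item -> higher-level concept.
TAXONOMY = {"apple": "fruit", "pear": "fruit", "milk": "dairy", "cheese": "dairy"}

# Hypothetical transactional dataset.
TRANSACTIONS = [
    {"apple", "milk"},
    {"apple", "milk"},
    {"apple", "milk"},
    {"pear", "cheese"},
    {"pear", "milk"},
    {"apple", "cheese"},
]

def expand(transaction):
    """Return the transaction items together with their taxonomy ancestors."""
    return set(transaction) | {TAXONOMY[i] for i in transaction if i in TAXONOMY}

def matches(itemset, transaction):
    """True if every (possibly generalized) item of the itemset is matched."""
    return set(itemset) <= expand(transaction)

def support(itemset, transactions):
    """Traditional support count of a generalized itemset."""
    return sum(matches(itemset, t) for t in transactions)

def descendants(itemset):
    """Itemsets obtained by specializing at least one item (assumed definition)."""
    options = []
    for item in itemset:
        children = [c for c, parent in TAXONOMY.items() if parent == item]
        options.append([item] + children)
    target = frozenset(itemset)
    return [frozenset(c) for c in product(*options) if frozenset(c) != target]

def residual_support(itemset, transactions, min_sup):
    """Count matching transactions not covered by any frequent descendant."""
    frequent = [d for d in descendants(itemset)
                if support(d, transactions) >= min_sup]
    return sum(
        matches(itemset, t) and not any(matches(d, t) for d in frequent)
        for t in transactions
    )
```

With a minimum support count of 3, `support({"fruit", "dairy"}, TRANSACTIONS)` is 6, whereas `residual_support({"fruit", "dairy"}, TRANSACTIONS, 3)` is 1: only the transaction `{"pear", "cheese"}` is left uncovered by the frequent descendants `{fruit, milk}`, `{apple, dairy}`, and `{apple, milk}`.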
Pages: 327-343
Number of pages: 17