Misleading Generalized Itemset discovery

Cited by: 17
Authors
Cagliero, Luca [1]
Cerquitelli, Tania [1]
Garza, Paolo [1]
Grimaudo, Luigi [1]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, I-10129 Turin, Italy
Keywords
Generalized itemset mining; Data mining; Taxonomies; Mobile data analysis; Association rules; Frequent patterns; Database
DOI
10.1016/j.eswa.2013.08.039
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Frequent generalized itemset mining is a data mining technique used to discover a high-level view of interesting knowledge hidden in the analyzed data. By exploiting a taxonomy, patterns are usually extracted at any level of abstraction. However, some misleading high-level patterns could be included in the mined set. This paper proposes a novel generalized itemset type, namely the Misleading Generalized Itemset (MGI). Each MGI, denoted as X ▷ ℰ, represents a frequent generalized itemset X and its set ℰ of low-level frequent descendants for which the correlation type is in contrast to the one of X. To allow experts to analyze the misleading high-level data correlations separately and exploit such knowledge by making different decisions, MGIs are extracted only if the low-level descendant itemsets that represent contrasting correlations cover almost the same portion of data as the high-level (misleading) ancestor. An algorithm to mine MGIs on top of traditional generalized itemsets is also proposed. The experiments performed on both real and synthetic datasets demonstrate the effectiveness and efficiency of the proposed approach. © 2013 Elsevier Ltd. All rights reserved.
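The MGI condition described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the toy taxonomy, the use of the Kulczynski measure as the correlation measure, the thresholds `max_neg`, `min_pos`, and `min_sup`, and the helper names are all assumptions made for illustration only. The sketch checks whether a high-level itemset has frequent low-level descendants of the opposite correlation type, and what share of the ancestor's transactions those descendants cover.

```python
from itertools import product

# Hypothetical toy taxonomy (leaf item -> generalized item); not taken from the paper.
PARENT = {
    "Turin": "Italy", "Rome": "Italy", "Milan": "Italy",
    "Phone": "Mobile", "Tablet": "Mobile", "Watch": "Mobile",
}

def matches(itemset, transaction):
    """A transaction matches if it holds each item directly or through a leaf descendant."""
    generalized = set(transaction) | {PARENT[i] for i in transaction if i in PARENT}
    return set(itemset) <= generalized

def count(itemset, transactions):
    """Number of transactions matched by the itemset."""
    return sum(matches(itemset, t) for t in transactions)

def kulc(itemset, transactions):
    """Kulczynski measure: mean of count(X) / count(item). Assumed correlation measure."""
    joint = count(itemset, transactions)
    if joint == 0:
        return 0.0
    return sum(joint / count((i,), transactions) for i in itemset) / len(itemset)

def correlation_type(value, max_neg=0.4, min_pos=0.8):
    """Map a correlation value to a type; both thresholds are illustrative assumptions."""
    if value >= min_pos:
        return "positive"
    if value <= max_neg:
        return "negative"
    return "null"

def contrasting_coverage(ancestor, transactions, min_sup=0.1):
    """Return the frequent descendants whose correlation type contrasts with the
    ancestor's, and the share of the ancestor's transactions that they cover."""
    top = correlation_type(kulc(ancestor, transactions))
    children = [[c for c, p in PARENT.items() if p == item] for item in ancestor]
    contrasting = []
    for descendant in product(*children):
        if count(descendant, transactions) / len(transactions) < min_sup:
            continue  # skip infrequent descendants
        kind = correlation_type(kulc(descendant, transactions))
        if {top, kind} == {"positive", "negative"}:
            contrasting.append(descendant)
    covered = sum(
        matches(ancestor, t) and any(matches(d, t) for d in contrasting)
        for t in transactions
    )
    total = count(ancestor, transactions)
    return contrasting, (covered / total if total else 0.0)

# Toy data: every city/device pair occurs exactly once.
TRANSACTIONS = [(c, d) for c in ("Turin", "Rome", "Milan")
                for d in ("Phone", "Tablet", "Watch")]
descendants, coverage = contrasting_coverage(("Italy", "Mobile"), TRANSACTIONS)
print(len(descendants), coverage)
```

In this toy dataset the high-level pair (Italy, Mobile) looks positively correlated, while each city/device pair is negatively correlated under the assumed thresholds. A candidate like this would qualify as misleading when the coverage share is close to 1.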
Pages: 1400-1410
Page count: 11