Improving classification models with taxonomy information

Cited by: 25
Authors
Cagliero, Luca [1]
Garza, Paolo [1]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, I-10129 Turin, Italy
Keywords
Data mining; Classification; Taxonomies; Generalized association rules; Algorithm
DOI
10.1016/j.datak.2013.01.005
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification is an established data mining problem that has been extensively investigated by the research community. Since raw data is often unsuitable for training a classifier as it is, several preprocessing steps are commonly integrated into the data mining and knowledge discovery process before classification is applied. This paper investigates the usefulness of integrating taxonomy information into classifier construction. In particular, it presents a general-purpose strategy to improve structured data classification accuracy by enriching data with semantics-based knowledge provided by a taxonomy (i.e., a set of is-a hierarchies) built over data items. The proposed approach may be particularly useful to experts who can directly access or easily infer meaningful taxonomy models over the analyzed data. To demonstrate the benefit of taxonomies for contemporary classification methods, we also present a generalized version of a state-of-the-art associative classifier, which also includes generalized (high-level) rules in the classification model. Experiments show the effectiveness of the proposed approach in improving the accuracy of state-of-the-art classifiers, both associative and non-associative. © 2013 Elsevier B.V. All rights reserved.
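The enrichment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the taxonomy contents, the `ancestors` and `enrich` helpers, and the `attr@levelN` feature naming are all hypothetical, chosen only to show how is-a hierarchies could add generalized features to a structured record before a classifier is trained.

```python
from typing import Dict, List

# Hypothetical is-a hierarchy (child -> parent); illustrative only, not from the paper.
TAXONOMY: Dict[str, str] = {
    "Turin": "Italy",
    "Milan": "Italy",
    "Italy": "Europe",
    "Paris": "France",
    "France": "Europe",
}


def ancestors(item: str, taxonomy: Dict[str, str]) -> List[str]:
    """Return the generalizations of `item`, from nearest parent up to the root."""
    chain: List[str] = []
    seen = {item}
    while item in taxonomy:
        item = taxonomy[item]
        if item in seen:  # guard against an accidental cycle in the hierarchy
            break
        seen.add(item)
        chain.append(item)
    return chain


def enrich(record: Dict[str, str], taxonomy: Dict[str, str]) -> Dict[str, str]:
    """Copy `record` and add one extra feature per taxonomy level of each value."""
    enriched = dict(record)
    for attr, value in record.items():
        for level, parent in enumerate(ancestors(value, taxonomy), start=1):
            # Assumed naming scheme for generalized features.
            enriched[f"{attr}@level{level}"] = parent
    return enriched


row = {"city": "Turin", "product": "espresso"}
enriched_row = enrich(row, TAXONOMY)
```

Values with no entry in the taxonomy (here, `espresso`) are left unchanged, so the enriched record can be passed to any classifier that accepts categorical features.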
Pages: 85-101
Number of pages: 17