RIB: A Robust Itemset-based Bayesian approach to classification

Cited by: 8
Authors
Baralis, Elena [1]
Cagliero, Luca [1]
Affiliations
[1] Politecn Torino, Dipartimento Automat & Informat, I-10129 Turin, Italy
Keywords
Data mining; Frequent itemset mining; Classification; Bayesian modeling; Noisy data; NAIVE BAYES; NOISE; ENSEMBLES;
DOI
10.1016/j.knosys.2014.08.015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Real-life data is often affected by noise. To cope with this issue, classification techniques robust to noisy data are needed. Bayesian approaches are known to be fairly robust to noise. However, to compute probability estimates, state-of-the-art Bayesian approaches adopt a lazy pattern-based strategy, which shows some limitations when coping with data affected by a notable amount of noise. This paper proposes RIB (Robust Itemset-based Bayesian classifier), a novel eager and pattern-based Bayesian classifier which discovers frequent itemsets from training data and exploits them to build accurate probability estimates. Enforcing a minimum frequency of occurrence on the considered itemsets reduces the sensitivity of the probability estimates to noise. Furthermore, learning a Bayesian Network that also considers high-order dependences among data, usually neglected by traditional Bayesian approaches, appears to be more robust to noise and data overfitting than selecting a small subset of patterns tailored to each test instance. The experiments demonstrate that RIB is, on average, more accurate than most state-of-the-art classifiers, Bayesian or not, on benchmark datasets in which different kinds and levels of noise are injected. Furthermore, its performance on the same datasets prior to noise injection is competitive with that of state-of-the-art classifiers. (C) 2014 Elsevier B.V. All rights reserved.
Pages: 366-375
Page count: 10
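The abstract's point about enforcing a minimum frequency of occurrence on itemsets can be illustrated with a minimal sketch. This is not the RIB algorithm from the paper: the function name `frequent_itemsets`, the `max_size` cap, the brute-force enumeration, and the toy records are all assumptions made for illustration. It only shows how a minimum support threshold discards itemsets that occur rarely, which is where isolated noisy records tend to show up.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: relative support} for itemsets meeting min_support.

    Brute-force enumeration, suitable only for tiny examples.
    """
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical training records, each a list of attribute=value items.
records = [
    ["outlook=sunny", "windy=no"],
    ["outlook=sunny", "windy=no"],
    ["outlook=sunny", "windy=yes"],
    ["outlook=rainy", "windy=no"],  # imagine this record is noisy
]

result = frequent_itemsets(records, min_support=0.5)
for itemset, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

With a threshold of 0.5, itemsets seen in only one of the four records (for example those containing `outlook=rainy`) are dropped, so they would not contribute to any downstream probability estimate.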