RIB: A Robust Itemset-based Bayesian approach to classification

Cited by: 8
Authors
Baralis, Elena [1]
Cagliero, Luca [1]
Affiliations
[1] Politecn Torino, Dipartimento Automat & Informat, I-10129 Turin, Italy
Keywords
Data mining; Frequent itemset mining; Classification; Bayesian modeling; Noisy data; NAIVE BAYES; NOISE; ENSEMBLES;
DOI
10.1016/j.knosys.2014.08.015
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real-life data is often affected by noise. To cope with this issue, classification techniques robust to noisy data are needed. Bayesian approaches are known to be fairly robust to noise. However, to compute probability estimates, state-of-the-art Bayesian approaches adopt a lazy pattern-based strategy, which shows some limitations when coping with data affected by a notable amount of noise. This paper proposes RIB (Robust Itemset-based Bayesian classifier), a novel eager and pattern-based Bayesian classifier which discovers frequent itemsets from training data and exploits them to build accurate probability estimates. Enforcing a minimum frequency of occurrence on the considered itemsets reduces the sensitivity of the probability estimates to noise. Furthermore, learning a Bayesian Network that also considers high-order dependences among data, which are usually neglected by traditional Bayesian approaches, appears to be more robust to noise and data overfitting than selecting a small subset of patterns tailored to each test instance. The experiments demonstrate that RIB is, on average, more accurate than most state-of-the-art classifiers, Bayesian or otherwise, on benchmark datasets into which different kinds and levels of noise are injected. Furthermore, its performance on the same datasets prior to noise injection is competitive with that of state-of-the-art classifiers. (C) 2014 Elsevier B.V. All rights reserved.
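The effect of a minimum frequency threshold mentioned in the abstract can be illustrated with a small sketch. This is not the RIB algorithm itself: it is a brute-force enumeration over a toy dataset, and the function name `frequent_itemsets`, the `max_len` parameter, and the attribute values are all hypothetical choices made for illustration. It only shows how an itemset supported by a single, possibly noisy, record falls below the threshold and is excluded from later probability estimates.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(rows, min_support, max_len=2):
    """Return {itemset: relative support} for itemsets meeting min_support.

    Brute-force enumeration, suitable only for tiny examples.
    """
    counts = Counter()
    for row in rows:
        items = sorted(row)
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(rows)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Toy training data: each row is a set of attribute=value items.
rows = [
    {"outlook=sunny", "windy=no"},
    {"outlook=sunny", "windy=no"},
    {"outlook=sunny", "windy=no"},
    {"outlook=rain", "windy=yes"},
    {"outlook=rain", "windy=yes"},
    {"outlook=rain", "windy=no"},  # assume this record is affected by noise
]

freq = frequent_itemsets(rows, min_support=0.3)

# The pair seen in only one record (support 1/6) is filtered out,
# while the pair seen in three records (support 3/6) is kept.
print(frozenset({"outlook=rain", "windy=no"}) in freq)
print(freq[frozenset({"outlook=sunny", "windy=no"})])
```

A practical miner would use an Apriori- or FP-growth-style search instead of enumerating every combination; the exhaustive loop is used here only to keep the sketch short.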
Pages: 366-375
Number of pages: 10