USING INSTANCE CLONING TO IMPROVE NAIVE BAYES FOR RANKING

Cited: 15
Authors
Jiang, Liangxiao [1 ]
Wang, Dianhong [2 ]
Zhang, Harry [1 ]
Cai, Zhihua [1 ]
Huang, Bo [1 ]
Affiliations
[1] China Univ Geosci, Fac Comp Sci, Wuhan 430074, Hubei, Peoples R China
[2] China Univ Geosci, Fac Elect Engn, Wuhan 430074, Hubei, Peoples R China
Keywords
Naive Bayes; instance cloning; ranking; classification; lazy learning; similarity; data mining;
DOI
10.1142/S0218001408006703
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Improving naive Bayes (NB)(15,28) for classification has received significant attention. Related work can be broadly divided into two approaches: eager learning and lazy learning.(1) In contrast to eager learning, the key idea in extending naive Bayes via lazy learning is to learn an improved naive Bayes model for each test instance. In recent years, several lazy extensions of naive Bayes have been proposed, such as LBR,(30) SNNB,(27) and LWNB.(8) All of these algorithms aim to improve naive Bayes' classification performance, and indeed they achieve significant improvement in terms of classification accuracy. In many real-world data mining applications, however, an accurate ranking is more desirable than an accurate classification. A natural question, then, is whether they also achieve significant improvement in terms of ranking, measured by AUC (the area under the ROC curve).(2,11,17) To answer this question, we conduct experiments on the 36 UCI data sets(18) selected by Weka(12) and find that these lazy extensions do not significantly improve the ranking performance of naive Bayes. To scale up naive Bayes' ranking performance, we present a novel lazy method, ICNB (instance cloned naive Bayes), and develop three ICNB algorithms using different instance cloning strategies. We compare them empirically with naive Bayes, and the experimental results show that our algorithms achieve significant improvement in terms of AUC. Our research provides a simple but effective method for applications where an accurate ranking is desirable.
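The core idea the abstract describes, cloning training instances similar to the test instance so that a locally biased naive Bayes produces better-calibrated posteriors for ranking, can be sketched as follows. This is a minimal illustration for categorical attributes, not the authors' implementation; the cloning rule (one extra copy per matching attribute value) and all function names are illustrative assumptions, and the paper's three ICNB variants use their own cloning strategies.

```python
from collections import Counter, defaultdict

def similarity(x, y):
    # Number of attribute values the two instances share.
    return sum(a == b for a, b in zip(x, y))

def clone_training_set(train, test_instance):
    # Clone each training instance once per attribute value it shares with
    # the test instance, so similar instances dominate the local model.
    expanded = []
    for x, label in train:
        copies = 1 + similarity(x, test_instance)  # original + clones
        expanded.extend([(x, label)] * copies)
    return expanded

def nb_posteriors(train, test_instance):
    # Naive Bayes with Laplace smoothing over a (possibly cloned) set.
    class_counts = Counter(label for _, label in train)
    n = len(train)
    value_counts = defaultdict(Counter)           # (class, attr) -> value counts
    values_per_attr = [set() for _ in test_instance]
    for x, label in train:
        for i, v in enumerate(x):
            value_counts[(label, i)][v] += 1
            values_per_attr[i].add(v)
    posts = {}
    for c, cc in class_counts.items():
        p = (cc + 1) / (n + len(class_counts))    # smoothed class prior
        for i, v in enumerate(test_instance):
            p *= (value_counts[(c, i)][v] + 1) / (cc + len(values_per_attr[i]))
        posts[c] = p
    total = sum(posts.values())
    return {c: p / total for c, p in posts.items()}

def icnb_rank_score(train, test_instance, positive_class):
    # Ranking score: posterior of the positive class under a naive Bayes
    # trained lazily on the cloned neighborhood of this test instance.
    expanded = clone_training_set(train, test_instance)
    return nb_posteriors(expanded, test_instance)[positive_class]
```

Because the cloned set depends on the test instance, the model is rebuilt per query (lazy learning), and the resulting posteriors, rather than hard class labels, are what the AUC evaluation ranks.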
Pages: 1121-1140 (20 pages)
References
29 in total
[1]  
Aha D., 1997, LAZY LEARNING
[2]  
[Anonymous], 1988, PROBABILISTIC REASON, DOI 10.1016/C2009-0-27609-4
[3]  
[Anonymous], P C UNC ART INT
[4]  
[Anonymous], LEARNING DATA ARTIFI
[5]  
[Anonymous], 1996, P 2 INT C KNOWLEDGE
[6]  
BENNETT PN, 2000, CMUCS100155
[7]   The use of the area under the ROC curve in the evaluation of machine learning algorithms [J].
Bradley, AP.
PATTERN RECOGNITION, 1997, 30 (07) :1145-1159
[8]  
Burges C., 2005, Proceedings of the 22nd International Conference on Machine Learning, ICML'05, P89, DOI 10.1145/1102351.1102363
[9]   Learning to order things [J].
Cohen, WW ;
Schapire, RE ;
Singer, Y .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1999, 10 :243-270
[10]   On the optimality of the simple Bayesian classifier under zero-one loss [J].
Domingos, P ;
Pazzani, M .
MACHINE LEARNING, 1997, 29 (2-3) :103-130