Toward Comprehensible Software Fault Prediction Models Using Bayesian Network Classifiers

Cited by: 129
Authors
Dejaeger, Karel [1]
Verbraken, Thomas [1]
Baesens, Bart [1, 2]
Affiliations
[1] Katholieke Univ Leuven, Fac Business & Econ, Dept Decis Sci & Informat Management, B-3000 Louvain, Belgium
[2] Univ Southampton, Sch Management, Highfield Southampton SO17 1BJ, England
Keywords
Software fault prediction; Bayesian networks; classification; comprehensibility; DEFECT PREDICTION; METRICS; ALGORITHMS; DISCOVERY; CRITIQUE; INDUCTION; SELECTION; CODE
DOI
10.1109/TSE.2012.20
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Software testing is a crucial activity during software development, and fault prediction models assist practitioners by identifying faulty software code up front, drawing upon the machine learning literature. While the Naive Bayes classifier in particular is often applied in this regard, with predictive performance and comprehensibility cited as its major strengths, a number of alternative Bayesian algorithms that increase the possibility of constructing simpler networks with fewer nodes and arcs remain unexplored. This study contributes to the literature by considering 15 different Bayesian Network (BN) classifiers and comparing them to other popular machine learning techniques. Furthermore, the applicability of the Markov blanket principle for feature selection, which is a natural extension to BN theory, is investigated. The results, both in terms of the AUC and the recently introduced H-measure, are rigorously tested using the statistical framework of Demšar. It is concluded that simple and comprehensible networks with fewer nodes can be constructed using BN classifiers other than the Naive Bayes classifier. Furthermore, it is found that comprehensibility and predictive performance need to be balanced, and that the development context should also be taken into account during model selection.
Pages: 237-257
Number of pages: 21