Comparing the effectiveness of several modeling methods for fault prediction

Cited by: 78
Authors
Weyuker, Elaine J. [1]
Ostrand, Thomas J. [1]
Bell, Robert M. [1]
Affiliation
[1] AT&T Labs Research, Florham Park, NJ 07932, USA
Keywords
Empirical study; Fault prediction; Negative binomial; Recursive partitioning; Random forests; Bayesian trees; Fault-percentile-average
DOI
10.1007/s10664-009-9111-2
Chinese Library Classification
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
We compare the effectiveness of four modeling methods (negative binomial regression, recursive partitioning, random forests, and Bayesian additive regression trees) for predicting the files likely to contain the most faults for 28 to 35 releases of three large industrial software systems. Predictor variables included lines of code, file age, faults in the previous release, changes in the previous two releases, and programming language. To compare the effectiveness of the different models, we use two metrics: the percentage of faults contained in the top 20% of files identified by the model, and a new, more general metric, the fault-percentile-average. The negative binomial regression and random forests models performed significantly better than recursive partitioning and Bayesian additive regression trees, as assessed by either metric. For each of the three systems, the 20% of files identified in each release by the negative binomial and random forests models contained, on average, 76% to 94% of the faults.
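The two evaluation metrics named in the abstract can be illustrated with a small sketch. The abstract does not define the fault-percentile-average, so the code below assumes one common reading: rank files by predicted fault count, then average, over every cutoff m, the share of actual faults found in the top m files. The function names, the rounding rule for the 20% cutoff, and the sample numbers are hypothetical and are not taken from the paper.

```python
def _rank_by_prediction(predicted):
    """Return file indices ordered from highest to lowest predicted fault count."""
    return sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)


def top_fraction_fault_share(predicted, actual, fraction=0.20):
    """Share of actual faults found in the top `fraction` of files by prediction."""
    order = _rank_by_prediction(predicted)
    total = sum(actual)
    if not order or total == 0:
        return 0.0
    # Assumption: the cutoff is rounded to a whole number of files, at least one.
    k = max(1, round(fraction * len(order)))
    return sum(actual[i] for i in order[:k]) / total


def fault_percentile_average(predicted, actual):
    """Assumed definition: mean over all cutoffs m of the fault share in the top m files."""
    order = _rank_by_prediction(predicted)
    total = sum(actual)
    if not order or total == 0:
        return 0.0
    running = 0
    share_sum = 0.0
    for i in order:
        running += actual[i]           # faults captured by the top m files so far
        share_sum += running / total   # fault share at cutoff m
    return share_sum / len(order)


# Hypothetical release with five files: predicted vs. actual fault counts.
predicted = [5.0, 0.1, 2.0, 0.0, 1.0]
actual = [6, 0, 3, 0, 1]
print(top_fraction_fault_share(predicted, actual))
print(fault_percentile_average(predicted, actual))
```

Under this reading, the top-20% figure is a single point on the curve that the fault-percentile-average summarizes, which is consistent with the abstract calling the latter the more general metric.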
Pages: 277-295
Number of pages: 19