A systematic review of machine learning techniques for software fault prediction

Cited by: 374
Authors
Malhotra, Ruchika [1]
Affiliations
[1] Delhi Technol Univ, Dept Software Engn, Delhi, India
Keywords
Machine learning; Software fault proneness; Systematic literature review; STATIC CODE ATTRIBUTES; EMPIRICAL-ANALYSIS; DEFECT-PREDICTION; QUALITY; METRICS; MODELS; CLASSIFICATION; DESIGN
DOI
10.1016/j.asoc.2014.11.023
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Background: Software fault prediction is the process of developing models that software practitioners can use in the early phases of the software development life cycle to detect faulty constructs such as modules or classes. Various machine learning techniques have been used in the past for predicting faults. Method: In this study we perform a systematic review of studies published from January 1991 to October 2013 that use machine learning techniques for software fault prediction. We assess the performance capability of the machine learning techniques used in existing research on software fault prediction. We also compare the performance of the machine learning techniques with that of statistical techniques and of other machine learning techniques. Further, the strengths and weaknesses of machine learning techniques are summarized. Results: In this paper we have identified 64 primary studies and seven categories of machine learning techniques. The results prove the prediction capability of the machine learning techniques for classifying a module or class as fault prone or not fault prone. The models using machine learning techniques for estimating software fault proneness outperform the traditional statistical models. Conclusion: Based on the results obtained from the systematic review, we conclude that machine learning techniques are able to predict software fault proneness and can be used by software practitioners and researchers. However, the application of machine learning techniques in software fault prediction is still limited, and more studies should be carried out in order to obtain well-formed and generalizable results. We provide future guidelines to practitioners and researchers based on the results obtained in this work. © 2014 Elsevier B.V. All rights reserved.
Pages: 504-518
Number of pages: 15