An empirical analysis of the effectiveness of software metrics and fault prediction model for identifying faulty classes

Cited by: 42
Authors
Kumar, Lov [1]
Misra, Sanjay [2]
Rath, Santanu Ku. [1]
Affiliations
[1] Natl Inst Technol, Dept CSE, Rourkela, India
[2] Atilim Univ, Dept Comp Engn, Ankara, Turkey
Keywords
Feature selection techniques; Artificial neural network; Ensemble method; Source code metrics; Cost analysis framework; NEURAL-NETWORK; VALIDATION; QUALITY
DOI
10.1016/j.csi.2017.02.003
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Software fault prediction models are used to predict faulty modules at a very early stage of the software development life cycle. Predicting fault proneness using source code metrics is an area that has attracted the attention of several researchers. The performance of a fault-proneness model depends on the source code metrics used as its input. In this work, we propose a framework to validate source code metrics and identify a suitable set of them, with the aim of removing irrelevant features and improving the performance of the fault prediction model. First, we apply t-test analysis and univariate logistic regression analysis to each source code metric to evaluate its potential for predicting fault proneness. Next, we perform correlation analysis and multivariate linear regression with stepwise forward selection to find the right set of source code metrics for fault prediction. The resulting set of source code metrics is used as the input to develop fault prediction models using a neural network with five different training algorithms and three different ensemble methods. The effectiveness of the developed fault prediction models is evaluated using a proposed cost evaluation framework. We performed experiments on fifty-six open source Java projects. The experimental results reveal that the model developed from the set of source code metrics selected by the suggested validation framework achieves better results than models built from the other metrics. The experimental results also demonstrate that the fault prediction model is best suited to projects in which the percentage of faulty classes is below a threshold value that depends on fault identification efficiency (low: 48.89%, median: 39.26%, high: 27.86%).
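The first and third screening steps named in the abstract (a t-test per metric, then correlation analysis) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state which t-test variant or which cut-offs were used, so Welch's t statistic, the thresholds `t_min` and `r_max`, and the metric names and values below are all assumptions made for illustration. The univariate logistic regression and stepwise forward selection steps are omitted.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def screen_metrics(data, faulty, t_min=2.0, r_max=0.8):
    """Keep metrics that separate faulty from non-faulty classes,
    then drop any metric highly correlated with one already kept.

    data:   metric name -> list of values, one per class
    faulty: list of bool, True where the class is faulty
    t_min, r_max: hypothetical cut-offs, not taken from the paper
    """
    scores = {}
    for name, values in data.items():
        pos = [v for v, f in zip(values, faulty) if f]
        neg = [v for v, f in zip(values, faulty) if not f]
        scores[name] = abs(welch_t(pos, neg))

    # Consider the most discriminating metrics first.
    candidates = sorted(
        (n for n in scores if scores[n] >= t_min),
        key=lambda n: scores[n],
        reverse=True,
    )

    kept = []
    for name in candidates:
        if all(abs(pearson(data[name], data[k])) <= r_max for k in kept):
            kept.append(name)
    return kept


# Made-up values for eight classes; the first four are faulty.
faulty = [True, True, True, True, False, False, False, False]
data = {
    "wmc": [30, 28, 35, 32, 10, 12, 9, 11],
    "cbo": [15, 14, 18, 16, 5, 6, 4, 6],
    "noise": [3, 7, 4, 6, 5, 4, 6, 5],
}

selected = screen_metrics(data, faulty)
print(selected)
```

In this toy data, `noise` has identical means in both groups and fails the t-test screen, while `cbo` passes the t-test but is nearly collinear with `wmc`, so only one of the two survives the correlation filter.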
Pages: 1-32
Page count: 32