Empirical validation of feature selection techniques for cross-project defect prediction

Cited by: 4
Authors
Malhotra, Ruchika [1]
Meena, Shweta [1]
Affiliation
[1] Delhi Technol Univ, Dept Software Engn, Delhi 110042, India
Keywords
Defect prediction; Cross-project; Feature selection; Filter method; Wrapper method; Swarm search-based techniques; CLASSIFICATION; MODELS;
DOI
10.1007/s13198-023-02051-7
Chinese Library Classification (CLC) number
T [Industrial Technology];
Discipline classification code
08 ;
Abstract
In software engineering, cross-project defect prediction is an important area within defect prediction and change prediction. Software defect prediction aims to identify defects in the early stages of the software development life cycle. It helps optimize testing resources and reduce maintenance effort. Defect prediction works well when a large amount of data is available for training a prediction model. In practice, a project often lacks sufficient data of its own, so the data used for training must come from a different source than the data used for testing. Cross-project defect prediction emerged from this idea: different projects serve as the training and testing datasets for identifying defects. To design a defect prediction model, only the significant features should be selected from the full feature set. Feature selection techniques are categorized according to whether they rely on unsupervised or supervised learning. The major limitation of cross-project defect prediction is handling the different data distributions of source and target projects. The experiment was conducted using the AEEEM and ReLink software defect datasets. Five projects from AEEEM and three projects from ReLink were selected; the largest selected projects contain 1862 and 399 files, respectively. In this study, we analyzed the effect of feature selection techniques on cross-project defect prediction. The results were analyzed using AUC. The significance of filter, wrapper, and swarm search-based feature selection methods was analyzed separately. There is a trade-off between the computational complexity and the performance of feature selection techniques. Swarm search-based methods performed better than filter and wrapper methods in terms of computational cost and overall performance of the prediction models. The results were statistically validated using the Friedman test and the Wilcoxon signed-rank test.
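The workflow the abstract describes (select features on a source project, predict on a different target project, evaluate with AUC) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' setup: the data is synthetic rather than AEEEM or ReLink, the filter is a simple correlation ranking, the scorer is a signed sum of standardized features rather than a trained classifier, and per-project z-scoring is just one simple way to reduce the distribution gap between source and target. The helper names `pearson`, `zscore_columns`, `auc`, and `make_project` are hypothetical.

```python
import random
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

def zscore_columns(rows):
    """Standardize each feature within one project (uses no labels)."""
    cols = []
    for col in zip(*rows):
        m, s = mean(col), pstdev(col)
        cols.append([(v - m) / s if s else 0.0 for v in col])
    return [list(r) for r in zip(*cols)]

def auc(scores, labels):
    """AUC as the fraction of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_project(n, shift, seed):
    """Synthetic project (assumption): features 0-1 carry defect signal, 2-4 are noise.
    `shift` moves every feature to mimic a different data distribution."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        row = [rng.gauss(shift + 1.5 * label, 1.0) for _ in range(2)]
        row += [rng.gauss(shift, 1.0) for _ in range(3)]
        X.append(row)
        y.append(label)
    return X, y

# Source and target projects with deliberately different feature distributions.
source_X, source_y = make_project(200, shift=0.0, seed=1)
target_X, target_y = make_project(200, shift=5.0, seed=2)

# Filter step: rank features by |correlation with the defect label| on the source only.
corr = [pearson(col, source_y) for col in zip(*source_X)]
ranked = sorted(range(len(corr)), key=lambda j: abs(corr[j]), reverse=True)
selected = sorted(ranked[:2])

# Score target modules using only the selected features and the signs learned on the source.
target_Z = zscore_columns(target_X)
scores = [sum((1 if corr[j] > 0 else -1) * row[j] for j in selected) for row in target_Z]
target_auc = auc(scores, target_y)
print("selected features:", selected, "target AUC:", round(target_auc, 3))
```

Target labels are used only for the final AUC computation, never for selection or scoring, which mirrors the cross-project setting. A wrapper or swarm search-based method would replace the ranking step with a search over feature subsets, each evaluated by training a model, which is where the computational cost trade-off mentioned in the abstract comes from.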
Pages: 1743-1755
Number of pages: 13