Software Defect Prediction on Unlabelled Datasets: A Comparative Study

Cited by: 0
Authors
Ronchieri, Elisabetta [1]
Canaparo, Marco [1]
Belgiovine, Mauro [2]
Affiliations
[1] INFN CNAF, Viale Berti Pichat 6-2, Bologna, Italy
[2] Northeastern Univ, Elettr & Comp Engn Dept, Boston, MA 02115 USA
Source
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2020, PT II | 2020 / Vol. 12250
Keywords
Unlabelled dataset; Defect prediction; Unsupervised methods; Machine learning; QUALITY; TESTS;
DOI
10.1007/978-3-030-58802-1_25
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Background: Defect prediction on unlabelled datasets is a challenging and widespread problem in software engineering. Machine learning is of great value in this context because it provides unsupervised techniques, which are applicable to unlabelled datasets. Objective: This study aims to compare various approaches employed over the years on unlabelled datasets to predict the defective modules, i.e. the ones that need more attention in the testing phase. Our comparison is based on the measurement of performance metrics and on the real defect information derived from software archives. Our work leverages a new dataset obtained by extracting and preprocessing metrics from a C++ software project. Method: Our empirical study uses CLAMI and its improvement CLAMI+, which we have applied to high energy physics software datasets. Furthermore, we have used clustering techniques such as the K-means algorithm to find potentially critical modules. Results: Our experimental analysis has been carried out on 1 open source project with 34 software releases. We have applied 17 ML techniques to the labelled datasets obtained by following the CLAMI and CLAMI+ approaches. The two approaches have been evaluated using different performance metrics; our results show that CLAMI+ performs better than CLAMI. The average predictive accuracy is around 95% for the 4 ML techniques (out of 17) that show a Kappa statistic greater than 0.80. We applied K-means on the same dataset and obtained 2 clusters labelled according to the output of CLAMI and CLAMI+. Conclusion: Based on the results of the different statistical tests, we conclude that no significant performance differences have been found among the selected classification techniques.
Pages: 333 - 353
Number of pages: 21
Related papers
50 in total
  • [1] Lessons Learned from the Assessment of Software Defect Prediction on WLCG Software: A Study with Unlabelled Datasets and Machine Learning Techniques
    Ronchieri, Elisabetta
    Canaparo, Marco
    Belgiovine, Mauro
    Salomoni, Davide
    Martelli, Barbara
    24TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP 2019), 2020, 245
  • [2] Software Defect Prediction on Unlabelled Dataset with Machine Learning Techniques
    Ronchieri, Elisabetta
    Canaparo, Marco
    Belgiovine, Mauro
    Salomoni, Davide
    2019 IEEE NUCLEAR SCIENCE SYMPOSIUM AND MEDICAL IMAGING CONFERENCE (NSS/MIC), 2019
  • [3] Feature Selection in Software Defect Prediction: A Comparative Study
    Kakkar, Misha
    Jain, Sarika
    2016 6TH INTERNATIONAL CONFERENCE - CLOUD SYSTEM AND BIG DATA ENGINEERING (CONFLUENCE), 2016, : 658 - 663
  • [4] Improving Software Defect Prediction in Noisy Imbalanced Datasets
    Shi, Haoxiang
    Ai, Jun
    Liu, Jingyu
    Xu, Jiaxi
    APPLIED SCIENCES-BASEL, 2023, 13 (18):
  • [5] An approach to software defect prediction for small-sized datasets
    Bal, Pravas Ranjan
    Shukla, Suyash
    Kumar, Sandeep
    APPLIED INTELLIGENCE, 2025, 55 (06)
  • [6] A Study of Redundant Metrics in Defect Prediction Datasets
    Jiarpakdee, Jirayus
    Tantithamthavorn, Chakkrit
    Ihara, Akinori
    Matsumoto, Kenichi
    2016 IEEE 27TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING WORKSHOPS (ISSREW), 2016, : 51 - 52
  • [7] A Comparative Study on New Classification Algorithm using NASA MDP Datasets for Software Defect Detection
    Sreedevi, E.
    PremaLatha, V
    Sivakumar, S.
    Nayak, Soumya Ranjan
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON INTELLIGENT SUSTAINABLE SYSTEMS (ICISS 2019), 2019, : 312 - 317
  • [8] On the Reproducibility of Software Defect Datasets
    Zhu, Hao-Nan
    Rubio-Gonzalez, Cindy
    2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE, 2023, : 2324 - 2335
  • [9] Class Balancing Approaches to Improve for Software Defect Prediction Estimations: A Comparative Study
    Sanchez-Garcia, Angel J.
    Limon, Xavier
    Dominguez-Isidro, Saul
    Olvera-Villeda, Dan Javier
    Perez-Arriaga, Juan Carlos
    PROGRAMMING AND COMPUTER SOFTWARE, 2024, 50 (08) : 621 - 647
  • [10] Learning from Software defect datasets
    Singh, Pradeep
    PROCEEDINGS OF 2019 5TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMPUTING AND CONTROL (ISPCC 2K19), 2019, : 58 - 63