An unsupervised software bug count prediction model based on selected software metrics

Authors
Kumar, Rakesh [1]
Chaturvedi, Amrita [1]
Affiliations
[1] Indian Inst Technol BHU, Dept Comp Sci & Engn, Varanasi 221005, Uttar Pradesh, India
Keywords
Machine learning; Regression model; Software bug count vector prediction; Software metrics selection; Software metrics threshold; Unsupervised learning; DATA MINING TECHNIQUES; DEFECT PREDICTION; NUMBER; FAULTS; CLASSIFIERS; THRESHOLDS; REGRESSION; MACHINE; SYSTEM; SIZE;
DOI
10.1007/s10489-025-06557-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Software Bug Count Vector (SBCV) prediction technique is a regression model that aims to predict the precise number of bugs in each module of a software system. In contrast, a Software Bug Prediction (SBP) model only predicts whether a module is buggy or not. Predicting the exact number of bugs in each module makes software test resource allocation, maintenance, and release planning more efficient. Many researchers have conducted empirical studies that predict SBCV using regression algorithms, but only on labeled datasets. Accurately collecting and labeling bug data is difficult, however: it requires maintaining historical information, a version control system, and an issue tracking system, and it depends on experienced software experts. To address the scarcity of labeled datasets and the absence of an unsupervised SBCV prediction model, we propose a novel unsupervised SBCV prediction model based on software metrics (SMs) thresholds. We observe that previously proposed unsupervised SBP models calculated a threshold for each independent software metric using different techniques, but did not account for the skewness of the software metrics distribution. To overcome this, we reduce the skewness of the SMs distribution using a log transformation and then derive the threshold of each selected SM. Based on these thresholds and a robust linear regression algorithm, we propose an SBCV prediction model. The proposed model does not require labeled datasets and predicts the software bug count in each module through a self-learning approach. Averaged over 22 datasets, the performance of the proposed technique surpasses the majority of state-of-the-art regression techniques (8 standard supervised algorithms) in terms of mean absolute error (0.45), mean relative error (0.20), and Pred(l)_Error (0.29). The results of statistical tests, including the Wilcoxon signed-rank test, Cohen's d, and the Nemenyi test, demonstrate the significance of SBCV prediction models.
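The skewness-reduction and threshold steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the threshold is derived from the log-transformed metric, so the rule used here (mean plus one standard deviation in log space, mapped back to the original scale) is an assumption made purely for illustration, as are the function names `skewness`, `log_threshold`, and `exceed_count` and the sample `loc` values. The robust linear regression stage is not shown.

```python
import math
import statistics


def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def log_threshold(values):
    """Hypothetical threshold rule (an assumption, not the paper's rule):
    mean + one standard deviation of log1p(metric), mapped back with expm1."""
    logs = [math.log1p(v) for v in values]
    cut = statistics.fmean(logs) + statistics.pstdev(logs)
    return math.expm1(cut)


def exceed_count(module, thresholds):
    """Count how many metrics of one module lie above their thresholds."""
    return sum(1 for name, value in module.items() if value > thresholds[name])


# Made-up, right-skewed lines-of-code values for ten modules.
loc = [10, 12, 15, 20, 22, 30, 45, 60, 120, 900]
log_loc = [math.log1p(v) for v in loc]

# The log transformation pulls in the long right tail.
print("skewness raw:", round(skewness(loc), 2))
print("skewness log:", round(skewness(log_loc), 2))

threshold = log_threshold(loc)
print("illustrative LOC threshold:", round(threshold, 1))
print("metrics above threshold for a 900-line module:",
      exceed_count({"loc": 900}, {"loc": threshold}))
```

In this sketch only the largest module exceeds the threshold. A per-module count of exceeded metrics is one plausible unlabeled signal that a regression stage could consume, though the abstract does not specify how the thresholds feed the regression.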
Pages: 27