A comparative study of software defect binomial classification prediction models based on machine learning

Cited by: 3

Authors:

Tao, Hongwei [1]

Niu, Xiaoxu [1]

Xu, Lang [1]

Fu, Lianyou [1]

Cao, Qiaoling [1]

Chen, Haoran [1]

Shang, Songtao [1]

Xian, Yang [1]

Affiliations:

[1] Zhengzhou Univ Light Ind, Coll Comp Sci & Technol, Zhengzhou 450002, Peoples R China

Source:

SOFTWARE QUALITY JOURNAL | 2024 / Vol. 32 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

Software defect prediction; Machine learning; Class imbalance; Data sampling;

DOI:

10.1007/s11219-024-09683-3

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

As information technology continues to advance, software applications are becoming increasingly critical. However, the growing size and complexity of software development can lead to serious flaws resulting in significant financial losses. To address this issue, Software Defect Prediction (SDP) technology is being developed to detect and resolve defects early in the software development process, ensuring high software quality. As a result, SDP research has become a major focus for academics worldwide. This study aims to compare various machine learning-based SDP algorithm models and determine if traditional machine learning algorithms affect SDP outcomes. Unlike previous studies that aimed to identify the best prediction model for all datasets, this paper constructs SDP superiority models separately for different datasets. Using the publicly available ESEM2016 dataset, 13 machine learning classification algorithms are employed to predict software defects. Evaluation indicators such as Accuracy, AUC (Area Under the Curve), F-measure, and Running Time (RT) are utilized to assess the performance of the classification algorithms. Due to the serious class imbalance problem in this dataset, 10 sampling methods are combined with the 13 machine learning algorithms to explore the effect of sampling techniques on the performance of traditional machine learning classification models. Finally, a comprehensive evaluation is conducted to identify the best combination of sampling techniques and classification models to construct the final dominant model for SDP.
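The evaluation design described in the abstract (each sampling method paired with each classifier, scored by Accuracy, AUC, F-measure, and running time) can be illustrated with a minimal sketch. This is not the authors' code. Random oversampling, the k-nearest-neighbour scorer, the nearest-centroid scorer, and every function name below are assumptions chosen to keep the example dependency-free; the paper's 10 samplers and 13 classifiers are not reproduced here.

```python
import random
import time

def no_sampling(X, y, seed=0):
    """Baseline: leave the imbalanced training set unchanged."""
    return X, y

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def _sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def knn_scores(X_train, y_train, X_test, k=3):
    """Score = fraction of defective (label 1) modules among the k nearest neighbours."""
    scores = []
    for x in X_test:
        order = sorted(range(len(X_train)), key=lambda i: _sq_dist(X_train[i], x))
        scores.append(sum(y_train[i] for i in order[:k]) / k)
    return scores

def centroid_scores(X_train, y_train, X_test):
    """Score in [0, 1]; higher means closer to the defective-class centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    out = []
    for x in X_test:
        d0, d1 = _sq_dist(x, c0), _sq_dist(x, c1)
        out.append(d0 / (d0 + d1) if d0 + d1 else 0.5)
    return out

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the defective class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def auc(y_true, scores):
    """Probability that a defective module outscores a clean one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def evaluate(samplers, classifiers, X_train, y_train, X_test, y_test):
    """Score every (sampler, classifier) pair; only the training set is resampled."""
    results = {}
    for s_name, sampler in samplers.items():
        X_bal, y_bal = sampler(X_train, y_train)
        for c_name, score_fn in classifiers.items():
            start = time.perf_counter()
            scores = score_fn(X_bal, y_bal, X_test)
            elapsed = time.perf_counter() - start
            preds = [int(s >= 0.5) for s in scores]
            results[(s_name, c_name)] = {
                "accuracy": accuracy(y_test, preds),
                "auc": auc(y_test, scores),
                "f_measure": f_measure(y_test, preds),
                "running_time": elapsed,
            }
    return results
```

Calling `evaluate` with dictionaries of samplers and classifiers returns one metrics record per combination, which can then be ranked per dataset. Resampling is applied to the training split only in this sketch, so the test set keeps its original class ratio; whether the paper follows the same protocol is not stated in the abstract.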


Pages: 1203-1237

Page count: 35

