Improved prediction of software defects using ensemble machine learning techniques

Cited by: 29
Authors
Mehta, Sweta [1]
Patnaik, K. Sridhar [1]
Affiliations
[1] Birla Inst Technol, Dept Comp Sci & Engn, Ranchi 835315, Bihar, India
Keywords
Defect prediction; Dimension reduction; Data imbalance; Machine learning algorithms; XGBoost; Stacking ensemble classifier;
DOI
10.1007/s00521-021-05811-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Software testing is a crucial part of software development. Errors made by developers are generally fixed at a later stage of the development process, which increases the impact of each defect. To prevent this, defects need to be predicted early in development, which in turn helps testing resources to be used efficiently. Defect prediction involves classifying software modules as defect prone or non-defect prone. This paper aims to reduce the impact of two major issues faced during defect prediction: data imbalance and the high dimensionality of defect datasets. Various software metrics are evaluated using feature selection techniques such as Recursive Feature Elimination (RFE), correlation-based feature selection, Lasso, Ridge, ElasticNet and Boruta. Logistic Regression, Decision Trees, K-nearest neighbor, Support Vector Machines and Ensemble Learning are among the machine learning algorithms used in combination with the feature extraction and feature selection techniques to classify software modules as defect prone or non-defect prone. The proposed model uses a combination of Partial Least Squares (PLS) Regression and RFE for dimension reduction, which is further combined with the Synthetic Minority Oversampling Technique (SMOTE) because the datasets used are imbalanced. XGBoost and the Stacking Ensemble technique gave the best results among the algorithms used in this work, with defect prediction accuracy above 0.9 on all the datasets.
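The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain-Python rendering of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), and the function name `smote`, its parameters, and the toy metric values are all hypothetical.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    row and one of its k nearest minority neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature rows.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the row itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (assumed values): each row is a pair of made-up module metrics.
defective = [[10.0, 3.0], [12.0, 4.0], [11.0, 5.0]]
clean = [[1.0, 1.0], [2.0, 1.0], [1.5, 2.0],
         [2.5, 1.5], [3.0, 2.0], [1.0, 2.5]]

# Oversample the defect-prone class until both classes are the same size.
new_rows = smote(defective, n_new=len(clean) - len(defective), k=2)
balanced_defective = defective + new_rows
```

In practice oversampling of this kind is applied to the training split only, so that synthetic rows never leak into the data used for evaluation.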
Pages: 10551-10562
Page count: 12
Related papers
50 in total
[41]   Diversity based imbalance learning approach for software fault prediction using machine learning models [J].
Manchala, Pravali ;
Bisi, Manjubala .
APPLIED SOFT COMPUTING, 2022, 124
[42]   Combining feature selection, feature learning and ensemble learning for software fault prediction [J].
Hung Duy Tran ;
Le Thi My Hanh ;
Nguyen Thanh Binh .
PROCEEDINGS OF 2019 11TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE 2019), 2019, :78-85
[43]   Advancing Breast Cancer Prediction using Logistic Regression and Machine Learning Techniques [J].
Bhuria, Ruchika ;
Gill, Kanwarpartap Singh ;
Malhotra, Sonal ;
Singh, Mukesh .
2ND INTERNATIONAL CONFERENCE ON SUSTAINABLE COMPUTING AND SMART SYSTEMS, ICSCSS 2024, 2024, :1374-1377
[44]   Severity Prediction of Highway Crashes in Saudi Arabia Using Machine Learning Techniques [J].
Aldhari, Ibrahim ;
Almoshaogeh, Meshal ;
Jamal, Arshad ;
Alharbi, Fawaz ;
Alinizzi, Majed ;
Haider, Husnain .
APPLIED SCIENCES-BASEL, 2023, 13 (01)
[45]   Prediction of Road Accidents' Severity on Russian Roads Using Machine Learning Techniques [J].
Donchenko, D. ;
Sadovnikova, N. ;
Parygin, D. .
PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING, ICIE 2019, VOL II, 2020, :1493-1501
[46]   House price prediction using hedonic pricing model and machine learning techniques [J].
Zaki, John ;
Nayyar, Anand ;
Dalal, Surjeet ;
Ali, Zainab H. .
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (27)
[47]   Stock Price Prediction by using Machine Learning Techniques: a Study of TCS Ltd [J].
Jakhar, Yogesh Kumar ;
Sharma, Pawan ;
Ahmed, Bilal .
2ND INTERNATIONAL CONFERENCE ON SUSTAINABLE COMPUTING AND SMART SYSTEMS, ICSCSS 2024, 2024, :1256-1260
[48]   An Empirical Analysis of Machine Learning Algorithms for Crime Prediction Using Stacked Generalization: An Ensemble Approach [J].
Kshatri, Sapna Singh ;
Singh, Deepak ;
Narain, Bhavana ;
Bhatia, Surbhi ;
Quasim, Mohammad Tabrez ;
Sinha, G. R. .
IEEE ACCESS, 2021, 9 :67488-67500
[49]   Early Software Defects Density Prediction: Training the International Software Benchmarking Cross Projects Data Using Supervised Learning [J].
Tahir, Touseef ;
Gencel, Cigdem ;
Rasool, Ghulam ;
Umer, Tariq ;
Rasheed, Jawad ;
Yeo, Sook Fern ;
Cevik, Taner .
IEEE ACCESS, 2023, 11 :141965-141986
[50]   ERP adoption prediction using machine learning techniques and ERP selection among SMEs [J].
Basu, Aveek ;
Jha, Rohini .
INTERNATIONAL JOURNAL OF BUSINESS PERFORMANCE MANAGEMENT, 2024, 25 (02) :242-270