Improved prediction of software defects using ensemble machine learning techniques

Cited by: 28
Authors
Mehta, Sweta [1]
Patnaik, K. Sridhar [1]
Affiliations
[1] Birla Inst Technol, Dept Comp Sci & Engn, Ranchi 835315, Bihar, India
Keywords
Defect prediction; Dimension reduction; Data imbalance; Machine learning algorithms; XGBoost; Stacking ensemble classifier;
DOI
10.1007/s00521-021-05811-3
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Software testing is a crucial part of software development. Errors made by developers are generally fixed at a later stage of the development process, which increases the impact of each defect. To prevent this, defects need to be predicted early in development, which in turn helps in using testing resources efficiently. Defect prediction involves classifying software modules as defect prone or non-defect prone. This paper aims to reduce the impact of two major issues faced during defect prediction: data imbalance and the high dimensionality of defect datasets. In this research work, various software metrics are evaluated using feature selection techniques such as Recursive Feature Elimination (RFE), correlation-based feature selection, Lasso, Ridge, ElasticNet and Boruta. Logistic Regression, Decision Trees, K-nearest neighbor, Support Vector Machines and Ensemble Learning are among the machine learning algorithms used in combination with the feature extraction and feature selection techniques to classify software modules as defect prone or non-defect prone. The proposed model uses a combination of Partial Least Squares (PLS) Regression and RFE for dimension reduction, which is further combined with the Synthetic Minority Oversampling Technique (SMOTE) because the datasets used are imbalanced. XGBoost and the Stacking Ensemble technique gave the best results on all the datasets, with defect prediction accuracy above 0.9, compared with the other algorithms used in the research work.
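The abstract names the Synthetic Minority Oversampling Technique as its remedy for class imbalance. As a rough illustration of the idea only, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is not the authors' implementation: the function names `smote` and `balance`, the parameters `k` and `seed`, and the toy data are all assumptions made for this example, and a practical study would more likely rely on an established library.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    minority: list of equal-length numeric feature vectors (minority class only).
    k: number of nearest minority neighbours considered per sample (assumed default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_needed, 0), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (hypothetical): four non-defective modules, two defective ones.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [5.0, 5.0], [5.5, 5.2]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points never leak into the data used to measure accuracy.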
Pages: 10551-10562
Page count: 12