Improved prediction of software defects using ensemble machine learning techniques

Cited by: 28
Authors
Mehta, Sweta [1]
Patnaik, K. Sridhar [1]
Affiliation
[1] Birla Inst Technol, Dept Comp Sci & Engn, Ranchi 835315, Bihar, India
Keywords
Defect prediction; Dimension reduction; Data imbalance; Machine learning algorithms; XGBoost; Stacking ensemble classifier
DOI
10.1007/s00521-021-05811-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Software testing is a crucial part of software development. Errors made by developers are generally fixed only at a later stage of the development process, which increases the impact of the defects. To prevent this, defects need to be predicted in the early stages of software development, which in turn helps make efficient use of testing resources. The defect prediction process involves classifying software modules as defect-prone or non-defect-prone. This paper aims to reduce the impact of two major issues faced during defect prediction: data imbalance and the high dimensionality of defect datasets. In this work, various software metrics are evaluated using feature selection techniques such as Recursive Feature Elimination (RFE), correlation-based feature selection, Lasso, Ridge, ElasticNet and Boruta. Machine learning algorithms including Logistic Regression, Decision Trees, K-Nearest Neighbors, Support Vector Machines and ensemble learning are combined with these feature extraction and feature selection techniques to classify software modules as defect-prone or non-defect-prone. The proposed model uses a combination of Partial Least Squares (PLS) Regression and RFE for dimension reduction, which is further combined with the Synthetic Minority Oversampling Technique (SMOTE) because the datasets used are imbalanced. Among the algorithms evaluated in this work, XGBoost and the stacking ensemble technique gave the best results on all datasets, with defect prediction accuracy above 0.9.
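Since the abstract names SMOTE as the remedy for class imbalance, the following is a minimal, dependency-free sketch of the SMOTE interpolation idea in Python. It is not the authors' implementation: the function names `smote` and `euclidean`, the defaults (`k=5`, `seed=0`) and the toy data are illustrative assumptions, and the sketch covers only the oversampling step, not the PLS + RFE dimension reduction that precedes it in the proposed model. In practice a library implementation such as `SMOTE` from the imbalanced-learn package would typically be used instead.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_synthetic new minority-class (e.g. defect-prone) samples.

    Hypothetical helper: each synthetic point lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority-class
    neighbours, which is the core interpolation step of SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples

    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours: List[List[int]] = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))          # pick a minority sample
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]  # pick one of its neighbours
        gap = rng.random()                        # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Toy 2-D metric vectors for defect-prone modules (made-up values).
    minority = [[1.0, 2.0], [2.0, 3.0], [1.5, 2.5], [3.0, 3.5]]
    majority_size = 10  # hypothetical count of non-defect-prone modules
    new_points = smote(minority, n_synthetic=majority_size - len(minority), k=2)
    print(len(new_points))  # 6
```

Interpolating between neighbours, rather than duplicating existing rows, is what distinguishes SMOTE from plain random oversampling. Oversampling is normally applied only to the training split, so that synthetic points do not leak into the data used to measure accuracy.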
Pages: 10551-10562
Page count: 12