Improved prediction of software defects using ensemble machine learning techniques

Cited by: 29
Authors
Mehta, Sweta [1]
Patnaik, K. Sridhar [1]
Affiliation
[1] Birla Inst Technol, Dept Comp Sci & Engn, Ranchi 835315, Bihar, India
Keywords
Defect prediction; Dimension reduction; Data imbalance; Machine learning algorithms; XGBoost; Stacking ensemble classifier
DOI
10.1007/s00521-021-05811-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The software testing process is a crucial part of software development. Errors made by developers are generally fixed at a later stage of the software development process, which increases the impact of the defect. To prevent this, defects need to be predicted in the early stages of software development, which in turn helps make efficient use of testing resources. The defect prediction process involves classifying software modules as defect-prone or non-defect-prone. This paper aims to reduce the impact of two major issues faced during defect prediction: data imbalance and the high dimensionality of defect datasets. In this research work, various software metrics are evaluated using feature selection techniques such as Recursive Feature Elimination (RFE), Correlation-based feature selection, Lasso, Ridge, ElasticNet and Boruta. Machine learning algorithms including Logistic Regression, Decision Trees, K-nearest neighbor, Support Vector Machines and Ensemble Learning are used in combination with the feature extraction and feature selection techniques to classify software modules as defect-prone or non-defect-prone. The proposed model uses a combination of Partial Least Squares (PLS) Regression and RFE for dimension reduction, which is further combined with the Synthetic Minority Oversampling Technique (SMOTE) because the datasets used are imbalanced. Compared with the other algorithms used in the research work, XGBoost and the Stacking Ensemble technique gave the best results on all the datasets, with defect prediction accuracy above 0.9.
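The abstract applies the Synthetic Minority Oversampling Technique (SMOTE) to counter the class imbalance in the defect datasets. The following is a minimal plain-Python sketch of the basic SMOTE idea, not the authors' implementation. It assumes numeric feature vectors, brute-force Euclidean nearest-neighbour search within the minority class, and seed samples drawn at random; the function name `smote` and its parameters are hypothetical, chosen for illustration.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance; sufficient for ranking neighbours.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: basic SMOTE sketch (assumed variant, not the paper's code).

    Each synthetic sample is built by picking a random minority sample x,
    picking one of its k nearest minority-class neighbours x_nn, and
    returning x + gap * (x_nn - x) with gap drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    # A sample cannot have more neighbours than there are other samples.
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours (by index) of every sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _sq_dist(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()
        # Interpolate on the line segment between x and its chosen neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    defective = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy minority class
    for point in smote(defective, n_synthetic=4, k=2, seed=1):
        print(point)
```

Because every synthetic point lies between two real minority samples, SMOTE adds new minority examples rather than exact duplicates. In a pipeline such as the one the abstract describes, oversampling would typically be applied to the training split only, so that synthetic samples do not leak into the evaluation data.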
Pages: 10551-10562
Page count: 12