Handling class overlap and imbalance using overlap driven under-sampling with balanced random forest in software defect prediction

Times cited: 1
Authors
Dar, Abdul Waheed [1]
Farooq, Sheikh Umar [1]
Affiliation
[1] University of Kashmir, Department of Computer Science, North Campus, Srinagar, India
Keywords
Class imbalance problem; Machine learning; Software defect prediction; Over-sampling; Under-sampling; Performance; Machine; SMOTE
DOI
10.1007/s11334-024-00571-4
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Various machine learning techniques have been used to build software defect prediction (SDP) models that identify defective software modules. However, class overlap and class imbalance in SDP datasets pose major challenges for SDP models. This study proposes a new SDP model that combines an overlap-based under-sampling framework with the balanced random forest classifier to improve the identification of defective software modules. First, duplicate instances are removed from the dataset to avoid over-fitting the model. Next, an overlap-based under-sampling technique removes the majority-class (non-defective) training instances that lie in the region where the two classes overlap, thereby maximizing the relative presence of minority-class (defective) instances in that region. Finally, a balanced random forest, which combines random under-sampling with ensemble learning, is trained on the pre-processed training data to perform the classification. The efficacy of the proposed SDP model is assessed by comparing its performance against nine state-of-the-art SDP models on 15 imbalanced software defect datasets. Experimental results and statistical analysis indicate that the proposed model achieves better predictive performance than the compared models in terms of recall, G-mean, F-measure and AUC.
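The stages described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation, and every function name below is hypothetical. In particular, the rule in `overlap_undersample` (drop any non-defective module that has a defective module among its k = 3 nearest neighbours by Euclidean distance) is an illustrative stand-in for the paper's overlap-based under-sampling framework. `balanced_bootstrap` shows only the per-tree sampling step of a balanced random forest as it is usually described (a bootstrap of the minority class plus an equally sized draw, with replacement, from the majority class); growing the trees themselves would in practice be left to a library such as imbalanced-learn's `BalancedRandomForestClassifier`. `g_mean` computes one of the reported metrics.

```python
import math
import random
from collections import Counter


def deduplicate(X, y):
    """Step 1: drop exact duplicate (feature vector, label) rows, keeping the first occurrence."""
    seen, X_out, y_out = set(), [], []
    for row, label in zip(X, y):
        key = (tuple(row), label)
        if key not in seen:
            seen.add(key)
            X_out.append(list(row))
            y_out.append(label)
    return X_out, y_out


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def overlap_undersample(X, y, minority=1, k=3):
    """Step 2 (hypothetical stand-in rule): drop majority rows that have at least one
    minority row among their k nearest neighbours, i.e. rows in the overlap region."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == minority:
            keep.append(i)  # minority (defective) rows are never removed
            continue
        # (distance, index) pairs; ties are broken by the lower index
        nearest = sorted((euclidean(xi, xj), j) for j, xj in enumerate(X) if j != i)[:k]
        if not any(y[j] == minority for _, j in nearest):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def balanced_bootstrap(y, minority=1, rng=None):
    """Step 3 (per tree of a balanced random forest): bootstrap the minority class,
    then draw the same number of majority rows with replacement. Returns row indices."""
    if rng is None:
        rng = random.Random()
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    n = len(minority_idx)
    return ([rng.choice(minority_idx) for _ in range(n)]
            + [rng.choice(majority_idx) for _ in range(n)])


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of recall (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(recall * specificity)


if __name__ == "__main__":
    # Toy 2-D data: label 1 = defective (minority), label 0 = non-defective (majority).
    X = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.5, 0.0],
         [7.0, 0.0], [6.0, 0.0], [8.0, 0.0], [9.0, 0.0]]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
    X, y = deduplicate(X, y)          # removes the repeated [0.0, 0.0] row
    X, y = overlap_undersample(X, y)  # removes the majority row at x = 7.0
    print(Counter(y))                 # Counter({0: 4, 1: 3})
    print(balanced_bootstrap(y, rng=random.Random(0)))  # indices for one tree
```

In an evaluation loop, `overlap_undersample` and `balanced_bootstrap` would be applied to the training folds only, matching the abstract's statement that under-sampling operates on the training data; test folds keep their natural class ratio so that recall, G-mean, F-measure and AUC reflect the real imbalance.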
Pages: 747-767
Page count: 21