An adaptive synthetic sampling and batch generation-oriented hybrid approach for addressing class imbalance problem in software defect prediction

Cited by: 0
Authors
Taskeen, Anam [1]
Khan, Saif Ur Rehman [2]
Mashkoor, Atif [3]
Affiliations
[1] Department of Computer Science, COMSATS University Islamabad (CUI), Islamabad
[2] Department of Computing, Shifa Tameer-e-Millat University (STMU), Islamabad
[3] Institute of Software Systems Engineering, Johannes Kepler University, Altenbergerstraße 69, Linz
Keywords
ADASYN; Batch generator; Class imbalance; Machine learning; Sampling; Software defect prediction
DOI
10.1007/s00500-024-10378-x
Abstract
Training classifiers on datasets with an uneven class distribution is a significant challenge in software defect prediction. The problem arises when one class is represented by far fewer samples than the others, which weakens classification performance, particularly for minority-class instances. Traditional classification models assume that classes are roughly equally represented. On imbalanced data, they can yield low prediction accuracy and precision for minority-class instances, which makes those instances difficult to identify reliably. To address this issue, this research proposes HADAB, a hybrid technique that combines the Adaptive Synthetic Sampling (ADASYN) approach with a batch generator. ADASYN generates synthetic samples for the minority class, balancing the dataset and improving prediction accuracy, while the batch generator feeds data to the model in batches, improving training efficiency. A Multi-Layer Perceptron (MLP) serves as the base classifier. The proposed HADAB technique significantly improves prediction accuracy and training efficiency without requiring additional parameter tuning, algorithm modification, or added complexity. We validate the performance of HADAB on publicly available NASA defect datasets of diverse types. The results show that HADAB outperforms traditional methods in prediction accuracy. In conclusion, HADAB offers a practical and effective solution for handling class imbalance in software defect prediction. © The Author(s) 2024.
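The abstract names HADAB's two building blocks without showing how they fit together. The following Python sketch illustrates one plausible reading, under stated assumptions. `adasyn` follows the standard ADASYN procedure of He et al. (2008): each minority sample receives a share of the synthetic-sample budget proportional to how many majority samples sit among its k nearest neighbours. `batch_generator` is a generic shuffling mini-batch generator. All function names, parameters (`k`, `beta`, `batch_size`, `seed`) and the toy data are hypothetical illustrations, not the authors' HADAB code. The MLP training step is omitted, and only the Python standard library is used.

```python
# Hypothetical sketch of ADASYN oversampling plus a mini-batch generator;
# not the authors' HADAB implementation.
import math
import random
from typing import Iterator, List, Sequence, Tuple

Vector = List[float]


def _k_nearest(i: int, X: Sequence[Vector], candidates: Sequence[int], k: int) -> List[int]:
    """Return the k candidate indices closest to X[i] (Euclidean), excluding i itself."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def adasyn(X: Sequence[Vector], y: Sequence[int], minority: int = 1, k: int = 5,
           beta: float = 1.0, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """ADASYN-style oversampling for binary labels; returns originals plus synthetic points."""
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    min_idx = [i for i in all_idx if y[i] == minority]
    n_min, n_maj = len(min_idx), len(X) - len(min_idx)
    G = int((n_maj - n_min) * beta)  # total number of synthetic samples to generate
    if G <= 0 or n_min < 2:
        return list(X), list(y)

    # r_i: share of majority-class points among the k nearest neighbours of minority x_i.
    ratios = []
    for i in min_idx:
        nn = _k_nearest(i, X, all_idx, k)
        ratios.append(sum(1 for j in nn if y[j] != minority) / len(nn))
    total = sum(ratios)
    # Assumption: if no minority point has majority neighbours, weight all points equally.
    weights = [r / total for r in ratios] if total > 0 else [1 / n_min] * n_min

    X_new, y_new = list(X), list(y)
    for i, w in zip(min_idx, weights):
        g_i = round(w * G)  # harder-to-learn points receive more synthetic samples
        min_nn = _k_nearest(i, X, min_idx, k)  # interpolate only towards minority neighbours
        for _ in range(g_i):
            z = rng.choice(min_nn)
            lam = rng.random()  # lambda drawn uniformly from [0, 1)
            X_new.append([a + lam * (b - a) for a, b in zip(X[i], X[z])])
            y_new.append(minority)
    return X_new, y_new


def batch_generator(X: Sequence[Vector], y: Sequence[int], batch_size: int = 32,
                    shuffle: bool = True, seed: int = 0) -> Iterator[Tuple[List[Vector], List[int]]]:
    """Yield one epoch of (X_batch, y_batch) mini-batches; the last batch may be smaller."""
    order = list(range(len(X)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [X[i] for i in idx], [y[i] for i in idx]


# Toy imbalanced data: 20 majority (label 0) and 4 minority (label 1) points in 2-D.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
X += [[data_rng.gauss(1.5, 0.5), data_rng.gauss(1.5, 0.5)] for _ in range(4)]
y = [0] * 20 + [1] * 4

X_bal, y_bal = adasyn(X, y, k=3)
batches = list(batch_generator(X_bal, y_bal, batch_size=8))
```

Restricting interpolation partners to minority neighbours keeps every synthetic point on a segment between two real minority samples, as in SMOTE. The adaptive part is the per-sample count `g_i`. Because each `g_i` is rounded, the final minority count can differ slightly from the majority count. In practice, oversampling would normally be applied to the training split only, so that synthetic points never leak into evaluation data, and each yielded batch would then be passed to the MLP's training step.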
Pages: 13595-13614
Page count: 19