Feature Clustering and Ensemble Learning Based Approach for Software Defect Prediction

Cited by: 0
Authors
Srivastava R. [1]
Jain A.K. [1]
Affiliation
[1] Department of Applied Mathematics, Delhi Technological University, Delhi
Source
Recent Advances in Computer Science and Communications | 2022 / Vol. 15 / Issue 06
Keywords
class imbalance; confidence interval; ensemble modelling; feature selection; hard voting; software defects
DOI
10.2174/2666255813999201109201259
Abstract
Objective: Defects in delivered software products not only have financial implications but also affect the reputation of the organisation and waste time and human resources. This paper aims to detect defects in software modules. Methods: Our approach sequentially combines the SMOTE algorithm with the K-means clustering algorithm to deal with the class imbalance problem and to obtain a set of key features based on the inter-class and intra-class coefficients of correlation, followed by ensemble modeling to predict defects in software modules. After careful examination, an ensemble framework of XGBoost, Decision Tree, and Random Forest is used for the prediction of software defects, owing to the numerous merits of the ensembling approach. Results: We have used five open-source datasets from the NASA PROMISE repository for software engineering. The results obtained from our approach have been compared with those of the individual algorithms used in the ensemble. A confidence interval for the accuracy of our approach with respect to the performance evaluation metrics, namely accuracy, precision, recall, F1 score and AUC score, has also been constructed at a significance level of 0.01. Conclusion: Results are presented graphically. © 2022 Bentham Science Publishers.
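The abstract names two mechanisms that are easy to illustrate: hard voting across three base classifiers and a confidence interval at a significance level of 0.01. The following is a minimal sketch, not the authors' implementation. The prediction lists stand in for the outputs of XGBoost, Decision Tree, and Random Forest and are hypothetical, and the normal-approximation (Wald) interval is an assumption, since the abstract does not say which interval construction was used.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def hard_vote(predictions_per_model):
    """Majority vote over per-model label lists (one list per model)."""
    voted = []
    for labels in zip(*predictions_per_model):
        # With three voters and binary labels a tie cannot occur.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label matches the true one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def accuracy_confidence_interval(acc, n, alpha=0.01):
    """Wald interval for an accuracy measured on n samples (assumed method)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    half_width = z * sqrt(acc * (1 - acc) / n)
    return max(0.0, acc - half_width), min(1.0, acc + half_width)


# Hypothetical labels: 1 = defective module, 0 = clean module.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 1, 0]  # stand-in for one base classifier
model_b = [1, 1, 1, 1, 0, 1, 0, 0]  # stand-in for a second base classifier
model_c = [0, 0, 1, 1, 0, 1, 1, 0]  # stand-in for a third base classifier

y_vote = hard_vote([model_a, model_b, model_c])
acc = accuracy(y_true, y_vote)
low, high = accuracy_confidence_interval(acc, len(y_true), alpha=0.01)
print(f"ensemble accuracy = {acc:.3f}, 99% CI = [{low:.3f}, {high:.3f}]")
```

With only eight samples the interval is very wide and the normal approximation is rough; on test sets of realistic size the same function gives a much tighter range.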
Pages: 868-882
Page count: 14