Software Defect Prediction Model Based on the Combination of Machine Learning Algorithms

Cited by: 0
Authors
Fu Y. [1]
Dong W. [1]
Yin L. [1]
Du Y. [1]
Affiliation
[1] College of Computer, National University of Defense Technology, Changsha
Source
Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2017 / Vol. 54 / No. 03
Funding
National Natural Science Foundation of China
Keywords
Combination; Eclipse prediction dataset; Ensemble learning; Machine learning; Software defect prediction;
DOI
10.7544/issn1000-1239.2017.20151052
Abstract
Given the metrics and the defects already found in a software product, software defect prediction can identify further defects that are likely to exist as early as possible, so that testing and validation resources can be allocated according to the prediction results. Defect prediction based on machine learning can find software defects comprehensively and automatically, and it is becoming one of the main approaches to defect prediction. To improve the efficiency and accuracy of prediction, the selection and study of machine learning algorithms is critical. In this paper, we compare different machine learning defect prediction methods and find that each algorithm has advantages and disadvantages under different evaluation metrics. To exploit these advantages, we draw on the stacking ensemble learning method and present a combined software defect prediction model. In this model, we first run one round of prediction, then add the prediction results of the different methods to the original dataset as new software metrics, and then predict again. Finally, we conduct experiments on the Eclipse dataset. The experimental results show that the model is technically feasible, reduces time cost, and improves accuracy. © 2017, Science Press. All rights reserved.
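The two-pass idea in the abstract (predict once, append each method's predictions to the dataset as new metrics, then predict again) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the two base learners (a nearest-centroid classifier and a one-feature decision stump), the toy metric values, and all function names are assumptions chosen to keep the example self-contained. For brevity the first-pass predictions are made on the training rows themselves; stacking implementations commonly use cross-validated predictions instead to limit leakage.

```python
from statistics import mean

def fit_centroid(X, y):
    """Nearest-centroid classifier (illustrative base learner)."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    def predict(x):
        # Pick the class whose centroid is closest in squared distance.
        return min(cents, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(x, cents[c])))
    return predict

def fit_stump(X, y):
    """Decision stump: single feature/threshold with best training accuracy."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            preds = [1 if x[j] > thr else 0 for x in X]
            acc = sum(p == t for p, t in zip(preds, y)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, j, thr)
    _, j, thr = best
    return lambda x: 1 if x[j] > thr else 0

def augment(X, models):
    """Append each base model's prediction to every row as a new metric."""
    return [list(x) + [m(x) for m in models] for x in X]

def fit_combined(X, y, base_fitters, meta_fitter):
    """First pass: fit base models. Second pass: fit on augmented metrics."""
    base = [fit(X, y) for fit in base_fitters]
    meta = meta_fitter(augment(X, base), y)
    return lambda x: meta(list(x) + [m(x) for m in base])

# Hypothetical toy data: [lines of code, complexity]; label 1 = defective.
X = [[10, 1], [12, 2], [15, 1], [80, 9], [95, 12], [70, 8]]
y = [0, 0, 0, 1, 1, 1]

model = fit_combined(X, y, [fit_centroid, fit_stump], fit_centroid)
print(model([90, 10]), model([11, 1]))
```

The second-stage learner sees both the original metrics and the first-stage predictions, so it can weight a base method more heavily where that method tends to be right.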
Pages: 633-641
Number of pages: 8