Automated classification of software issue reports using machine learning techniques: an empirical study

Cited by: 51
Authors
Pandey N. [1]
Sanyal D.K. [1, 2]
Hudait A. [1]
Sen A. [3]
Affiliations
[1] School of Computer Engineering, KIIT University, Bhubaneswar, 751024, Odisha
[2] Indian Institute of Technology Kharagpur, Kharagpur, 721302, West Bengal
[3] Department of Computer Science and Engineering, JIS University, Kolkata, 700109, West Bengal
Keywords
Accuracy; Bug classification; F-measure; Machine learning; Random forest
DOI
10.1007/s11334-017-0294-1
Abstract
Software developers, testers and customers routinely submit issue reports to software issue trackers to record the problems they face when using software. The issues are then directed to appropriate experts for analysis and fixing. However, submitters often misclassify an improvement request as a bug and vice versa. This costs valuable developer time. Hence, automated classification of the submitted reports would be of great practical utility. In this paper, we analyze how machine learning techniques may be used to perform this task. We apply different classification algorithms, namely naive Bayes, linear discriminant analysis, k-nearest neighbors, support vector machine (SVM) with various kernels, decision tree and random forest, separately to classify the reports from three open-source projects. We evaluate their performance in terms of F-measure, average accuracy and weighted average F-measure. Our experiments show that random forests perform best, while SVM with certain kernels also achieves high performance. © 2017, Springer-Verlag London Ltd.
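The abstract names the evaluation metrics but does not define them. As a rough illustration only, the sketch below computes per-class F-measure, accuracy and a support-weighted average F-measure under their common textbook definitions; the paper's exact definitions (for example, how "average accuracy" is averaged) may differ. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f_measure(y_true, y_pred, label):
    """F-measure for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # convention: no true positives means F-measure 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def weighted_f_measure(y_true, y_pred):
    """Per-class F-measure averaged with weights equal to class frequency."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f_measure(y_true, y_pred, c) * n for c, n in support.items()) / total


# Hypothetical toy data: six issue reports, two classes.
y_true = ["bug", "bug", "bug", "bug", "non-bug", "non-bug"]
y_pred = ["bug", "bug", "bug", "non-bug", "bug", "non-bug"]

print(f_measure(y_true, y_pred, "bug"))
print(f_measure(y_true, y_pred, "non-bug"))
print(accuracy(y_true, y_pred))
print(weighted_f_measure(y_true, y_pred))
```

Weighting by class frequency matters when bug and non-bug reports are imbalanced, since an unweighted mean would let the smaller class count as much as the larger one.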
Pages: 279-297
Number of pages: 18