Automated bug assignment: Ensemble-based machine learning in large scale industrial contexts

Cited by: 96
Authors
Jonsson, Leif [1, 3]
Borg, Markus [4]
Broman, David [5, 6]
Sandahl, Kristian [3]
Eldh, Sigrid [2]
Runeson, Per [4]
Affiliations
[1] Ericsson AB, Div Res, Torshamnsgatan 35 Kista, Stockholm, Sweden
[2] Ericsson AB, Torshamnsgatan 35 Kista, Stockholm, Sweden
[3] Linkoping Univ, Dept Comp & Informat Sci, SE-58183 Linkoping, Sweden
[4] Lund Univ, Dept Comp Sci, Box 118, S-22100 Lund, Sweden
[5] KTH Royal Inst Technol, S-16440 Kista, Sweden
[6] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
Machine learning; Ensemble learning; Classification; Bug reports; Bug assignment; Industrial scale; Large scale; SOFTWARE; CONFIGURATION; ACCURATE; MODEL
DOI
10.1007/s10664-015-9401-9
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learning classification. In particular, we study the state-of-the-art ensemble learner Stacked Generalization (SG), which combines several classifiers. We collect more than 50,000 bug reports from five development projects from two companies in different domains. We implement automated bug assignment and evaluate the performance in a set of controlled experiments. We show that SG scales to large-scale industrial application and that it outperforms the use of individual classifiers for bug assignment, reaching prediction accuracies from 50% to 89% when large training sets are used. In addition, we show how old training data can decrease the prediction accuracy of bug assignment. We advise industry to use SG for bug assignment in proprietary contexts, using at least 2,000 bug reports for training. Finally, we highlight the importance of not solely relying on results from cross-validation when evaluating automated bug assignment.
Pages: 1533-1578
Number of pages: 46