Combining Text Mining and Data Mining for Bug Report Classification

Cited by: 36
Authors
Zhou, Yu [1, 2]
Tong, Yanxiang [1]
Gu, Ruihang [1]
Gall, Harald [3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
[3] Univ Zurich, Dept Informat, CH-8006 Zurich, Switzerland
Source
2014 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME) | 2014
Keywords
FAULTS;
DOI
10.1109/ICSME.2014.53
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Misclassification of bug reports inevitably degrades the performance of bug prediction models. Manual examination can reduce the noise but places a heavy burden on developers. In this paper, we propose a hybrid approach that combines text mining and data mining of bug report data to automate the prediction process. The first stage uses text mining techniques to analyze the summary part of each bug report and classifies the reports into three levels of probability. The extracted features, together with some other structured features of the bug reports, are then fed into the machine learner in the second stage. Data grafting techniques are employed to bridge the two stages. Comparative experiments on the same data as previous studies (three large-scale open source projects) consistently achieve a reasonable improvement over their best results in terms of overall performance (from 77.4% to 81.7%, 73.9% to 80.2%, and 87.4% to 93.7%, respectively). Additional comparative empirical experiments on two other popular open source repositories confirm the findings and demonstrate the benefits of our approach.
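The two-stage idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the abstract does not name the stage-1 or stage-2 learners (a small naive Bayes is used for both), the thresholds 1/3 and 2/3 for the three probability levels are invented, "data grafting" is read simply as appending the stage-1 level to the structured fields, and the toy reports, the `severity` field, and all function and class names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class SummaryNB:
    """Stage 1 (assumed model): multinomial naive Bayes over summary words."""

    def fit(self, summaries, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(summaries, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def prob_bug(self, summary):
        """Return P(bug | summary) with add-one smoothing; unseen words are skipped."""
        words = [w for w in tokenize(summary) if w in self.vocab]
        logp = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            logp[c] = math.log(self.prior[c]) + sum(
                math.log((self.counts[c][w] + 1) / total) for w in words)
        top = max(logp.values())
        weights = {c: math.exp(v - top) for c, v in logp.items()}
        return weights["bug"] / sum(weights.values())


def level(p, low=1 / 3, high=2 / 3):
    """Map a probability to one of three levels (thresholds are assumptions)."""
    if p >= high:
        return "high"
    if p <= low:
        return "low"
    return "middle"


def graft(report, text_model):
    """Bridge the stages: structured fields plus the stage-1 level."""
    row = dict(report["fields"])
    row["text_level"] = level(text_model.prob_bug(report["summary"]))
    return row


class CategoricalNB:
    """Stage 2 (assumed model): naive Bayes over categorical features."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.n = Counter(labels)
        self.counts = Counter()  # (class, feature, value) -> count
        self.values = {}         # feature -> set of values seen in training
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.counts[(label, feature, value)] += 1
                self.values.setdefault(feature, set()).add(value)
        return self

    def predict(self, row):
        def score(c):
            s = math.log(self.n[c] / sum(self.n.values()))
            for feature, value in row.items():
                k = len(self.values.get(feature, ())) or 1
                s += math.log((self.counts[(c, feature, value)] + 1) / (self.n[c] + k))
            return s
        return max(self.classes, key=score)


# Hypothetical toy data; real bug reports carry many more structured fields.
reports = [
    {"summary": "crash on startup null pointer", "fields": {"severity": "critical"}, "label": "bug"},
    {"summary": "null pointer exception when saving", "fields": {"severity": "major"}, "label": "bug"},
    {"summary": "add dark theme option", "fields": {"severity": "enhancement"}, "label": "nonbug"},
    {"summary": "please add export option", "fields": {"severity": "enhancement"}, "label": "nonbug"},
]
labels = [r["label"] for r in reports]
text_model = SummaryNB().fit([r["summary"] for r in reports], labels)
learner = CategoricalNB().fit([graft(r, text_model) for r in reports], labels)

new_report = {"summary": "crash with null pointer", "fields": {"severity": "major"}}
row = graft(new_report, text_model)
print(row, learner.predict(row))
```

The point of the sketch is the shape of the pipeline: the second learner never sees raw text, only the coarse level produced by the first stage alongside the structured fields.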
Pages: 311-320
Page count: 10