Combining Text Mining and Data Mining for Bug Report Classification

Cited by: 36
Authors
Zhou, Yu [1 ,2 ]
Tong, Yanxiang [1 ]
Gu, Ruihang [1 ]
Gall, Harald [3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
[3] Univ Zurich, Dept Informat, CH-8006 Zurich, Switzerland
Source
2014 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME) | 2014
Keywords
FAULTS;
DOI
10.1109/ICSME.2014.53
Chinese Library Classification number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Misclassified bug reports inevitably degrade the performance of bug prediction models. Manual examination can reduce the noise but places a heavy burden on developers. In this paper, we propose a hybrid approach that combines text mining and data mining techniques on bug report data to automate the prediction process. The first stage uses text mining techniques to analyze the summary parts of bug reports and classifies them into three levels of probability. The extracted features and some other structured features of bug reports are then fed into the machine learner in the second stage. Data grafting techniques are employed to bridge the two stages. Comparative experiments with previous studies on the same data (three large-scale open source projects) consistently achieve a reasonable improvement (from 77.4% to 81.7%, 73.9% to 80.2%, and 87.4% to 93.7%, respectively) over their best results in terms of overall performance. Additional comparative empirical experiments on two other popular open source repositories confirm the findings and demonstrate the benefits of our approach.
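The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tiny multinomial Naive Bayes text classifier, the 0.35 and 0.65 thresholds that define the three probability levels, the toy training summaries, and the structured field names `severity` and `priority`. The second-stage learner itself is omitted; the sketch only shows how the stage-one output might be grafted onto structured features.

```python
import math
from collections import Counter


def train_nb(summaries, labels):
    """Fit a tiny multinomial Naive Bayes model on whitespace-tokenized summaries."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(summaries, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def bug_probability(model, summary):
    """Return P(bug | summary) using Laplace-smoothed word likelihoods."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (True, False):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for tok in summary.lower().split():
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp[True] / (exp[True] + exp[False])


def probability_level(p, low=0.35, high=0.65):
    """Bin a probability into three levels (thresholds are assumed, not from the paper)."""
    if p >= high:
        return "high"
    if p <= low:
        return "low"
    return "medium"


def graft(model, report):
    """Attach the stage-one text level to the report's structured features."""
    features = dict(report["structured"])
    features["text_level"] = probability_level(bug_probability(model, report["summary"]))
    return features


# Toy training data (hypothetical): True = real bug, False = other request.
model = train_nb(
    ["crash on startup null pointer", "exception thrown when saving file",
     "add support for dark theme", "improve documentation of api"],
    [True, True, False, False],
)
row = graft(model, {
    "summary": "null pointer exception on saving",
    "structured": {"severity": "major", "priority": "P2"},
})
print(row)
```

Rows produced by `graft` combine the text-derived level with the structured fields, so they could be passed to whatever classifier is chosen for the second stage.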
Pages: 311-320
Page count: 10