Towards Training Set Reduction for Bug Triage

Cited by: 23
Authors
Zou, Weiqin [1]
Hu, Yan [1]
Xuan, Jifeng [2]
Jiang, He [1]
Affiliations
[1] Dalian Univ Technol, Sch Software, Dalian, Peoples R China
[2] Dalian Univ Technol, Sch Mat Sci, Dalian, Peoples R China
Source
2011 35TH IEEE ANNUAL INTERNATIONAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC) | 2011
Funding
National Research Foundation, Singapore;
Keywords
bug triage; training set reduction; feature selection; instance selection; software quality; NATURAL-LANGUAGE;
DOI
10.1109/COMPSAC.2011.80
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Bug triage is an important step in the bug-fixing process. Its goal is to assign each incoming bug to an appropriate developer. Existing bug triage approaches are based on machine learning algorithms, which build classifiers from training sets of bug reports. In practice, these approaches suffer from large-scale and low-quality training sets. In this paper, we propose training set reduction for bug triage, combining feature selection with instance selection to improve triage accuracy. We study the feature selection algorithm χ²-test, the instance selection algorithm Iterative Case Filter, and their combinations. We evaluate training set reduction on Eclipse bug data. After reduction, 70% of the words and 50% of the bug reports are removed from the training set. The experimental results show that the new, smaller training sets provide better accuracy than the original one.
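To make the feature selection half of the abstract concrete, the following is a minimal sketch of χ²-based word selection for bug reports. It is not the authors' implementation: the function names `chi2_scores` and `select_features`, the whitespace tokenization, the choice of scoring each word by its maximum χ² over developers, and the sample reports are all assumptions made for illustration. The `keep_ratio=0.3` default only mirrors the abstract's figure of 70% of words removed. Instance selection with Iterative Case Filter is not shown.

```python
from collections import Counter

def chi2_scores(docs, labels):
    """Map each word to its highest 2x2 chi-square statistic over all classes."""
    n = len(docs)
    doc_sets = [set(d.lower().split()) for d in docs]   # word presence per report
    class_count = Counter(labels)
    term_count = Counter(t for s in doc_sets for t in s)
    joint = Counter((t, c) for s, c in zip(doc_sets, labels) for t in s)
    scores = {}
    for t, n_t in term_count.items():
        best = 0.0
        for c, n_c in class_count.items():
            a = joint[(t, c)]        # reports of developer c containing t
            b = n_t - a              # reports of other developers containing t
            c_ = n_c - a             # reports of developer c lacking t
            d = n - n_t - c_         # reports of other developers lacking t
            denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
            if denom:
                best = max(best, n * (a * d - c_ * b) ** 2 / denom)
        scores[t] = best
    return scores

def select_features(docs, labels, keep_ratio=0.3):
    """Keep the top keep_ratio fraction of words by chi-square score."""
    scores = chi2_scores(docs, labels)
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return set(ranked[:k])

# Hypothetical bug reports and the developers who fixed them.
reports = [
    "crash in editor when saving file",
    "editor crash on large file",
    "debugger breakpoint not hit",
    "debugger step over fails in breakpoint",
]
developers = ["alice", "alice", "bob", "bob"]

kept_words = select_features(reports, developers)
print(sorted(kept_words))
```

Words that appear for only one developer (such as "editor" or "debugger" in the sample) score highest, while a word spread evenly across developers (such as "in") scores zero and is dropped first.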
Pages: 576-581
Page count: 6
Related papers
16 in total
  • [1] [Anonymous], 1997, ICML
  • [2] [Anonymous], 1996, Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering
  • [3] Anvik J., 2006, 28th International Conference on Software Engineering Proceedings, P937, DOI 10.1145/1134285.1134457
  • [4] Bettenburg N., 2008, P 16 ACM SIGSOFT INT, P308
  • [5] Duplicate Bug Reports Considered Harmful ... Really?
    Bettenburg, Nicolas
    Premraj, Rahul
    Zimmermann, Thomas
    Kim, Sunghun
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE, 2008, : 337 - 345
  • [6] Bhattacharya P., 2010, P 2010 IEEE INT C SO, P1, DOI 10.1109/ICSM.2010.5609736
  • [7] Advances in instance selection for instance-based learning algorithms
    Brighton, H
    Mellish, C
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2002, 6 (02) : 153 - 172
  • [8] Grochowski M, 2004, LECT NOTES ARTIF INT, V3070, P580
  • [9] Improving Bug Triage with Bug Tossing Graphs
    Jeong, Gaeul
    Kim, Sunghun
    Zimmermann, Thomas
    [J]. 7TH JOINT MEETING OF THE EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND THE ACM SIGSOFT SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, 2009, : 111 - 120
  • [10] Assigning Bug Reports using a Vocabulary-Based Expertise Model of Developers
    Matter, Dominique
    Kuhn, Adrian
    Nierstrasz, Oscar
    [J]. 2009 6TH IEEE INTERNATIONAL WORKING CONFERENCE ON MINING SOFTWARE REPOSITORIES, 2009, : 131 - 140