Towards Word Embeddings for Improved Duplicate Bug Report Retrieval in Software Repositories

Cited by: 7
Authors
Budhiraja, Amar [1]
Dutta, Kartik [1]
Shrivastava, Manish [1]
Reddy, Raghu [1]
Affiliations
[1] IIIT Hyderabad, Hyderabad, India
Source
PROCEEDINGS OF THE 2018 ACM SIGIR INTERNATIONAL CONFERENCE ON THEORY OF INFORMATION RETRIEVAL (ICTIR'18) | 2018
DOI
10.1145/3234944.3234949
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A key part of software maintenance is bug reporting and rectification. Because bug reporting is asynchronous, duplicate bug reports are common. Detecting duplicate bug reports is an important software maintenance task, since it avoids assigning the same bug to different developers. In this paper, we explore the use of word embeddings for retrieving duplicate bug reports in large software repositories. We discuss an approach that models each bug report as a dense vector and retrieves its top-k most similar reports for duplicate bug report detection. Through experiments on two real-world datasets, we show that word embeddings perform better than baselines and related approaches and have the potential to improve duplicate bug report retrieval.
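The retrieval step described in the abstract (one dense vector per report, then the top-k most similar reports) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy 3-dimensional `EMBEDDINGS` table, the choice to average word vectors into a report vector, the use of cosine similarity, and the helper names `report_vector`, `cosine`, and `top_k`. The abstract does not state how the paper builds or compares its vectors.

```python
import math

# Hypothetical toy word vectors (illustrative values, not trained embeddings).
EMBEDDINGS = {
    "crash": [0.9, 0.1, 0.0],
    "freeze": [0.8, 0.2, 0.1],
    "startup": [0.1, 0.9, 0.0],
    "launch": [0.2, 0.8, 0.1],
    "font": [0.0, 0.1, 0.9],
    "render": [0.1, 0.0, 0.8],
}
DIM = 3


def report_vector(text):
    """Average the vectors of known words (assumed pooling, not the paper's)."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, reports, k=2):
    """Return the ids of the k stored reports most similar to the query text."""
    query_vec = report_vector(query)
    scored = [
        (cosine(query_vec, report_vector(text)), report_id)
        for report_id, text in reports.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [report_id for _, report_id in scored[:k]]


# Hypothetical mini repository of existing bug reports.
REPORTS = {
    "BUG-1": "crash on startup",
    "BUG-2": "font render glitch",
    "BUG-3": "freeze during launch",
}

candidates = top_k("freeze at startup", REPORTS, k=2)
print(candidates)
```

In this sketch a new report is compared against every stored report; a maintainer would then inspect the returned candidates to decide whether any of them is a true duplicate.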
Pages: 167-170
Page count: 4
Related papers
11 in total
  • [1] Alipour A, 2013, IEEE WORK CONF MIN S, P183, DOI 10.1109/MSR.2013.6624026
  • [2] [Anonymous], 2013, EFFICIENT ESTIMATION
  • [3] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [4] Bojanowski P, 2017, Transactions of the Association for Computational Linguistics, V5, P135, DOI 10.1162/tacl_a_00051
  • [5] Hiew Lyndon, 2006, Assisted detection of duplicate bug reports
  • [6] Lazar Alina, 2014, 11 WORKING C MINING, P392, DOI 10.1145/2597073.2597128
  • [7] Le Q., 2014, DISTRIBUTED REPRESEN, DOI 10.1145/2740908.2742760
  • [8] Runeson P, 2007, PROC INT CONF SOFTW, P499
  • [9] Detecting Duplicate Bug Report Using Character N-Gram-Based Features
    Sureka, Ashish
    Jalote, Pankaj
    [J]. 17TH ASIA PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC 2010), 2010, : 366 - 374
  • [10] Combining Word Embedding with Information Retrieval to Recommend Similar Bug Reports
    Yang, Xinli
    Lo, David
    Xia, Xin
    Bao, Lingfeng
    Sun, Jianling
    [J]. 2016 IEEE 27TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE), 2016, : 127 - 137