Duplicate Bug Report Detection: How Far Are We?

Cited by: 6
Authors
Zhang, Ting [1]
Han, Donggyun [2]
Vinayakarao, Venkatesh [3]
Irsan, Ivana Clairine [1]
Xu, Bowen [1]
Thung, Ferdian [1]
Lo, David [1]
Jiang, Lingxiao [1]
Affiliations
[1] Singapore Management Univ, Singapore, Singapore
[2] Royal Holloway Univ London, London, England
[3] Chennai Math Inst, Chennai, Tamil Nadu, India
Funding
National Research Foundation, Singapore;
Keywords
Bug reports; duplicate bug report detection; deep learning; empirical study; PERFORMANCE;
DOI
10.1145/3576042
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Many Duplicate Bug Report Detection (DBRD) techniques have been proposed in the research literature, while industry uses some other techniques. Unfortunately, there has been insufficient comparison among them, and it is unclear how far we have come. This work fills this gap by comparing the aforementioned techniques. To compare them, we first need a benchmark that can estimate how a tool would perform if applied in a realistic setting today. Thus, we first investigated potential biases that affect the fair comparison of the accuracy of DBRD techniques. Our experiments suggest that data age and issue tracking system (ITS) choice cause a significant difference. Based on these findings, we prepared a new benchmark. We then used it to evaluate DBRD techniques to better estimate how far we have come. Surprisingly, a simpler technique outperforms recently proposed sophisticated techniques on most projects in our benchmark. In addition, we compared the DBRD techniques proposed in research with those used in Mozilla and VSCode. Surprisingly, we observe that a simple technique already adopted in practice can achieve results comparable to those of a recently proposed research tool. Our study offers reflections on the current state of DBRD, and we share our insights to benefit future DBRD research.
Pages: 32
Related papers
50 in total
  • [1] A Systematic Study of Duplicate Bug Report Detection
    Gupta, Som
    Gupta, Sanjai Kumar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (01) : 578 - 589
  • [2] Duplicate Bug Report Detection and Classification System Based on Deep Learning Technique
    Kukkar, Ashima
    Mohana, Rajni
    Kumar, Yugal
    Nayyar, Anand
    Bilal, Muhammad
    Kwak, Kyung-Sup
    IEEE ACCESS, 2020, 8 (08) : 200749 - 200763
  • [3] Duplicate Bug Report detection using Named Entity Recognition
    Zheng, Wei
    Li, Yunfan
    Wu, Xiaoxue
    Cheng, Jingyuan
    KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [4] Does Deep Learning improve the performance of duplicate bug report detection? An empirical study
    Jiang, Yuan
    Su, Xiaohong
    Treude, Christoph
    Shang, Chao
    Wang, Tiantian
    JOURNAL OF SYSTEMS AND SOFTWARE, 2023, 198
  • [5] Exploring the Role of Automation in Duplicate Bug Report Detection: An Industrial Case Study
    Gotharsson, Malte
    Stahre, Karl
    Gay, Gregory
    Neto, Francisco Gomes de Oliveira
    PROCEEDINGS OF THE 2024 IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATION OF SOFTWARE TEST, AST 2024, 2024 : 193 - 203
  • [6] Duplicate Bug Report Detection by Using Sentence Embedding and Fine-tuning
    Isotani, Haruna
    Washizaki, Hironori
    Fukazawa, Yoshiaki
    Nomoto, Tsutomu
    Ouji, Saori
    Saito, Shinobu
    2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2021), 2021 : 535 - 544
  • [7] Improving Bug Reporting, Duplicate Detection, and Localization
    Chaparro, Oscar
    PROCEEDINGS OF THE 2017 IEEE/ACM 39TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING COMPANION (ICSE-C 2017), 2017 : 421 - 424
  • [8] Duplicate Bug Report Detection Using an Attention-Based Neural Language Model
    Ben Messaoud, Montassar
    Miladi, Asma
    Jenhani, Ilyes
    Mkaouer, Mohamed Wiem
    Ghadhab, Lobna
    IEEE TRANSACTIONS ON RELIABILITY, 2023, 72 (02) : 846 - 858
  • [9] Duplicate Bug Report Detection Using Dual-Channel Convolutional Neural Networks
    He, Jianjun
    Xu, Ling
    Yan, Meng
    Xia, Xin
    Lei, Yan
    2020 IEEE/ACM 28TH INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC, 2020 : 117 - 127
  • [10] Efficient feature extraction model for validation performance improvement of duplicate bug report detection in software bug triage systems
    Neysiani, Behzad Soleimani
    Babamir, Seyed Morteza
    Aritsugi, Masayoshi
    INFORMATION AND SOFTWARE TECHNOLOGY, 2020, 126