An extensive replication study of the ABLoTS approach for bug localization

Cited by: 1
Authors
Niu, Feifei [1]
Zhang, Enshuo [1]
Mayr-Dorn, Christoph [2]
Assuncao, Wesley Klewerton Guez [3]
Huang, Liguo [4]
Ge, Jidong [1]
Luo, Bin [1]
Egyed, Alexander [2]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Johannes Kepler Univ Linz, Inst Software Syst Engn, Linz, Austria
[3] North Carolina State Univ, Dept Comp Sci, Raleigh, NC USA
[4] Southern Methodist Univ, Dept Comp Sci, Dallas, TX USA
Funding
Austrian Science Fund;
Keywords
Bug localization; Information retrieval; Replication study; Composer; FAULT LOCALIZATION; INFORMATION; CODE;
DOI
10.1007/s10664-024-10537-6
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Bug localization is the task of recommending source code locations (typically files) that contain the cause of a bug and hence need to be changed to fix the bug. Along these lines, information retrieval-based bug localization (IRBL) approaches have been adopted, which identify the most bug-prone files from the source code space. In current practice, a series of state-of-the-art IRBL techniques leverage the combination of different components (e.g., similar reports, version history, and code structure) to achieve better performance. ABLoTS is a recently proposed approach whose core component, TraceScore, utilizes requirements and traceability information between different issue reports (i.e., feature requests and bug reports) to identify buggy source code snippets with promising results. To evaluate the accuracy of these results and obtain additional insights into the practical applicability of ABLoTS, we conducted a replication study of this approach with the original dataset and also on two extended datasets (i.e., an additional Java dataset and a Python dataset). The original dataset consists of 11 open source Java projects with 8,494 bug reports. The extended Java dataset includes 16 more projects comprising 25,893 bug reports and corresponding source code commits. The extended Python dataset consists of 12 projects with 1,289 bug reports. We find that the TraceScore component, which is the core of ABLoTS, produces comparable or even better results with the extended datasets. However, we cannot reproduce the ABLoTS results as reported in its original paper: an incorrectly chosen cut-off date had the overlooked side effect of leaking test data into the training data, with significant effects on performance.
Additionally, we conduct experiments to assess the performance of various composers that aggregate scores from different components, revealing that Logistic Regression, fixed weight, and CombSUM outperform the other composers across all three datasets, while decision tree and random forest exhibit subpar performance.
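As a rough illustration of what a composer does, the sketch below combines per-file scores from two components using CombSUM and a fixed-weight sum. It is not the authors' implementation: the min-max normalization step, the function names, the example file names, and the weights are all assumptions made for this example.

```python
from typing import Dict, List

Scores = Dict[str, float]  # file path -> suspiciousness score from one component


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one component's scores to [0, 1] (an assumed preprocessing step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: the component gives no ranking signal.
        return {f: 0.0 for f in scores}
    return {f: (s - lo) / (hi - lo) for f, s in scores.items()}


def fixed_weight(components: List[Scores], weights: List[float]) -> Scores:
    """Weighted sum of normalized component scores; missing files count as 0."""
    if len(components) != len(weights):
        raise ValueError("one weight per component is required")
    combined: Scores = {}
    for scores, w in zip(components, weights):
        for f, s in min_max_normalize(scores).items():
            combined[f] = combined.get(f, 0.0) + w * s
    return combined


def comb_sum(components: List[Scores]) -> Scores:
    """CombSUM: unweighted sum of normalized component scores."""
    return fixed_weight(components, [1.0] * len(components))


def rank(scores: Scores) -> List[str]:
    """Files ordered from most to least suspicious; ties broken by name."""
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical scores from two components for three files.
trace_scores = {"A.java": 0.9, "B.java": 0.1, "C.java": 0.5}
history_scores = {"A.java": 0.4, "B.java": 0.8, "C.java": 0.2}

print(rank(comb_sum([trace_scores, history_scores])))
print(rank(fixed_weight([trace_scores, history_scores], [0.2, 0.8])))
```

CombSUM treats every component equally, whereas the fixed-weight variant lets one component dominate the ranking; learned composers such as Logistic Regression would instead fit the weights from training data dated strictly before the cut-off.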
Pages: 37