Pathidea: Improving Information Retrieval-Based Bug Localization by Re-Constructing Execution Paths Using Logs

Cited by: 25
Authors
Chen, An Ran [1]
Chen, Tse-Hsun [1]
Wang, Shaowei [2]
Affiliations
[1] Concordia Univ, Software PErformance Anal & Reliabil SPEAR Lab, Montreal, PQ H3G 1M8, Canada
[2] Univ Manitoba, Dept Comp Sci, Winnipeg, MB R3T 2N2, Canada
Keywords
Computer bugs; Location awareness; Debugging; Static analysis; Information retrieval; History; Tools; Bug localization; log; bug report; information retrieval
DOI
10.1109/TSE.2021.3071473
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
To assist developers with debugging and analyzing bug reports, researchers have proposed information retrieval-based bug localization (IRBL) approaches. IRBL approaches leverage the textual information in bug reports as queries to generate a ranked list of potential buggy files that may need further investigation. Although IRBL approaches have shown promising results, most prior research only leverages the textual information that is "visible" in bug reports, such as bug description or title. However, in addition to the textual description of the bug, developers also often attach logs in bug reports. Logs provide important information that can be used to re-construct the system execution paths when an issue happens and assist developers with debugging. In this paper, we propose an IRBL approach, Pathidea, which leverages logs in bug reports to re-construct execution paths and helps improve the results of bug localization. Pathidea uses static analysis to create a file-level call graph, and re-constructs the call paths from the reported logs. We evaluate Pathidea on eight open source systems, with a total of 1,273 bug reports that contain logs. We find that Pathidea achieves a high recall (up to 51.9 percent for Top@5). On average, Pathidea achieves an improvement that varies from 8 to 21 and 5 to 21 percent over BRTracer in terms of Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) across studied systems, respectively. Moreover, we find that the re-constructed execution paths can also complement other IRBL approaches by providing a 10 and 8 percent improvement in terms of MAP and MRR, respectively. Finally, we conduct a parameter sensitivity analysis and provide recommendations on setting the parameter values when applying Pathidea.
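The abstract reports results as Top@5, Mean Average Precision (MAP), and Mean Reciprocal Rank (MRR). The following is a minimal sketch of how these standard ranking metrics are commonly computed for a bug localization tool; it is not taken from the paper. The function names, the file names, and the two example bug reports are hypothetical, and average precision is normalized here by the number of truly buggy files, which is one common convention.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Return 1 / rank of the first truly buggy file, or 0.0 if none is ranked."""
    for rank, path in enumerate(ranked, start=1):
        if path in buggy:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Average the precision at each rank where a buggy file appears.

    Assumption: the sum is divided by the number of buggy files.
    """
    if not buggy:
        return 0.0
    hits = 0
    total = 0.0
    for rank, path in enumerate(ranked, start=1):
        if path in buggy:
            hits += 1
            total += hits / rank
    return total / len(buggy)


def top_k_hit(ranked: Sequence[str], buggy: Set[str], k: int) -> bool:
    """True if at least one buggy file is among the first k ranked files."""
    return any(path in buggy for path in ranked[:k])


def evaluate(reports: List[Tuple[Sequence[str], Set[str]]], k: int) -> dict:
    """Average each metric over all bug reports."""
    n = len(reports)
    return {
        "MRR": sum(reciprocal_rank(r, b) for r, b in reports) / n,
        "MAP": sum(average_precision(r, b) for r, b in reports) / n,
        f"Top@{k}": sum(top_k_hit(r, b, k) for r, b in reports) / n,
    }


# Hypothetical data: (ranked file list from the tool, set of files fixed for the bug).
example_reports = [
    (["A.java", "B.java", "C.java"], {"B.java"}),
    (["D.java", "E.java", "F.java"], {"D.java", "F.java"}),
]
scores = evaluate(example_reports, k=1)
print(scores)
```

In this example the first report places its only buggy file at rank 2 and the second places its buggy files at ranks 1 and 3, so the averaged scores reward the tool for ranking at least one buggy file early (MRR, Top@k) and for ranking all of them early (MAP).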
Pages: 2905-2919
Number of pages: 15