Scaffle: Bug Localization on Millions of Files

Cited by: 30
Authors
Pradel, Michael [1, 2]
Murali, Vijayaraghavan [2]
Qian, Rebecca [2]
Machalica, Mateusz [2]
Meijer, Erik [2]
Chandra, Satish [2]
Affiliations
[1] Univ Stuttgart, Stuttgart, Germany
[2] Facebook, Menlo Pk, CA USA
Source
PROCEEDINGS OF THE 29TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2020 | 2020
Funding
European Research Council
Keywords
Bug localization; software crashes; machine learning;
DOI
10.1145/3395363.3397356
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104 ; 0812 ; 0835 ; 1405
Abstract
Despite all efforts to avoid bugs, software sometimes crashes in the field, leaving crash traces as the only information to localize the problem. Prior approaches on localizing where to fix the root cause of a crash do not scale well to ultra-large scale, heterogeneous code bases that contain millions of code files written in multiple programming languages. This paper presents Scaffle, the first scalable bug localization technique, which is based on the key insight to divide the problem into two easier sub-problems. First, a trained machine learning model predicts which lines of a raw crash trace are most informative for localizing the bug. Then, these lines are fed to an information retrieval-based search engine to retrieve file paths in the code base, predicting which file to change to address the crash. The approach does not make any assumptions about the format of a crash trace or the language that produces it. We evaluate Scaffle with tens of thousands of crash traces produced by a large-scale industrial code base at Facebook that contains millions of possible bug locations and that powers tools used by billions of people. The results show that the approach correctly predicts the file to fix for 40% to 60% (50% to 70%) of all crash traces within the top-1 (top-5) predictions. Moreover, Scaffle improves over several baseline approaches, including an existing classification-based approach, a scalable variant of existing information retrieval-based approaches, and a set of hand-tuned, industrially deployed heuristics.
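The two-step split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `line_relevance` is a hypothetical regex stand-in for the trained model, the IDF-weighted token overlap in `rank_paths` is an assumed simplification of the information retrieval step, and the trace and file paths are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def line_relevance(line):
    # Hypothetical placeholder for the trained model: lines that look like
    # "file.ext:lineno" are treated as informative, all others are not.
    return 1.0 if re.search(r"\w+\.\w+:\d+", line) else 0.0


def select_lines(trace, k):
    """Step 1: keep the k trace lines with the highest relevance score."""
    lines = [ln for ln in trace.splitlines() if ln.strip()]
    return sorted(lines, key=line_relevance, reverse=True)[:k]


def rank_paths(query_tokens, paths, top_n):
    """Step 2: rank file paths by IDF-weighted overlap with the query tokens."""
    path_tokens = {p: set(tokenize(p)) for p in paths}
    df = Counter(t for toks in path_tokens.values() for t in toks)
    n = len(paths)

    def score(path):
        return sum(math.log(n / df[t]) for t in query_tokens if t in path_tokens[path])

    return sorted(paths, key=score, reverse=True)[:top_n]


def localize(trace, paths, top_lines=2, top_files=1):
    """Chain both steps: select informative lines, then retrieve file paths."""
    query = set(tokenize(" ".join(select_lines(trace, top_lines))))
    return rank_paths(query, paths, top_files)


# Invented example data, for illustration only.
FILE_PATHS = [
    "www/feed/story_renderer.php",
    "www/feed/ranking.php",
    "mobile/ui/button.java",
]
TRACE = """Fatal error: null dereference
  at render() in story_renderer.php:42
  at main() in index.php:7
request id 12345"""

print(localize(TRACE, FILE_PATHS))  # ['www/feed/story_renderer.php']
```

Because the retrieval step only compares tokens against file paths, the sketch shares the property noted in the abstract of making no assumption about the trace format or the language that produced it; only the placeholder line scorer is format-specific here.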
Pages: 225-236
Page count: 12