DeepBugs: A Learning Approach to Name-Based Bug Detection

Cited by: 0
Authors
Pradel, Michael [1]
Sen, Koushik [2]
Affiliations
[1] Tech Univ Darmstadt, Dept Comp Sci, Darmstadt, Germany
[2] Univ Calif Berkeley, EECS Dept, Berkeley, CA 94720 USA
Source
PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL | 2018 / Volume 2
Keywords
Bug detection; Natural language; Machine learning; Name-based program analysis; JavaScript
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tuned algorithms to detect bugs. This paper presents DeepBugs, a learning approach to name-based bug detection, which reasons about names based on a semantic representation and which automatically learns bug detectors instead of manually writing them. We formulate bug detection as a binary classification problem and train a classifier that distinguishes correct from incorrect code. To address the challenge that effectively learning a bug detector requires examples of both correct and incorrect code, we create likely incorrect code examples from an existing corpus of code through simple code transformations. A novel insight learned from our work is that learning from artificially seeded bugs yields bug detectors that are effective at finding bugs in real-world code. We implement our idea into a framework for learning-based and name-based bug detection. Three bug detectors built on top of the framework detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Applying the approach to a corpus of 150,000 JavaScript files yields bug detectors that have a high accuracy (between 89% and 95%), are very efficient (less than 20 milliseconds per analyzed file), and reveal 102 programming mistakes (with 68% true positive rate) in real-world code.
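The bug-seeding step described in the abstract can be illustrated with a minimal sketch for the swapped-arguments detector. This is not the authors' implementation: the `Call` record, the function names, and the toy corpus are hypothetical, calls are reduced to a callee name plus two argument names, and the name representation and the classifier itself are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical, simplified view of a two-argument call site."""
    callee: str
    args: tuple  # (first argument name, second argument name)


def seed_swapped_args(call):
    """Create a likely-incorrect variant by swapping the two arguments."""
    first, second = call.args
    return Call(call.callee, (second, first))


def build_training_set(calls):
    """Label each original call 0 (assumed correct) and its seeded variant 1."""
    examples = []
    for call in calls:
        if call.args[0] == call.args[1]:
            continue  # swapping identical arguments would not change the code
        examples.append((call, 0))
        examples.append((seed_swapped_args(call), 1))
    return examples


# Toy corpus (assumed for illustration only).
corpus = [
    Call("setTimeout", ("callback", "delay")),
    Call("Math.max", ("x", "x")),
]
dataset = build_training_set(corpus)
```

Each labeled pair would then be turned into a feature vector and passed to a binary classifier. The original code is only assumed to be correct, which is why the abstract describes the seeded examples as "likely incorrect" rather than guaranteed bugs.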
Pages: 25