DeepBugs: A learning approach to name-based bug detection

Cited by: 191
Authors
Pradel M. [1]
Sen K. [2, 3]
Affiliations
[1] TU Darmstadt, Department of Computer Science
[2] University of California, Berkeley
Keywords
Bug detection; JavaScript; Machine learning; Name-based program analysis; Natural language
DOI
10.1145/3276517
Abstract
Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tuned algorithms to detect bugs. This paper presents DeepBugs, a learning approach to name-based bug detection, which reasons about names based on a semantic representation and which automatically learns bug detectors instead of manually writing them. We formulate bug detection as a binary classification problem and train a classifier that distinguishes correct from incorrect code. To address the challenge that effectively learning a bug detector requires examples of both correct and incorrect code, we create likely incorrect code examples from an existing corpus of code through simple code transformations. A novel insight learned from our work is that learning from artificially seeded bugs yields bug detectors that are effective at finding bugs in real-world code. We implement our idea into a framework for learning-based and name-based bug detection. Three bug detectors built on top of the framework detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Applying the approach to a corpus of 150,000 JavaScript files yields bug detectors that have a high accuracy (between 89% and 95%), are very efficient (less than 20 milliseconds per analyzed file), and reveal 102 programming mistakes (with 68% true positive rate) in real-world code. © 2018 Copyright held by the owner/author(s).
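The abstract's idea of deriving likely incorrect examples from existing code through simple transformations can be illustrated for the swapped-arguments case. The following is a minimal sketch, not the authors' implementation: the `Call` type, the `seed_swapped_arguments` function, and the toy corpus are hypothetical names introduced here, and the sketch stops at building the labeled dataset, leaving out the semantic name representation and the classifier itself.

```python
from typing import List, NamedTuple, Tuple


class Call(NamedTuple):
    """Hypothetical, simplified view of a two-argument call site."""
    callee: str
    args: Tuple[str, str]


def seed_swapped_arguments(calls: List[Call]) -> List[Tuple[Call, int]]:
    """Label each original call 0 (assumed correct) and add a swapped copy labeled 1."""
    examples = []
    for call in calls:
        examples.append((call, 0))
        first, second = call.args
        # Swapping two identical names would not change the code, so skip it.
        if first != second:
            examples.append((Call(call.callee, (second, first)), 1))
    return examples


# Toy corpus standing in for calls extracted from real JavaScript files.
corpus = [
    Call("setSize", ("width", "height")),
    Call("max", ("x", "x")),
]
dataset = seed_swapped_arguments(corpus)
```

A binary classifier trained on such pairs sees the same identifiers in both classes, so it can only separate them by learning which argument order fits the names involved.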