An empirical assessment of machine learning approaches for triaging reports of static analysis tools

Cited by: 4
Authors
Yerramreddy, Sai [1 ]
Mordahl, Austin [2 ]
Koc, Ugur [1 ]
Wei, Shiyi [2 ]
Foster, Jeffrey S. [3 ]
Carpuat, Marine [1 ]
Porter, Adam A. [1 ]
Affiliations
[1] University of Maryland, Department of Computer Science, College Park, MD 20742, USA
[2] University of Texas at Dallas, Department of Computer Science, Richardson, TX, USA
[3] Tufts University, Department of Computer Science, Medford, MA, USA
Keywords
Static analysis; False positive classification; Machine learning; Probabilistic model; Graph
DOI
10.1007/s10664-022-10253-z
Chinese Library Classification
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
Despite their ability to detect critical bugs in software, static analysis tools' high false positive rates are a key barrier to their adoption in real-world settings. To improve the usability of these tools, researchers have recently begun to apply machine learning techniques to classify and filter incorrect analysis reports. Although initial results have been promising, the long-term potential and best practices for this line of research are unclear due to the lack of detailed, large-scale empirical evaluation. To partially address this knowledge gap, we present a comparative empirical study of three machine learning techniques, namely traditional models, recurrent neural networks (RNNs), and graph neural networks (GNNs), for classifying correct and incorrect results of three static analysis tools (FindSecBugs, CBMC, and JBMC) using multiple datasets. These tools represent different static analysis techniques, namely taint analysis and model checking. We also introduce and evaluate new data preparation routines for RNNs and node representations for GNNs. We find that overall classification accuracy reaches 80%-99%, depending on the dataset and application scenario. We observe that data preparation routines have a positive impact on classification accuracy, with an improvement of up to 5% for RNNs and 16% for GNNs. Overall, our results suggest that neural networks (RNNs or GNNs) that learn over a program's source code outperform traditional models, although interesting tradeoffs are present among all techniques. Our observations provide insight into the future research needed to speed the adoption of machine learning approaches for static analysis tools in practice.
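The record does not include code; as a rough illustration only, the sketch below shows the general setup of the "traditional model" scenario described in the abstract: training a classifier to label static analysis reports as true or false positives. The feature set, data values, and model choice are hypothetical assumptions for illustration and are not taken from the paper.

# Minimal sketch (assumed setup, not the paper's implementation): a traditional
# classifier over hand-crafted features of static analysis reports.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical per-report features, e.g. [warning-type id, method length,
# number of branches on the reported path, call depth].
X = [
    [3, 120, 4, 2],
    [1,  45, 1, 1],
    [3, 300, 9, 5],
    [2,  60, 2, 1],
    [1, 210, 7, 3],
    [2,  80, 3, 2],
]
# Labels: 1 = report is correct (true positive), 0 = false positive.
y = [1, 0, 1, 0, 1, 0]

# Hold out part of the data to estimate classification accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))

The neural approaches compared in the paper (RNNs and GNNs) differ from this baseline in that they learn directly over the program's source code (token sequences or program graphs) rather than over hand-crafted features.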
Pages: 44