A Language-agnostic Framework for Mining Static Analysis Rules from Code Changes

Cited by: 1
Authors
Effendi, Sedick David Baker [1]
Cirisci, Berk [2]
Mukherjee, Rajdeep [3]
Hoan Anh Nguyen [3]
Tripp, Omer [3]
Affiliations
[1] Stellenbosch University, Stellenbosch, South Africa
[2] CNRS, IRIF, Paris, France
[3] Amazon Web Services, Seattle, WA, USA
Source
2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING IN PRACTICE, ICSE-SEIP | 2023
Keywords
static analysis; mining software repository; program synthesis; coding best practices; clustering; graph
DOI
10.1109/ICSE-SEIP58684.2023.00035
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Static analysis tools detect a wide range of code defects, including code quality issues, security vulnerabilities, operational risks, and best-practice violations. Creating and maintaining a set of high-quality static analysis rules that detect misuses of popular libraries and SDKs across multiple languages is challenging. One mechanism for inferring static analysis rules is to leverage frequently occurring bug-fix code changes in the wild that are committed by multiple developers to different software repositories. The intuition is that code changes following a common pattern correspond to recurring mistakes, so best practices derived from them are likely to be of high value and accepted by the community. Automating the process of mining and clustering code changes enables a scalable mechanism to source and generate best-practice rules. From a coverage standpoint, the rules are derived from real-world code changes, which ensures that popular libraries and application domains are accounted for. In this paper, we present a language-agnostic framework for mining and clustering code changes from software repositories using a graph-based representation dubbed MU (mu). Unlike language-specific ASTs, the MU representation generalizes across languages by modeling programs at a higher semantic level, which enables grouping of code changes that are semantically similar yet syntactically distinct. We have mined a total of 62 high-quality static analysis rules across Java, JavaScript, and Python from fewer than 600 code change clusters. These cover multiple libraries, including the AWS Java and Python SDKs, as well as libraries like pandas, React, Android libraries, JSON parsing libraries, and many more. These rules are integrated into a cloud-based static analyzer, Amazon CodeGuru Reviewer. Developers have accepted 73% of recommendations from these rules during code review, which signifies the value of these rules in improving developer productivity, making code secure, and improving code hygiene.
Pages: 327-339
Number of pages: 13