A Language-agnostic Framework for Mining Static Analysis Rules from Code Changes

Cited by: 1

Authors:

Effendi, Sedick David Baker ^{[1]}

Cirisci, Berk ^{[2]}

Mukherjee, Rajdeep ^{[3]}

Nguyen, Hoan Anh ^{[3]}

Tripp, Omer ^{[3]}

Affiliations:

[1] Stellenbosch University, Stellenbosch, South Africa

[2] CNRS, IRIF, Paris, France

[3] Amazon Web Services, Seattle, WA, USA

Source:

2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) | 2023

Keywords:

static analysis; mining software repository; program synthesis; coding best practices; clustering; graph

DOI:

10.1109/ICSE-SEIP58684.2023.00035

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

Static analysis tools detect a wide range of code defects, including code quality issues, security vulnerabilities, operational risks, and best-practice violations. Creating and maintaining a set of high-quality static analysis rules that detect misuses of popular libraries and SDKs across multiple languages is challenging. One mechanism for inferring static analysis rules is to leverage frequently occurring bug-fix code changes in the wild that are committed by multiple developers into different software repositories. The intuition is that code changes following a common pattern correspond to recurring mistakes, from which deriving best practices could likely be of high value and accepted by the community. Automating the process of mining and clustering code changes enables a scalable mechanism to source and generate best-practices rules. From a coverage standpoint, the rules are derived from real-world code changes, which ensures that popular libraries and application domains are accounted for. In this paper, we present a language-agnostic framework for mining and clustering code changes from software repositories using a graph-based representation dubbed MU (mu). Unlike language-specific ASTs, the MU representation generalizes across languages by modeling programs at a higher semantic level, which enables grouping of code changes that are semantically similar yet syntactically distinct. We have mined a total of 62 high-quality static analysis rules across Java, JavaScript, and Python from fewer than 600 code change clusters. These cover multiple libraries, including the AWS Java and Python SDKs, as well as libraries like pandas, React, Android libraries, JSON parsing libraries, and many more. These rules are integrated into a cloud-based static analyzer, Amazon CodeGuru Reviewer. Developers have accepted 73% of recommendations from these rules during code review, which signifies the value of these rules to help improve developer productivity, make code secure, and improve code hygiene.

Citation

Pages: 327-339

Page count: 13
