A critical comparison on six static analysis tools: Detection, agreement, and precision

Cited: 14
Authors
Lenarduzzi, Valentina [1 ]
Pecorelli, Fabiano [2 ]
Saarimaki, Nyyti [2 ]
Lujan, Savanna [2 ]
Palomba, Fabio [3 ]
Affiliations
[1] University of Oulu, M3S Research Unit, Oulu, Finland
[2] Tampere University, Clowee Research Group, Tampere, Finland
[3] University of Salerno, SeSa Lab, Salerno, Italy
Funding
Swiss National Science Foundation
Keywords
Static analysis tools; Software quality; Empirical study; Metric thresholds; Bug prediction
DOI
10.1016/j.jss.2022.111575
CLC Number
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
Background: Developers use Static Analysis Tools (SATs) to control for potential quality issues in source code, including defects and technical debt. Tool vendors have devised quite a number of tools, which makes it harder for practitioners to select the one most suitable for their needs. To better support developers, researchers have conducted several studies on SATs to improve the understanding of their actual capabilities.
Aims: Despite the work done so far, there is still a lack of knowledge regarding (1) the agreement among the tools and (2) the precision of their recommendations. We aim at bridging this gap with a large-scale comparison of six popular SATs for Java projects: Better Code Hub, CheckStyle, Coverity Scan, FindBugs, PMD, and SonarQube.
Methods: We analyzed 47 Java projects by applying the six SATs. To assess their agreement, we manually analyzed, at both line and class level, whether they identify the same issues. Finally, we evaluated the precision of the tools against a manually defined ground truth.
Results: The key results show little to no agreement among the tools and a low degree of precision.
Conclusion: Our study provides the first overview of the agreement among different tools, as well as an extensive analysis of their precision, which researchers, practitioners, and tool vendors can use to map the current capabilities of the tools and envision possible improvements.
(c) 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Pages: 19
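
The abstract's two measurements, inter-tool agreement and precision against a manually defined ground truth, reduce to simple set operations once each tool's findings are normalized to a common form. The paper's own analysis scripts are not reproduced in this record; the following is a minimal illustrative sketch, assuming findings are represented as (file, line) pairs and using Jaccard overlap as one possible agreement measure. Both assumptions, and all names in the example, are this sketch's, not the paper's.

```python
# Illustrative sketch (not the paper's actual scripts): pairwise agreement
# and precision over static-analysis findings, where each tool's output is
# assumed to be normalized to a set of (file, line) pairs.

def jaccard_agreement(findings_a: set, findings_b: set) -> float:
    """Fraction of flagged locations shared by two tools (Jaccard index)."""
    union = findings_a | findings_b
    if not union:
        return 1.0  # both tools report nothing: trivially in agreement
    return len(findings_a & findings_b) / len(union)

def precision(findings: set, ground_truth: set) -> float:
    """True positives / all reported findings, against a manual ground truth."""
    if not findings:
        return 0.0
    return len(findings & ground_truth) / len(findings)

# Hypothetical line-level findings from two tools on one project.
pmd = {("Foo.java", 12), ("Foo.java", 40), ("Bar.java", 7)}
sonarqube = {("Foo.java", 12), ("Bar.java", 99)}
manual_ground_truth = {("Foo.java", 12), ("Bar.java", 7)}

print(f"agreement(PMD, SonarQube) = {jaccard_agreement(pmd, sonarqube):.2f}")
print(f"precision(PMD)            = {precision(pmd, manual_ground_truth):.2f}")
```

Class-level agreement, also reported in the paper, follows the same pattern with findings keyed by class name instead of (file, line).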