Detecting False Alarms from Automatic Static Analysis Tools: How Far are We?

Cited by: 22
Authors
Kang, Hong Jin [1 ]
Aw, Khai Loong [1 ]
Lo, David [1 ]
Affiliations
[1] Singapore Management Univ, Singapore, Singapore
Source
2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022) | 2022
Funding
National Research Foundation, Singapore
Keywords
static analysis; false alarms; data leakage; data duplication; BUGS;
DOI
10.1145/3510003.3510214
Chinese Library Classification (CLC)
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Automatic static analysis tools (ASATs), such as FindBugs, have a high false alarm rate. The large number of false alarms produced poses a barrier to adoption. Researchers have proposed the use of machine learning to prune false alarms and present only actionable warnings to developers. The state-of-the-art study has identified a set of "Golden Features" based on metrics computed over the characteristics and history of the file, code, and warning. Recent studies show that machine learning using these features is extremely effective and achieves almost perfect performance. We perform a detailed analysis to better understand the strong performance of the "Golden Features". We found that several studies used an experimental procedure that results in data leakage and data duplication, which are subtle issues with significant implications. Firstly, the ground-truth labels have leaked into features that measure the proportion of actionable warnings in a given context. Secondly, many warnings in the testing dataset also appear in the training dataset. Next, we demonstrate limitations in the warning oracle that determines the ground-truth labels: a heuristic that compares warnings in a given revision to a reference revision in the future. We show that the choice of reference revision influences the warning distribution. Moreover, the heuristic produces labels that do not agree with human oracles. Hence, the strong performance of these techniques seen previously overestimates their true performance if adopted in practice. Our results convey several lessons and provide guidelines for evaluating false alarm detectors.
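The data-duplication issue the abstract describes can be illustrated with a minimal sketch: if the same warning (identified here, as an assumption, by its file, rule, and line) occurs in both the training and testing splits, a model can effectively memorize its label, inflating measured performance. All field names and data below are hypothetical, not taken from the paper's dataset.

```python
def overlap_fraction(train, test,
                     key=lambda w: (w["file"], w["rule"], w["line"])):
    """Fraction of test warnings whose identifying key also occurs in training.

    A non-zero value signals train/test duplication: the evaluation
    partly measures memorization rather than generalization.
    """
    seen = {key(w) for w in train}
    return sum(1 for w in test if key(w) in seen) / len(test)

# Illustrative warning records (fields are assumptions for this sketch).
train = [
    {"file": "A.java", "rule": "NP_NULL_ON_SOME_PATH", "line": 10, "actionable": True},
    {"file": "B.java", "rule": "DM_EXIT", "line": 42, "actionable": False},
]
test = [
    {"file": "A.java", "rule": "NP_NULL_ON_SOME_PATH", "line": 10, "actionable": True},  # duplicate of a training warning
    {"file": "C.java", "rule": "SE_BAD_FIELD", "line": 7, "actionable": False},
]

print(overlap_fraction(train, test))  # 0.5: half the test set duplicates training data
```

Deduplicating by such a key before splitting (or splitting by project/time) avoids this inflation; the paper's point is that several prior evaluations did not.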
Pages: 698-709
Page count: 12