Static analysis of source code security: Assessment of tools against SAMATE tests

Cited by: 35
Authors
Diaz, Gabriel [1 ]
Ramon Bermejo, Juan [2 ,3 ]
Affiliations
[1] Spanish Distance Univ, UNED, Elect Elect & Control Engn Dept, Madrid 28040, Spain
[2] Base Aerea Torrejon de Ardoz, Commun & Comp Squadron Core Command, Madrid 22800, Spain
[3] Base Aerea Torrejon de Ardoz, Control Grp Spanish Air Forces Air Def Syst, Madrid 22800, Spain
Keywords
Security tools; Vulnerability; Quality analysis and evaluation; Software/program verification; Security development lifecycle
DOI
10.1016/j.infsof.2013.02.005
Chinese Library Classification
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
Context: Static analysis tools are used to discover security vulnerabilities in source code, but they suffer from false positives and false negatives. A false positive is a reported vulnerability that is not actually a security problem; a false negative is a vulnerability in the code that the tool fails to detect. Objective: The main goal of this article is to provide objective assessment results, following a well-defined and repeatable methodology, on how well static analysis tools detect security vulnerabilities. The study compares the performance of nine tools (CBMC, K8-Insight, PC-lint, Prevent, Satabs, SCA, Goanna, Cx-enterprise, Codesonar), most of them commercial and each with a different design. Method: We executed the static analysis tools against SAMATE Reference Dataset test suites 45 and 46 for the C language. One suite includes test cases with known vulnerabilities, and the other contains the same cases with the specific vulnerabilities fixed. The results are then analyzed using a set of well-known metrics. Results: Only SCA is designed to detect all the vulnerability classes considered in SAMATE. None of the tools detect "cross-site scripting" vulnerabilities. The best F-measure results are obtained by Prevent, SCA and K8-Insight. The average precision of the analyzed tools is 0.7 and the average recall is 0.527. The differences between tools are substantial: they detect different kinds of vulnerabilities. Conclusions: The results provide empirical evidence supporting popular claims that had not been objectively demonstrated until now. The methodology is repeatable and yields a strict ranking of the analyzed static analysis tools in terms of vulnerability coverage and effectiveness, that is, detecting the highest number of vulnerabilities while producing few false positives. It can help practitioners select appropriate tools for a security code review process.
We propose some recommendations for improving the reliability and usefulness of static analysis tools and of the benchmarking process. (C) 2013 Elsevier B.V. All rights reserved.
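The abstract reports results in terms of precision, recall, and F-measure computed from each tool's true positives, false positives, and false negatives. As a minimal sketch (the counts in the example are hypothetical, not taken from the paper), these metrics can be computed as:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported vulnerabilities that are real."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of real vulnerabilities that are reported."""
    return tp / (tp + fn)

def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r)

# Hypothetical tool output: 70 correct reports, 30 false alarms, 63 misses.
p = precision(70, 30)   # 0.7
r = recall(70, 63)      # ~0.526
print(round(f_measure(p, r), 3))
```

Combining the paper's reported averages (precision 0.7, recall 0.527) this way gives an average F-measure of roughly 0.60, which shows why a tool must balance both quantities to rank well.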
Pages: 1462-1476 (15 pages)