Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Cited by: 0
Authors
Tamberg, Karl [1 ]
Bahsi, Hayretdin [1 ,2 ]
Affiliations
[1] Tallinn Univ Technol, Sch Informat Technol, Tallinn 12618, Estonia
[2] No Arizona Univ, Sch Informat Comp & Cyber Syst, Flagstaff, AZ 86011 USA
Source
IEEE ACCESS | 2025, Vol. 13
Keywords
Benchmarking; large language models; LLM; prompting; software vulnerabilities; static code analyser; TOOLS;
DOI
10.1109/ACCESS.2025.3541146
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812
Abstract
Despite the variety of approaches employed to detect software vulnerabilities, the number of reported vulnerabilities has trended upward over the years. This suggests that problems are not caught before code is released, which may stem from many factors, such as lack of awareness, the limited efficacy of existing vulnerability detection tools, or those tools not being user-friendly. To address some of the issues with traditional vulnerability detection tools, we propose using large language models (LLMs) to assist in finding vulnerabilities in source code. LLMs have shown a remarkable ability to understand and generate code, underlining their potential for code-related tasks. Our aim is to test multiple state-of-the-art LLMs and identify the prompting strategies that extract the most value from them. We leverage findings from prompting-focused research, benchmarking approaches such as chain of thought, tree of thought, and self-consistency for vulnerability detection use cases. We provide an overview of the strengths and weaknesses of the LLM-based approach and compare the results to those of traditional static analysis tools. We find that LLMs can pinpoint more issues than traditional static analysis tools, outperforming them in terms of recall and F1 scores. However, LLMs are more prone to false positive classifications than traditional tools. The experiments are conducted on the Java programming language, and the results should benefit software developers and security analysts responsible for ensuring that code is free of vulnerabilities.
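The abstract's recall/F1 claim can be made concrete with a short sketch. The counts below are purely illustrative (not figures from the paper): they show how a detector that surfaces more true vulnerabilities can achieve a higher F1 score than a conservative static analyser even while producing more false positives.

```python
# Illustrative sketch only: counts are hypothetical, not from the paper.

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# A conservative static analyser: few false positives, but many misses.
static_p, static_r, static_f1 = precision_recall_f1(tp=30, fp=5, fn=70)

# An LLM-based detector: more true findings, but noisier output.
llm_p, llm_r, llm_f1 = precision_recall_f1(tp=70, fp=40, fn=30)

print(f"static: P={static_p:.2f} R={static_r:.2f} F1={static_f1:.2f}")
print(f"llm:    P={llm_p:.2f} R={llm_r:.2f} F1={llm_f1:.2f}")
```

With these hypothetical counts, the static analyser wins on precision (0.86 vs. 0.64), but the LLM's much higher recall (0.70 vs. 0.30) yields a higher F1 (0.67 vs. 0.44), mirroring the trade-off the paper reports.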
Pages: 29698-29717
Page count: 20