Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Cited by: 0
Authors
Tamberg, Karl [1]
Bahsi, Hayretdin [1,2]
Affiliations
[1] Tallinn Univ Technol, Sch Informat Technol, Tallinn 12618, Estonia
[2] Northern Arizona Univ, Sch Informat Comp & Cyber Syst, Flagstaff, AZ 86011 USA
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Benchmarking; large language models; LLM; prompting; software vulnerabilities; static code analyser; tools
DOI
10.1109/ACCESS.2025.3541146
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812 (Computer Science and Technology)
Abstract
Despite the variety of approaches employed to detect software vulnerabilities, the number of reported vulnerabilities has trended upward over the years. This suggests that problems are not being caught before code is released, which could stem from many factors, such as a lack of awareness, the limited efficacy of existing vulnerability detection tools, or those tools not being user-friendly. To help address some of the issues with traditional vulnerability detection tools, we propose using large language models (LLMs) to assist in finding vulnerabilities in source code. LLMs have shown a remarkable ability to understand and generate code, underlining their potential in code-related tasks. The aim is to test multiple state-of-the-art LLMs and identify the prompting strategies that extract the most value from them. We leverage findings from prompting-focused research, benchmarking approaches such as chain of thought, tree of thought, and self-consistency for vulnerability detection use cases. We provide an overview of the strengths and weaknesses of the LLM-based approach and compare its results to those of traditional static analysis tools. We find that LLMs can pinpoint more issues than traditional static analysis tools, outperforming them in terms of recall and F1 scores; however, LLMs are also more prone to producing false-positive classifications. The experiments are conducted on Java code, and the results should benefit software developers and security analysts responsible for ensuring that code is free of vulnerabilities.
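
As an illustration of the prompting strategies benchmarked in the paper, the minimal sketch below shows how a chain-of-thought style prompt for Java vulnerability detection might be assembled. The prompt wording, the CotPromptSketch class, and the embedded SQL-injection snippet are hypothetical examples, not material taken from the paper itself.

// Minimal sketch (hypothetical): assembling a chain-of-thought (CoT) style
// prompt that asks a model to reason step by step before classifying a
// Java snippet as vulnerable or not.
public class CotPromptSketch {

    // A classic SQL-injection-prone snippet (CWE-89), the kind of issue both
    // static analysers and LLM-based detectors are typically asked to flag.
    static final String TARGET_CODE = String.join("\n",
        "String query = \"SELECT * FROM users WHERE name = '\" + userInput + \"'\";",
        "Statement stmt = connection.createStatement();",
        "ResultSet rs = stmt.executeQuery(query);");

    // Builds the CoT prompt: explicit intermediate reasoning steps, then a
    // constrained final verdict that is easy to score against ground truth.
    static String buildCotPrompt(String code) {
        return "You are a security analyst reviewing Java code.\n"
             + "Think step by step: (1) identify untrusted inputs, "
             + "(2) trace whether they reach a sensitive sink, "
             + "(3) answer VULNERABLE or NOT VULNERABLE with a CWE ID.\n\n"
             + "Code under review:\n" + code;
    }

    public static void main(String[] args) {
        System.out.println(buildCotPrompt(TARGET_CODE));
    }
}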
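
For context, the recall and F1 scores referred to above follow the standard definitions below (general definitions, not formulas quoted from the paper). Because false positives reduce precision, an LLM can lead on recall, and on F1 when the recall gain outweighs the precision loss, while still raising more false alarms than a traditional static analyser.

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

Here TP, FP, and FN denote true-positive, false-positive, and false-negative classifications, respectively.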
Pages: 29698-29717
Number of pages: 20