How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation

Cited by: 0
Authors
Zhang, Cen [1 ]
Zheng, Yaowen [1 ]
Bai, Mingqiang [2 ,3 ]
Li, Yeting [2 ,3 ]
Ma, Wei [1 ]
Xie, Xiaofei [4 ]
Li, Yuekang [5 ]
Sun, Limin [2 ,3 ]
Liu, Yang [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Chinese Acad Sci, IIE, Beijing, Peoples R China
[3] UCAS, Sch Cyber Secur, Beijing, Peoples R China
[4] Singapore Management Univ, Singapore, Singapore
[5] Univ New South Wales, Sydney, NSW, Australia
Source
PROCEEDINGS OF THE 33RD ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2024 | 2024
Funding
National Research Foundation, Singapore;
Keywords
Fuzz Driver Generation; Fuzz Testing; Large Language Model;
DOI
10.1145/3650212.3680355
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Fuzz drivers are essential for library API fuzzing. However, automatically generating fuzz drivers is a complex task, as it demands the creation of high-quality, correct, and robust API usage code. A Large Language Model (LLM) based approach to generating fuzz drivers is a promising area of research. Unlike traditional program analysis-based generators, this text-based approach is more general and capable of harnessing a variety of API usage information, resulting in code that is friendly to human readers. However, there is still a lack of understanding of the fundamental issues in this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study of the key issues in using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely-used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, at a cost of 0.85 billion tokens (more than $8,000 in charged tokens). Additionally, we compared the LLM-generated drivers against those used in industry, conducting extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that: 1) While LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles to practical application; 2) LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three featured design choices of prompting strategies can be beneficial: issuing repeated queries, querying with examples, and employing an iterative querying process; 3) While LLM-generated drivers can yield fuzzing outcomes on par with those used in industry, there are substantial opportunities for enhancement, such as extending the API usage they contain or integrating semantic oracles to facilitate logical bug detection. Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
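To illustrate the kind of artifact the study evaluates, below is a minimal sketch of a fuzz driver in the common libFuzzer style (the paper targets C projects). The library API shown (mylib_create, mylib_parse, mylib_destroy) is a hypothetical stand-in for a real library header, not code from the paper; only the LLVMFuzzerTestOneInput entry point signature is the standard libFuzzer convention.

```c
/*
 * Minimal fuzz driver sketch in libFuzzer style.
 * The mylib_* API is hypothetical and stands in for the headers of
 * the library under test; it is not taken from the paper.
 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical library API under test (assumption, for illustration). */
typedef struct mylib_ctx mylib_ctx;
mylib_ctx *mylib_create(void);
int mylib_parse(mylib_ctx *ctx, const uint8_t *buf, size_t len);
void mylib_destroy(mylib_ctx *ctx);

/* libFuzzer entry point: each driver exercises one API usage scenario. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    mylib_ctx *ctx = mylib_create();
    if (ctx == NULL) {
        return 0;            /* allocation failure: nothing to test */
    }
    /* Feed the fuzzer-controlled bytes into the parsing API. */
    (void)mylib_parse(ctx, data, size);
    mylib_destroy(ctx);      /* correct cleanup keeps the driver leak-free */
    return 0;
}
```

Even in this small sketch, the generator must get the API call order, error handling, and cleanup right, which is exactly the "high-quality, correct, and robust API usage code" the abstract identifies as the core difficulty.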
Pages: 1223-1235
Number of pages: 13