How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation

Cited: 0
Authors
Zhang, Cen [1 ]
Zheng, Yaowen [1 ]
Bai, Mingqiang [2 ,3 ]
Li, Yeting [2 ,3 ]
Ma, Wei [1 ]
Xie, Xiaofei [4 ]
Li, Yuekang [5 ]
Sun, Limin [2 ,3 ]
Liu, Yang [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Chinese Acad Sci, IIE, Beijing, Peoples R China
[3] UCAS, Sch Cyber Secur, Beijing, Peoples R China
[4] Singapore Management Univ, Singapore, Singapore
[5] Univ New South Wales, Sydney, NSW, Australia
Source
PROCEEDINGS OF THE 33RD ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2024 | 2024
Funding
National Research Foundation, Singapore;
Keywords
Fuzz Driver Generation; Fuzz Testing; Large Language Model;
DOI
10.1145/3650212.3680355
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Fuzz drivers are essential for library API fuzzing. However, automatically generating fuzz drivers is a complex task, as it demands the creation of high-quality, correct, and robust API usage code. An LLM-based (Large Language Model) approach for generating fuzz drivers is a promising area of research. Unlike traditional program analysis-based generators, this text-based approach is more generalized and capable of harnessing a variety of API usage information, resulting in code that is friendly to human readers. However, there is still a lack of understanding of the fundamental issues in this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely used C projects. Six prompting strategies were designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens). Additionally, we compared the LLM-generated drivers against those used in industry, conducting extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that: 1) while LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles to practical application; 2) LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics, and three featured design choices of prompt strategies can be beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process; 3) while LLM-generated drivers can yield fuzzing outcomes on par with those used in industry, there are substantial opportunities for enhancement, such as extending the contained API usage or integrating semantic oracles to facilitate logical bug detection. Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
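For context on the artifact being generated: a fuzz driver is a small harness that adapts fuzzer-supplied bytes into calls on a library's API. The sketch below shows the general shape of such a driver for libFuzzer, whose LLVMFuzzerTestOneInput entry point is standard; the parser_* library is a hypothetical stand-in, not an API from the paper's dataset, and is stubbed so the sketch compiles on its own.

#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical library under test, stubbed so the sketch is self-contained;
 * a real driver would link against an actual library instead. */
typedef struct { size_t consumed; } parser_t;

static parser_t *parser_create(void) { return calloc(1, sizeof(parser_t)); }
static int parser_feed(parser_t *p, const uint8_t *buf, size_t len) {
    (void)buf;            /* stand-in for real parsing logic */
    p->consumed += len;
    return 0;
}
static void parser_destroy(parser_t *p) { free(p); }

/* libFuzzer entry point, invoked repeatedly with mutated byte buffers.
 * Build (assuming clang with libFuzzer): clang -g -fsanitize=fuzzer,address driver.c */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parser_t *p = parser_create();
    if (!p)
        return 0;                 /* tolerate allocation failure */
    parser_feed(p, data, size);   /* exercise the target API */
    parser_destroy(p);            /* clean up to avoid leaks across runs */
    return 0;
}

An LLM-based generator is prompted to produce a harness of roughly this shape for a real target API; the study then measures how often the generated drivers compile, use the API correctly, and fuzz effectively.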
Pages: 1223-1235
Page count: 13
Related Papers
50 records in total
  • [21] Yang, Shenghao; Ma, Weizhi; Sun, Peijie; Ai, Qingyao; Liu, Yiqun; Cai, Mingchen; Zhang, Min. Sequential Recommendation with Latent Relations based on Large Language Model. PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024: 335-344.
  • [22] Li, Jiameng; Chen, Zhen; Lin, Weiran; Zou, Liangjun; Xie, Xin; Hu, Yaodong; Li, Dianmo. Mystery Game Script Compose Based on a Large Language Model. 2024 IEEE 5TH ANNUAL WORLD AI IOT CONGRESS, AIIOT 2024, 2024: 0451-0455.
  • [23] Duan, Gaoxiang; Chen, Jiajie; Zhou, Yueying; Zheng, Xiaoying; Zhu, Yongxin. Large Language Model Inference Acceleration Based on Hybrid Model Branch Prediction. ELECTRONICS, 2024, 13(07).
  • [24] Qiu, Kehai; Bakirtzis, Stefanos; Wassell, Ian; Song, Hui; Zhang, Jie; Wang, Kezhi. Large Language Model-Based Wireless Network Design. IEEE WIRELESS COMMUNICATIONS LETTERS, 2024, 13(12): 3340-3344.
  • [25] He, Zhimin; Li, Guohong; Situ, Haozhen; Zhou, Yan; Zheng, Shenggen; Li, Lvzhou. Code-level quantum circuit generation based on large language models. SCIENTIA SINICA-PHYSICA MECHANICA & ASTRONOMICA, 2025, 55(04).
  • [26] Sun, Yihua; Khor, Hee Guan; Wang, Yuanzheng; Wang, Zhuhao; Zhao, Hongliang; Zhang, Yu; Ma, Longfei; Zheng, Zhuozhao; Liao, Hongen. Continually Tuning a Large Language Model for Multi-domain Radiology Report Generation. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT V, 2024, 15005: 177-187.
  • [27] Zhang, Xiaowei; Chen, Zhifei; Cao, Yulu; Chen, Lin; Zhou, Yuming. Multi-Intent Inline Code Comment Generation via Large Language Model. INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2024, 34(06): 845-868.
  • [28] Lamsiyah, Salima; El Mahdaouy, Abdelkader; Nourbakhsh, Aria; Schommer, Christoph. Fine-Tuning a Large Language Model with Reinforcement Learning for Educational Question Generation. ARTIFICIAL INTELLIGENCE IN EDUCATION, PT I, AIED 2024, 2024, 14829: 424-438.
  • [29] Baek, In-Chang; Park, Tae-Hwa; Noh, Jin-Ha; Bae, Cheong-Mok; Kim, Kyung-Joong. ChatPCG: Large Language Model-Driven Reward Design for Procedural Content Generation. 2024 IEEE CONFERENCE ON GAMES, COG 2024, 2024.
  • [30] Park, Joonhyeong. Automatic item generation in various STEM subjects using large language model prompting. 2025, 8.