How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation

Cited: 0
Authors
Zhang, Cen [1 ]
Zheng, Yaowen [1 ]
Bai, Mingqiang [2 ,3 ]
Li, Yeting [2 ,3 ]
Ma, Wei [1 ]
Xie, Xiaofei [4 ]
Li, Yuekang [5 ]
Sun, Limin [2 ,3 ]
Liu, Yang [1 ]
Affiliations
[1] Nanyang Technological University, Singapore
[2] Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
[3] School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
[4] Singapore Management University, Singapore
[5] University of New South Wales, Sydney, NSW, Australia
Source
Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024), 2024
Funding
National Research Foundation, Singapore
Keywords
Fuzz Driver Generation; Fuzz Testing; Large Language Model;
DOI
10.1145/3650212.3680355
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Fuzz drivers are essential for library API fuzzing. However, automatically generating fuzz drivers is a complex task, as it demands high-quality, correct, and robust API usage code. An LLM-based (Large Language Model) approach to generating fuzz drivers is a promising area of research. Unlike traditional program-analysis-based generators, this text-based approach is more general, can harness a wide variety of API usage information, and produces code that is friendly to human readers. However, the fundamental questions in this direction, such as its effectiveness and its key challenges, remain poorly understood. To bridge this gap, we conducted the first in-depth study of using LLMs to generate effective fuzz drivers. Our study features a curated dataset of 86 fuzz driver generation questions from 30 widely used C projects. Six prompting strategies were designed and tested across five state-of-the-art LLMs at five temperature settings. In total, we evaluated 736,430 generated fuzz drivers, at a cost of 0.85 billion tokens (over $8,000 in charged tokens). Additionally, we compared the LLM-generated drivers against those used in industry through extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that: 1) while LLM-based fuzz driver generation is a promising direction, it still faces several obstacles to practical application; 2) LLMs have difficulty generating effective fuzz drivers for APIs with intricate usage specifics, and three prompt-strategy design choices help: issuing repeated queries, querying with examples, and employing an iterative querying process; 3) while LLM-generated drivers can yield fuzzing outcomes on par with drivers used in industry, substantial opportunities for improvement remain, such as extending the API usage they contain or integrating semantic oracles to enable logic bug detection. Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
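For readers unfamiliar with the term, a fuzz driver (or harness) is the glue code that feeds fuzzer-generated bytes into a library API; this is the kind of code the study asks LLMs to produce. Below is a minimal editorial sketch of a LibFuzzer-style driver in C, targeting zlib's uncompress() purely as an illustration; the choice of target API is ours and is not drawn from the paper's dataset.

    #include <stdint.h>
    #include <stddef.h>
    #include <zlib.h>

    /* Minimal LibFuzzer-style fuzz driver. The fuzzer invokes this entry
     * point repeatedly with mutated byte buffers; crashes and sanitizer
     * reports on those inputs are the bug signal.
     * Build: clang -g -fsanitize=fuzzer,address driver.c -lz */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        uint8_t out[4096];
        uLongf out_len = sizeof(out);

        /* Feed fuzzer-provided bytes to the target API. The return code is
         * deliberately ignored: rejecting malformed input is expected, and
         * only memory errors caught by the sanitizers matter here. */
        uncompress(out, &out_len, (const Bytef *)data, (uLong)size);
        return 0;
    }

An effective driver of this form must compile, avoid spurious crashes, and exercise the target API meaningfully; per the abstract, APIs with intricate usage specifics are where LLM-generated drivers fall short, which the repeated-query, example-based, and iterative prompting strategies help mitigate.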
Pages
1223-1235 (13 pages)