MSFuzz: Augmenting Protocol Fuzzing with Message Syntax Comprehension via Large Language Models

Cited by: 1
Authors
Cheng, Mingjie [1 ,2 ]
Zhu, Kailong [1 ,2 ]
Chen, Yuanchao [1 ,2 ]
Yang, Guozheng [1 ,2 ]
Lu, Yuliang [1 ,2 ]
Lu, Canju [1 ,2 ]
Affiliations
[1] Natl Univ Def Technol, Coll Elect Engn, Hefei 230037, Peoples R China
[2] Anhui Prov Key Lab Cyberspace Secur Situat Awarene, Hefei 230037, Peoples R China
Keywords
fuzzing; syntax-aware; protocol implementations; large language models; fuzzer
DOI
10.3390/electronics13132632
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline Classification Code
0812
Abstract
Network protocol implementations, as integral components of information communication, are critically important for security. Owing to its efficiency and automation, fuzzing has become a popular method for detecting security flaws in protocol implementations. However, existing protocol-fuzzing techniques struggle to generate high-quality inputs. To address this problem, we propose MSFuzz, a protocol-fuzzing method with message syntax comprehension. The core observation behind MSFuzz is that the source code of a protocol implementation contains detailed and comprehensive knowledge of the message syntax. Specifically, we leveraged the code-understanding capabilities of large language models to extract the message syntax from the source code and construct message syntax trees. Then, using these syntax trees, we expanded the initial seed corpus and designed a novel syntax-aware mutation strategy to guide the fuzzing. To evaluate MSFuzz, we compared it with the state-of-the-art (SOTA) protocol fuzzers AFLNET and CHATAFL. Experimental results showed that, relative to AFLNET and CHATAFL, MSFuzz achieved average improvements of 22.53% and 10.04% in the number of states, 60.62% and 19.52% in the number of state transitions, and 29.30% and 23.13% in branch coverage, respectively. Additionally, MSFuzz discovered more vulnerabilities than the SOTA fuzzers.
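The sketch below is a minimal illustration of the two ideas the abstract summarizes: a message syntax tree that records a message's syntactic fields, and a syntax-aware mutation that perturbs one field at a time while preserving the surrounding structure, as opposed to structure-blind byte flipping. It is not MSFuzz's actual implementation or API; all class, function, and field names (SyntaxNode, syntax_aware_mutate, the FTP-style example message) are hypothetical and chosen only for illustration.

# Illustrative sketch only; names and structure are assumptions, not MSFuzz's code.
import random
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class SyntaxNode:
    """One syntactic element of a protocol message (command, separator, argument, ...)."""
    name: str
    value: bytes = b""
    children: List["SyntaxNode"] = field(default_factory=list)

    def serialize(self) -> bytes:
        """Reassemble the raw message by concatenating leaf values in order."""
        if not self.children:
            return self.value
        return b"".join(child.serialize() for child in self.children)


def iter_leaves(node: SyntaxNode) -> Iterator[SyntaxNode]:
    """Yield the leaf fields of the syntax tree in message order."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from iter_leaves(child)


def syntax_aware_mutate(root: SyntaxNode, rng: random.Random) -> bytes:
    """Mutate a single field while leaving the rest of the message structure intact."""
    target = rng.choice(list(iter_leaves(root)))
    strategies = [
        lambda v: v[::-1],                          # reverse the field's bytes
        lambda v: v + bytes([rng.randrange(256)]),  # append one random byte
        lambda v: b"A" * max(1, 2 * len(v)),        # oversize the field
    ]
    target.value = rng.choice(strategies)(target.value)
    return root.serialize()


if __name__ == "__main__":
    # Toy FTP-style request "USER alice\r\n", decomposed into syntactic fields.
    request = SyntaxNode("request", children=[
        SyntaxNode("command", b"USER"),
        SyntaxNode("sep", b" "),
        SyntaxNode("argument", b"alice"),
        SyntaxNode("crlf", b"\r\n"),
    ])
    rng = random.Random(1)
    for _ in range(3):
        print(syntax_aware_mutate(request, rng))

In this sketch the mutation always yields a message whose field layout still matches the syntax tree, which is the property a syntax-aware strategy aims to preserve; how the tree itself is extracted from source code via a large language model is outside the scope of this illustration.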
Pages: 19