Evaluation of Large Language Models on Code Obfuscation (Student Abstract)

Cited by: 0
Authors
Swindle, Adrian [1 ]
McNealy, Derrick [2 ]
Krishnan, Giri [3 ]
Ramyaa, Ramyaa [4 ]
Affiliations
[1] St Louis Univ, St Louis, MO 63103 USA
[2] Univ Southern Mississippi, Hattiesburg, MS USA
[3] Univ Calif San Diego, San Diego, CA USA
[4] New Mexico Inst Min & Technol, Socorro, NM USA
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21 | 2024
Keywords
(none listed)
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Obfuscation aims to decrease the interpretability of code and to hinder identification of its behavior. Large Language Models (LLMs) have been proposed for code synthesis and code analysis. This paper attempts to understand how well LLMs can analyze code and identify code behavior. Specifically, it systematically evaluates several LLMs' capabilities to detect obfuscated code and identify its behavior across a variety of obfuscation techniques with varying levels of complexity. The LLMs proved better at detecting obfuscations that changed identifiers, even to misleading ones, than at detecting obfuscations involving code insertions (unused variables, as well as expressions that replace constants with computations evaluating to those constants). Hardest to detect were obfuscations that layered multiple simple transformations; for these, only 20-40% of the LLMs' responses were correct. Adding misleading documentation also succeeded in deceiving the LLMs. We provide all our code to replicate results at https://github.com/SwindleA/LLMCodeObfuscation. Overall, our results suggest a gap in LLMs' ability to understand code.
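To make the obfuscation families named in the abstract concrete, below is a minimal Python sketch of what such transformations could look like on a toy function: identifier renaming to misleading names, insertion of unused variables and of expressions that evaluate to constants, layering of several simple transformations, and misleading documentation. This is an illustrative assumption for intuition only, not the authors' implementation; their actual code is available in the linked GitHub repository, and all function and variable names here are hypothetical.

# Illustrative sketches of the obfuscation families described in the abstract.
# NOT the authors' transformations (see the linked repository for those);
# these are minimal, hypothetical examples.

# Original, readable function.
def average(values):
    total = sum(values)
    return total / len(values)

# 1. Identifier renaming, including misleading names.
def tax_rate(username):                      # misleading name for a list of numbers
    password = sum(username)                 # misleading name for the running total
    return password / len(username)

# 2. Code insertion: an unused variable, plus a constant replaced by an
#    expression that evaluates to that constant.
def average_inserted(values):
    unused_flag = (3 * 7) - 21               # dead code: always 0, never used
    offset = len("ab") - 2                   # expression that evaluates to 0
    total = sum(values) + offset
    return total / len(values)

# 3. Layered transformations: renaming + insertion + misleading docstring,
#    the combination the abstract reports as hardest for LLMs to detect.
def encrypt_payload(q):
    """Encrypts the payload with a rotating key."""  # misleading documentation
    z = (2 ** 3) - 8                         # inserted expression, evaluates to 0
    w = sum(q) + z
    return w / len(q)

if __name__ == "__main__":
    data = [2.0, 4.0, 6.0]
    # All variants compute the same value despite looking very different.
    assert average(data) == tax_rate(data) == average_inserted(data) == encrypt_payload(data)
    print("all obfuscated variants behave identically:", average(data))

Despite identical behavior, the three obfuscated variants give progressively fewer surface cues about what the code does, which mirrors the difficulty ordering the paper reports.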
Pages: 23664-23666
Page count: 3