Evaluation of Large Language Models on Code Obfuscation (Student Abstract)

Cited by: 0
Authors
Swindle, Adrian [1 ]
McNealy, Derrick [2 ]
Krishnan, Giri [3 ]
Ramyaa, Ramyaa [4 ]
Affiliations
[1] St Louis Univ, St Louis, MO 63103 USA
[2] Univ Southern Mississippi, Hattiesburg, MS USA
[3] Univ Calif San Diego, San Diego, CA USA
[4] New Mexico Inst Min & Technol, Socorro, NM USA
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21 | 2024
Keywords
(none listed)
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Obfuscation aims to decrease the interpretability of code and to hinder identification of its behavior. Large Language Models (LLMs) have been proposed for code synthesis and code analysis. This paper attempts to understand how well LLMs can analyze code and identify code behavior. Specifically, it systematically evaluates several LLMs' capabilities to detect obfuscated code and identify its behavior across a variety of obfuscation techniques with varying levels of complexity. The LLMs proved better at detecting obfuscations that changed identifiers, even to misleading ones, than at detecting obfuscations involving code insertions (unused variables, as well as expressions that replace constants with computations evaluating to those constants). Hardest to detect were obfuscations that layered multiple simple transformations; for these, only 20-40% of the LLMs' responses were correct. Adding misleading documentation also succeeded in deceiving the LLMs. We provide all our code to replicate results at https://github.com/SwindleA/LLMCodeObfuscation. Overall, our results suggest a gap in LLMs' ability to understand code.
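To make the obfuscation families named in the abstract concrete, below is a minimal Python sketch of what such transformations could look like on a toy function: identifier renaming to misleading names, insertion of unused variables and of expressions that evaluate to constants, layering of several simple transformations, and misleading documentation. This is an illustrative assumption for intuition only, not the authors' implementation; their actual code is available in the linked GitHub repository, and all function and variable names here are hypothetical.

# Illustrative sketches of the obfuscation families described in the abstract.
# NOT the authors' transformations (see the linked repository for those);
# these are minimal, hypothetical examples.

# Original, readable function.
def average(values):
    total = sum(values)
    return total / len(values)

# 1. Identifier renaming, including misleading names.
def tax_rate(username):                      # misleading name for a list of numbers
    password = sum(username)                 # misleading name for the running total
    return password / len(username)

# 2. Code insertion: an unused variable, plus a constant replaced by an
#    expression that evaluates to that constant.
def average_inserted(values):
    unused_flag = (3 * 7) - 21               # dead code: always 0, never used
    offset = len("ab") - 2                   # expression that evaluates to 0
    total = sum(values) + offset
    return total / len(values)

# 3. Layered transformations: renaming + insertion + misleading docstring,
#    the combination the abstract reports as hardest for LLMs to detect.
def encrypt_payload(q):
    """Encrypts the payload with a rotating key."""  # misleading documentation
    z = (2 ** 3) - 8                         # inserted expression, evaluates to 0
    w = sum(q) + z
    return w / len(q)

if __name__ == "__main__":
    data = [2.0, 4.0, 6.0]
    # All variants compute the same value despite looking very different.
    assert average(data) == tax_rate(data) == average_inserted(data) == encrypt_payload(data)
    print("all obfuscated variants behave identically:", average(data))

Despite identical behavior, the three obfuscated variants give progressively fewer surface cues about what the code does, which mirrors the difficulty ordering the paper reports.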
Pages: 23664-23666
Page count: 3