Evaluation of Large Language Models on Code Obfuscation (Student Abstract)

Citations: 0
Authors
Swindle, Adrian [1 ]
McNealy, Derrick [2 ]
Krishnan, Giri [3 ]
Ramyaa, Ramyaa [4 ]
Affiliations
[1] St Louis Univ, St Louis, MO 63103 USA
[2] Univ Southern Mississippi, Hattiesburg, MS USA
[3] Univ Calif San Diego, San Diego, CA USA
[4] New Mexico Inst Min & Technol, Socorro, NM USA
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21 | 2024
Keywords
DOI
None available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Obfuscation aims to reduce the interpretability of code and to hinder identification of its behavior. Large Language Models (LLMs) have been proposed for code synthesis and code analysis. This paper examines how well LLMs can analyze code and identify code behavior. Specifically, it systematically evaluates several LLMs' ability to detect obfuscated code and identify its behavior across a variety of obfuscation techniques with varying levels of complexity. The LLMs proved better at detecting obfuscations that changed identifiers, even to misleading ones, than at detecting obfuscations involving code insertion (unused variables, as well as expressions that replace constants with computations evaluating to those constants). Hardest to detect were obfuscations that layered multiple simple transformations; for these, only 20-40% of the LLMs' responses were correct. Adding misleading documentation was also effective at misleading the LLMs. We provide all our code to replicate our results at https://github.com/SwindleA/LLMCodeObfuscation. Overall, our results suggest a gap in LLMs' ability to understand code.
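As a hypothetical illustration (not taken from the paper's benchmark), the obfuscation families the abstract describes can be sketched on a small function. The function names and variable names below are invented for demonstration; the paper's actual test programs and prompts are in the linked repository.

```python
def gcd(a, b):
    """Original, readable implementation (Euclid's algorithm)."""
    while b:
        a, b = b, a % b
    return a

# 1. Identifier renaming, even to misleading names: the function and its
#    variables are renamed to suggest unrelated behavior (list sorting).
def sort_list(total, index):
    while index:
        total, index = index, total % index
    return total

# 2. Code insertion: an unused variable plus a constant replaced by an
#    expression that evaluates to that constant (0 here).
def compute(a, b):
    unused_buffer = [0] * (7 - 6)   # dead code: never read
    zero = (3 * 4) - 12             # expression evaluating to the constant 0
    while b != zero:
        a, b = b, a % b
    return a

# All three variants compute the same value on the same inputs.
assert gcd(12, 18) == sort_list(12, 18) == compute(12, 18) == 6
```

Layered obfuscation, the category the abstract reports as hardest (20-40% correct responses), would apply both transformations at once, e.g. renaming identifiers inside `compute`.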
Pages: 23664-23666 (3 pages)
Related Papers
50 entries
  • [41] Investigating the Efficacy of Large Language Models for Code Clone Detection
    Khajezade, Mohamad
    Wu, Jie J. W.
    Fard, Fatemeh Hendijani
    Rodriguez-Perez, Gema
    Shehata, Mohamed Sami
    PROCEEDINGS 2024 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC 2024, 2024, : 161 - 165
  • [42] Evaluating the effectiveness of large language models in abstract screening: a comparative analysis
    Li, Michael
    Sun, Jianping
    Tan, Xianming
    SYSTEMATIC REVIEWS, 2024, 13 (01)
  • [43] Comparing Code Explanations Created by Students and Large Language Models
    Leinonen, Juho
    Denny, Paul
    MacNeil, Stephen
    Sarsa, Sami
    Bernstein, Seth
    Kim, Joanne
    Tran, Andrew
    Hellas, Arto
    PROCEEDINGS OF THE 2023 CONFERENCE ON INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, ITICSE 2023, VOL 1, 2023, : 124 - 130
  • [44] Multilingual Large Language Models Are Not (Yet) Code-Switchers
    Zhang, Ruochen
    Cahyawijaya, Samuel
    Cruz, Jan Christian Blaise
    Winata, Genta Indra
    Aji, Alham Fikri
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 12567 - 12582
  • [45] Bugs in large language models generated code: an empirical study
    Tambon, Florian
    Moradi-Dakhel, Arghavan
    Nikanjam, Amin
    Khomh, Foutse
    Desmarais, Michel C.
    Antoniol, Giuliano
    EMPIRICAL SOFTWARE ENGINEERING, 2025, 30 (03)
  • [46] Repairing Infrastructure-as-Code using Large Language Models
    Low, En
    Cheh, Carmen
    Chen, Binbin
    2024 IEEE SECURE DEVELOPMENT CONFERENCE, SECDEV 2024, 2024, : 20 - 27
  • [47] Code Clone Detection Techniques Based on Large Language Models
    Almatrafi, Afnan A.
    Eassa, Fathy A.
    Sharaf, Sanaa A.
    IEEE ACCESS, 2025, 13 : 46136 - 46146
  • [48] Large language models for code completion: A systematic literature review
    Husein, Rasha Ahmad
    Aburajouh, Hala
    Catal, Cagatay
    COMPUTER STANDARDS & INTERFACES, 2025, 92
  • [49] Comparative Analysis of Large Language Models in Source Code Analysis
    Erdogan, Huseyin
    Turan, Nezihe Turhan
    Onan, Aytug
    INTELLIGENT AND FUZZY SYSTEMS, INFUS 2024 CONFERENCE, VOL 1, 2024, 1088 : 185 - 192
  • [50] Code Detection for Hardware Acceleration Using Large Language Models
    Martinez, Pablo Antonio
    Bernabe, Gregorio
    Garcia, Jose Manuel
    IEEE ACCESS, 2024, 12 : 35271 - 35281