In this work, we investigate source code similarity detection using Large Language Models (LLMs). Our study is motivated by the need to identify similarities among different pieces of source code, a capability central to tasks such as plagiarism detection and code reuse. We specifically focus on the effectiveness of leveraging LLMs for this purpose. To this end, we used the LLMSecEval dataset, comprising 150 natural language (NL) prompts for code generation in two languages, C and Python, and employed radamsa, a mutation-based input generator, to create 26 different mutations of each NL prompt. Next, using the Gemini Pro LLM, we generated code for the original and mutated NL prompts. Finally, we detected code similarities using the recently proposed CodeBERTScore metric, which utilizes the CodeBERT LLM. Our experiment aims to uncover the extent to which LLMs can consistently generate similar code despite mutations in the input NL prompts, providing insights into the robustness and generalizability of LLMs in understanding and comparing code syntax and semantics.
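
As a rough illustration of this pipeline (not our exact experimental code), the sketch below mutates a single NL prompt with radamsa, generates code for each variant with Gemini Pro, and scores the mutated-prompt outputs against the original-prompt output with CodeBERTScore. It assumes the `google-generativeai` and `code_bert_score` Python packages and a `radamsa` binary on the PATH; the prompt file name and API-key handling are illustrative.

```python
import os
import subprocess

import google.generativeai as genai   # assumed Gemini SDK
import code_bert_score                # assumed CodeBERTScore package

# --- 1. Mutate one NL prompt with radamsa (26 variants, as in the paper). ---
# Writes mutation-1.txt ... mutation-26.txt next to the original prompt file.
subprocess.run(
    ["radamsa", "-n", "26", "-o", "mutation-%n.txt", "prompt.txt"],
    check=True,
)
prompts = [open("prompt.txt").read()]
prompts += [
    open(f"mutation-{i}.txt", errors="replace").read()  # radamsa may emit non-UTF-8 bytes
    for i in range(1, 27)
]

# --- 2. Generate code for the original and mutated prompts with Gemini Pro. ---
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # illustrative key handling
model = genai.GenerativeModel("gemini-pro")
generated = [model.generate_content(p).text for p in prompts]

# --- 3. Score each mutated-prompt output against the original-prompt output. ---
reference, candidates = generated[0], generated[1:]
precision, recall, f1, f3 = code_bert_score.score(
    cands=candidates,
    refs=[reference] * len(candidates),
    lang="python",  # or "c", depending on the prompt's target language
)
print(f1)  # one F1 score per (original, mutated) code pair
```

In the full experiment, a loop of this kind would run over all 150 LLMSecEval prompts, with the language flag chosen according to each prompt's target language.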