Framework for evaluating code generation ability of large language models

Cited by: 6

Authors:

Yeo, Sangyeop [1]

Ma, Yu-Seung [1,2,3]

Kim, Sang Cheol [2]

Jun, Hyungkook [2]

Kim, Taeho [2]

Affiliations:

[1] Univ Sci & Technol, Div Artificial Intelligence, Daejeon, South Korea

[2] Elect & Telecommun Res Inst, Artificial Intelligence Comp Res Lab, Daejeon, South Korea

[3] Elect & Telecommun Res Inst, Daejeon, South Korea

Source:

ETRI JOURNAL | 2024 / Vol. 46 / Issue 01

Keywords:

code generation; evaluation metric; large language model; natural language processing; software engineering;

DOI:

10.4218/etrij.2023-0357

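Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract: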

Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass-ratio@n, which captures the granularity of accuracy according to the pass rate of test cases. The framework is intended to be fully automatic to handle the repetitive work involved in generating prompts, conducting inferences, and executing the generated code. A preliminary evaluation focusing on the prompt detail, problem publication date, and difficulty level demonstrates the successful integration of our framework with the LeetCode coding platform and highlights the applicability of the pass-ratio@n metric.
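The abstract does not give the formula for pass-ratio@n, so the following is only an illustrative sketch of the general idea it describes: scoring generated solutions by the fraction of test cases they pass instead of by all-or-nothing success. The function names `pass_rate` and `mean_pass_rate`, and the plain averaging over n solutions, are assumptions made for illustration and should not be read as the paper's definition.

```python
from typing import Sequence, Tuple


def pass_rate(passed: int, total: int) -> float:
    """Fraction of test cases passed by one generated solution."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must lie between 0 and total")
    return passed / total


def mean_pass_rate(results: Sequence[Tuple[int, int]]) -> float:
    """Average per-solution pass rate over n generated solutions.

    Hypothetical aggregation: the paper's pass-ratio@n may weight or
    transform the per-solution rates differently.
    """
    if not results:
        raise ValueError("results must contain at least one solution")
    return sum(pass_rate(p, t) for p, t in results) / len(results)


# Three generated solutions, each run against 10 test cases.
results = [(10, 10), (5, 10), (0, 10)]
score = mean_pass_rate(results)
```

Under a strict all-tests-must-pass criterion only the first of these three solutions would count, whereas a pass-rate-based score also credits the second solution for the half of the test cases it passes.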

Pages: 106-117

Page count: 12
