Evaluating the Performance of Large Language Models in Competitive Programming: A Multi-Year, Multi-Grade Analysis

Cited by: 0
Authors
Dumitran, Adrian Marius [1 ]
Badea, Adrian Catalin [1 ]
Muscalu, Stefan-Gabriel [2 ]
Affiliations
[1] Univ Bucharest, Comp Sci Dept, Bucharest, Romania
[2] It Just Works Inc, Bucharest, Romania
Source
2024 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS, INISTA | 2024
Keywords
Large Language Models (LLMs); Benchmark; IOI; Code Generation; AI in Education; C++; Python
DOI
10.1109/INISTA62901.2024.10683837
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This study explores the performance of large language models (LLMs) in solving competitive programming problems from the Romanian Informatics Olympiad at the county level. Romania, a leading nation in computer science competitions, provides an ideal environment for evaluating LLM capabilities due to its rich history and stringent competition standards. We collected and analyzed a dataset comprising 304 challenges from 2002 to 2023, focusing on LLM-generated solutions in C++ and Python. Our primary goal is to understand why LLMs perform well or poorly on different tasks. We evaluated various models, including closed-source models like GPT-4 and open-weight models such as CodeLlama and RoMistral, using a standardized process involving multiple attempts and feedback rounds. The analysis revealed significant variations in LLM performance across grades and problem types. Notably, GPT-4 showed strong performance, indicating its potential use as an educational tool for middle school students. We also observed differences in code quality and style across the various LLMs.
Pages: 7
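
The abstract describes the evaluation protocol only at a high level: each model gets several attempts per problem, with failure details fed back between rounds. Below is a minimal sketch of what such a multi-attempt, feedback-driven loop might look like in Python; the paper's actual harness, prompts, and attempt limits are not given in this record, so the `generate` callable (the LLM call), `MAX_ATTEMPTS`, and the (input, expected-output) test format are illustrative assumptions.

```python
import subprocess
import tempfile
from pathlib import Path

MAX_ATTEMPTS = 3  # hypothetical cap on attempts per problem


def run_solution(source: str, stdin_data: str, timeout: float = 2.0) -> str:
    """Run a candidate Python solution against one test input and return stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            ["python3", path], input=stdin_data,
            capture_output=True, text=True, timeout=timeout,
        )
    finally:
        Path(path).unlink()
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout.strip()


def evaluate(problem: str, tests: list[tuple[str, str]], generate) -> bool:
    """Multi-attempt loop: query the model, run tests, feed failures back."""
    prompt = problem
    for _ in range(MAX_ATTEMPTS):
        source = generate(prompt)  # hypothetical LLM call returning source code
        failures = []
        for stdin_data, expected in tests:
            try:
                got = run_solution(source, stdin_data)
                if got != expected:
                    failures.append(f"input:\n{stdin_data}\nexpected {expected}, got {got}")
            except Exception as exc:  # runtime error or timeout
                failures.append(f"input:\n{stdin_data}\nerror: {exc}")
        if not failures:
            return True  # all tests passed on this attempt
        # Feedback round: append failure details and retry.
        prompt = problem + "\n\nYour previous solution failed:\n" + "\n".join(failures)
    return False
```

A real harness would additionally compile and sandbox C++ submissions and enforce per-problem memory limits; this sketch covers only the Python path.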