Evaluating the Performance of Large Language Models in Competitive Programming: A Multi-Year, Multi-Grade Analysis

Cited by: 0
Authors
Dumitran, Adrian Marius [1 ]
Badea, Adrian Catalin [1 ]
Muscalu, Stefan-Gabriel [2 ]
Affiliations
[1] Univ Bucharest, Comp Sci Dept, Bucharest, Romania
[2] It Just Works Inc, Bucharest, Romania
Source
2024 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS, INISTA | 2024
Keywords
Large Language Models (LLMs); Benchmark; IOI; Code Generation; AI in Education; C++; Python
DOI
10.1109/INISTA62901.2024.10683837
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This study explores the performance of large language models (LLMs) in solving competitive programming problems from the Romanian Informatics Olympiad at the county level. Romania, a leading nation in computer science competitions, provides an ideal environment for evaluating LLM capabilities due to its rich history and stringent competition standards. We collected and analyzed a dataset comprising 304 challenges from 2002 to 2023, focusing on LLM-generated solutions in C++ and Python. Our primary goal is to understand why LLMs perform well or poorly on different tasks. We evaluated various models, including closed-source models like GPT-4 and open-weight models such as CodeLlama and RoMistral, using a standardized process involving multiple attempts and feedback rounds. The analysis revealed significant variations in LLM performance across grades and problem types. Notably, GPT-4 showed strong performance, indicating its potential use as an educational tool for middle school students. We also observed differences in code quality and style across the various LLMs.
Pages: 7
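
The abstract describes the evaluation protocol only at a high level: each model gets several attempts per problem, with failure details fed back between rounds. Below is a minimal sketch of what such a multi-attempt, feedback-driven loop might look like in Python; the paper's actual harness, prompts, and attempt limits are not given in this record, so the `generate` callable (the LLM call), `MAX_ATTEMPTS`, and the (input, expected-output) test format are illustrative assumptions.

```python
import subprocess
import tempfile
from pathlib import Path

MAX_ATTEMPTS = 3  # hypothetical cap on attempts per problem


def run_solution(source: str, stdin_data: str, timeout: float = 2.0) -> str:
    """Run a candidate Python solution against one test input and return stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            ["python3", path], input=stdin_data,
            capture_output=True, text=True, timeout=timeout,
        )
    finally:
        Path(path).unlink()
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout.strip()


def evaluate(problem: str, tests: list[tuple[str, str]], generate) -> bool:
    """Multi-attempt loop: query the model, run tests, feed failures back."""
    prompt = problem
    for _ in range(MAX_ATTEMPTS):
        source = generate(prompt)  # hypothetical LLM call returning source code
        failures = []
        for stdin_data, expected in tests:
            try:
                got = run_solution(source, stdin_data)
                if got != expected:
                    failures.append(f"input:\n{stdin_data}\nexpected {expected}, got {got}")
            except Exception as exc:  # runtime error or timeout
                failures.append(f"input:\n{stdin_data}\nerror: {exc}")
        if not failures:
            return True  # all tests passed on this attempt
        # Feedback round: append failure details and retry.
        prompt = problem + "\n\nYour previous solution failed:\n" + "\n".join(failures)
    return False
```

A real harness would additionally compile and sandbox C++ submissions and enforce per-problem memory limits; this sketch covers only the Python path.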