Evaluating Large Language Models in Code Generation: INFINITE Methodology for Defining the Inference Index

Cited by: 0
Authors
Christakis, Nicholas [1]
Drikakis, Dimitris [1]
Affiliations
[1] Univ Nicosia, Inst Adv Modeling & Simulat, CY-2417 Nicosia, Cyprus
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 7
Keywords
LLM; forecasting; inference; time series; LSTM; artificial intelligence
DOI
10.3390/app15073784
CLC Number
O6 [Chemistry]
Discipline Code
0703
Abstract
This study introduces the Inference Index In Testing Model Effectiveness (INFINITE) methodology, a new approach for defining an Inference Index (InI) that evaluates the performance of Large Language Models (LLMs) in code-generation tasks. The InI index provides a comprehensive assessment built on three key components: efficiency, consistency, and accuracy. By encapsulating time-based efficiency, response quality, and the stability of model outputs, it offers an understanding of LLM performance that goes beyond traditional accuracy metrics. We apply the methodology to compare OpenAI's GPT-4o (GPT), o1-pro (OAI1), and o3-mini-high (OAI3) in generating Python code for two tasks: a data-cleaning and statistical-computation task, and a Long Short-Term Memory (LSTM) model-generation task for forecasting meteorological variables such as temperature, relative humidity, and wind speed. Our findings show that GPT outperforms OAI1 and performs comparably to OAI3 in accuracy and workflow efficiency. The study also shows that, with effective prompting and refinement, LLM-assisted code generation can produce results similar to those of expert-designed models. GPT's performance advantage highlights the benefits of widespread use and user feedback. These findings contribute to advancing AI-assisted software development by providing a structured approach for evaluating LLMs in coding tasks and setting the groundwork for future studies on broader model comparisons and expanded assessment frameworks.
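The abstract describes InI as a composite of efficiency, consistency, and accuracy. As a reading aid, the minimal Python sketch below shows one way such a composite index could be computed; the function name inference_index, the equal weighting, and the [0, 1] normalization of the component scores are illustrative assumptions, not the paper's exact definition of InI.

# Minimal sketch of a composite inference index over the three components
# named in the abstract. Equal weights and [0, 1]-normalized scores are
# assumptions for illustration; the paper's exact InI formula may differ.
def inference_index(efficiency: float,
                    consistency: float,
                    accuracy: float,
                    weights: tuple = (1/3, 1/3, 1/3)) -> float:
    """Combine three normalized component scores into a single index."""
    scores = (efficiency, consistency, accuracy)
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("component scores must be normalized to [0, 1]")
    return sum(w * s for w, s in zip(weights, scores))

# Example: a model that is fast (0.9), fairly stable (0.8), and accurate (0.85)
print(f"InI = {inference_index(0.9, 0.8, 0.85):.3f}")  # prints InI = 0.850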
Pages: 24