Evaluating GPT's Programming Capability Through CodeWars' Katas

Cited by: 0

Authors:

Zhang, Zizhuo ^{[1]}

Wen, Lian ^{[2]}

Zhang, Shaoyang ^{[1]}

Chen, David ^{[2]}

Jiang, Yanfei ^{[3]}

Affiliations:

[1] Changan Univ, Xian, Peoples R China

[2] Griffith Univ, Brisbane, Qld, Australia

[3] Xian Rail Transit Grp Co Ltd, Xian, Peoples R China

Source:

KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT V, KSEM 2024 | 2024 / Vol. 14888

Keywords:

AI; ChatGPT; GPT; Programming; Coding; Evaluation; Complexity;

DOI:

10.1007/978-981-97-5489-2_2

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Understanding the capabilities and limitations of programming-oriented AI models is crucial. This paper evaluates the programming proficiency of GPT-3.5 and GPT-4 using Codewars coding problems of varying difficulty. The experiments reveal a distinct boundary at the 3kyu level, beyond which these models struggle. This led to proposing a complexity measure that includes problem difficulty and solution time. The research emphasizes the need for validation and creative thinking in AI models to better emulate human problem-solving. Future work aims to refine the complexity measure, enhance AI capabilities, and develop an objective programming problem difficulty measure. These insights are valuable for advancing AI programming and problem-solving abilities.


Pages: 17-26

Page count: 10

