Evaluating GPT's Programming Capability Through CodeWars' Katas

Cited by: 0
Authors
Zhang, Zizhuo [1]
Wen, Lian [2]
Zhang, Shaoyang [1]
Chen, David [2]
Jiang, Yanfei [3]
Affiliations
[1] Changan Univ, Xian, Peoples R China
[2] Griffith Univ, Brisbane, Qld, Australia
[3] Xian Rail Transit Grp Co Ltd, Xian, Peoples R China
Source
KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT V, KSEM 2024 | 2024 / Vol. 14888
Keywords
AI; ChatGPT; GPT; Programming; Coding; Evaluation; Complexity;
DOI
10.1007/978-981-97-5489-2_2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding the capabilities and limitations of programming-oriented AI models is crucial. This paper evaluates the programming proficiency of GPT-3.5 and GPT-4 using Codewars coding problems of varying difficulty. The experiments reveal a distinct boundary at the 3kyu level, beyond which these models struggle. This led to proposing a complexity measure that includes problem difficulty and solution time. The research emphasizes the need for validation and creative thinking in AI models to better emulate human problem-solving. Future work aims to refine the complexity measure, enhance AI capabilities, and develop an objective programming problem difficulty measure. These insights are valuable for advancing AI programming and problem-solving abilities.
Pages: 17-26
Page count: 10