Evaluate Chat-GPT's programming capability in Swift through real university exam questions

Cited by: 3

Authors:

Zhang, Zizhuo [1]

Wen, Lian [2]

Jiang, Yanfei [3]

Liu, Yongli [4]

Affiliations:

[1] Changan Univ, Informat & Network Management Off, Xian, Peoples R China

[2] Griffith Univ, Sch ICT, Brisbane, Australia

[3] Xian Rail Transit Grp Co Ltd, Technol Ctr, Xian, Peoples R China

[4] Changan Univ, Sch Informat Engn, Xian, Peoples R China

Source:

SOFTWARE-PRACTICE & EXPERIENCE | 2024 / Vol. 54 / Issue 11

Keywords:

chat-GPT; evaluation; exam; GPT; programming; swift;

DOI:

10.1002/spe.3330

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification codes:

081202; 0835;

Abstract:

In this study, we evaluate the programming capabilities of OpenAI's GPT-3.5 and GPT-4 models using Swift-based exam questions from a third-year university course. The results indicate that both GPT models generally outperform the average student score, yet they do not consistently exceed the performance of the top students. This comparison highlights areas where the GPT models excel and where they fall short, providing a nuanced view of their current programming proficiency. The study also reveals surprising instances where GPT-3.5 outperforms GPT-4, suggesting complex variations in AI model capabilities. By providing a clear benchmark of GPT's programming skills in an academic context, our research contributes valuable insights for future advancements in AI programming education and underscores the need for continued development to fully realize AI's potential in educational settings.


Pages: 2129-2143

Page count: 15

