Can ChatGPT Pass High School Exams on English Language Comprehension?

Cited by: 43

Authors:

de Winter, Joost C. F. [1]

Affiliations:

[1] Delft Univ Technol, Cognit Robot Dept, Delft, Netherlands

Source:

INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION | 2024 / Vol. 34 / No. 03

Keywords:

GPT-3.5; GPT-4; Large language model; Educational assessment; Reading comprehension;

DOI:

10.1007/s40593-023-00372-z

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Launched in late November 2022, ChatGPT, a large language model chatbot, has garnered considerable attention. However, questions remain regarding its capabilities. In this study, ChatGPT was used to complete national high school exams in the Netherlands on the topic of English reading comprehension. In late December 2022, we submitted the exam questions through the ChatGPT web interface (GPT-3.5). According to official norms, ChatGPT achieved a mean grade of 7.3 on the Dutch scale of 1 to 10, comparable to the mean grade of 6.99 for all students who took the exam in the Netherlands. However, ChatGPT occasionally required re-prompting to arrive at an explicit answer; without these nudges, the overall grade was 6.5. In March 2023, API access was made available, and a new version of ChatGPT, GPT-4, was released. We submitted the same exams to the API, and GPT-4 achieved a score of 8.3 without a need for re-prompting. Additionally, employing a bootstrapping method that incorporated randomness through ChatGPT's 'temperature' parameter proved effective in self-identifying potentially incorrect answers. Finally, a re-assessment conducted with the GPT-4 model updated as of June 2023 showed no substantial change in the overall score. The present findings highlight significant opportunities but also raise concerns about the impact of ChatGPT and similar large language models on educational assessment.

Pages: 915-930

Page count: 16
