Utility of large language models for creating clinical assessment items

Cited by: 6
Authors
Lam, George [1]
Shammoon, Yusra [1]
Coulson, Anna [1]
Lalloo, Felicity [1]
Maini, Arti [1]
Amin, Anjali [1]
Brown, Celia [1]
Sam, Amir H. [1]
Affiliations
[1] Imperial College London, School of Medicine, London, England
Keywords
Large language models; generative AI; medical education; assessment; validity
DOI
10.1080/0142159X.2024.2382860
Chinese Library Classification number
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
Purpose: To compare student performance, examiner perceptions and cost of GPT-assisted (generative pretrained transformer-assisted) clinical and professional skills assessment (CPSA) items against items created using standard methods.
Methods: We conducted a prospective, controlled, double-blinded comparison of CPSA items developed with GPT assistance and items created through standard methods. Two sets of six practical cases were developed for a formative assessment sat by final-year medical students. One clinical case in each set was created with GPT assistance. Students were assigned to one of the two sets.
Results: The results of 239 participants were analysed. There was no statistically significant difference in item difficulty or discriminative ability between GPT-assisted and standard items. All respondents to an examiner feedback questionnaire (n = 15) felt that the GPT-assisted cases were appropriately difficult and realistic. GPT assistance resulted in significant labour cost savings, with a mean reduction of 57% (880 GBP) in labour cost per case compared with standard case drafting methods.
Conclusions: GPT assistance can create CPSA items of comparable quality at significantly lower cost than standard methods. Future studies could evaluate GPT's ability to create CPSA material in other areas of clinical practice, with the aim of validating the generalisability of these findings.
Pages: 878-882
Number of pages: 5