Utility of large language models for creating clinical assessment items

Cited by: 4
Authors
Lam, George [1 ]
Shammoon, Yusra [1 ]
Coulson, Anna [1 ]
Lalloo, Felicity [1 ]
Maini, Arti [1 ]
Amin, Anjali [1 ]
Brown, Celia [1 ]
Sam, Amir H. [1 ]
Affiliations
[1] Imperial College London, School of Medicine, London, England
Keywords
Large language models; generative AI; medical education; assessment; validity
DOI
10.1080/0142159X.2024.2382860
Chinese Library Classification
G40 [Education]
Discipline codes
040101; 120403
Abstract
Purpose: To compare student performance, examiner perceptions and cost of GPT-assisted (generative pre-trained transformer-assisted) clinical and professional skills assessment (CPSA) items against items created using standard methods.
Methods: We conducted a prospective, controlled, double-blinded comparison of CPSA items developed with GPT assistance against those created through standard methods. Two sets of six practical cases were developed for a formative assessment sat by final-year medical students. One clinical case in each set was created with GPT assistance. Students were assigned to one of the two sets.
Results: The results of 239 participants were analysed in the study. There was no statistically significant difference in item difficulty or discriminative ability between GPT-assisted and standard items. All respondents (n = 15) to an examiner feedback questionnaire felt the GPT-assisted cases were appropriately difficult and realistic. GPT assistance resulted in significant labour cost savings, with a mean reduction of 57% (880 GBP) in labour cost per case compared with standard case-drafting methods.
Conclusions: GPT assistance can create CPSA items of comparable quality at significantly lower cost than standard methods. Future studies could evaluate GPT's ability to create CPSA material in other areas of clinical practice, aiming to validate the generalisability of these findings.
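The abstract's cost figures imply a per-case baseline that is easy to back out. A minimal sketch, assuming the 57% mean reduction and the 880 GBP saving refer to the same standard-method baseline (the function name and rounding are illustrative, not from the paper):

```python
def implied_costs(saving_gbp: float, reduction_fraction: float) -> tuple[float, float]:
    """Return (standard_cost, gpt_assisted_cost) implied by a saving
    that equals the given fraction of the standard per-case cost."""
    standard = saving_gbp / reduction_fraction
    return standard, standard - saving_gbp

# Figures reported in the abstract: 57% reduction, 880 GBP saved per case.
standard, gpt_assisted = implied_costs(880.0, 0.57)
print(f"standard ~ GBP {standard:.0f}, GPT-assisted ~ GBP {gpt_assisted:.0f}")
```

Under this reading, the reported figures imply a standard drafting cost of roughly 1,544 GBP per case versus roughly 664 GBP with GPT assistance.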
Pages: 5
Related articles
50 records in total
  • [1] Automated Scoring of Constructed Response Items in Math Assessment Using Large Language Models
    Morris, Wesley
    Holmes, Langdon
    Choi, Joon Suh
    Crossley, Scott
    INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION, 2024
  • [2] Promises and Pitfalls: Using Large Language Models to Generate Visualization Items
    Cui, Yuan
    Ge, Lily W.
    Ding, Yiren
    Harrison, Lane
    Yang, Fumeng
    Kay, Matthew
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2025, 31 (01) : 1094 - 1104
  • [3] Assessing the research landscape and clinical utility of large language models: a scoping review
    Park, Ye-Jean
    Pillai, Abhinav
    Deng, Jiawen
    Guo, Eddie
    Gupta, Mehul
    Paget, Mike
    Naugler, Christopher
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 24 (01)
  • [5] Are Large Language Models Good at Utility Judgments?
    Zhang, Hengran
    Zhang, Ruqing
    Guo, Jiafeng
    de Rijke, Maarten
    Fan, Yixing
    Cheng, Xueqi
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 1941 - 1951
  • [6] Large language models for sustainable assessment and feedback in higher education
    Agostini, Daniele
    Picasso, Federica
    INTELLIGENZA ARTIFICIALE, 2024, 18 (01) : 121 - 138
  • [7] Locating requirements in backlog items: Content analysis and experiments with large language models
    van Can, Ashley T.
    Dalpiaz, Fabiano
    INFORMATION AND SOFTWARE TECHNOLOGY, 2025, 179
  • [8] Transforming Assessment: The Impacts and Implications of Large Language Models and Generative AI
    Hao, Jiangang
    von Davier, Alina A.
    Yaneva, Victoria
    Lottridge, Susan
    von Davier, Matthias
    Harris, Deborah J.
    EDUCATIONAL MEASUREMENT-ISSUES AND PRACTICE, 2024, 43 (02) : 16 - 29
  • [9] Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice Guidelines
    Oniani, David
    Wu, Xizhi
    Visweswaran, Shyam
    Kapoor, Sumit
    Kooragayalu, Shravan
    Polanska, Katelyn
    Wang, Yanshan
    2024 IEEE 12TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS, ICHI 2024, 2024, : 694 - 702
  • [10] Assessment of large language models in medical quizzes for clinical chemistry and laboratory management: implications and applications for healthcare artificial intelligence
    Heo, Won Young
    Park, Hyung-Doo
    SCANDINAVIAN JOURNAL OF CLINICAL & LABORATORY INVESTIGATION, 2025 : 125 - 132