Evaluating the performance of ChatGPT-4 on the United Kingdom Medical Licensing Assessment

Cited by: 38
Authors
Lai, U. Hin [1 ,2 ]
Wu, Keng Sam [1 ,3 ]
Hsu, Ting-Yu [2 ,3 ]
Kan, Jessie Kai Ching [2 ,4 ]
Affiliations
[1] Sandwell & West Birmingham NHS Trust, West Bromwich, England
[2] Aston Med Sch, Birmingham, England
[3] Univ Hosp Birmingham NHS Trust, Birmingham, England
[4] Worcestershire Acute Hosp NHS Trust, Worcester, England
Keywords
examination; ChatGPT; assessment; United Kingdom Medical Licensing Assessment; medical education; medicine; Medical Licensing Examination; artificial intelligence
DOI
10.3389/fmed.2023.1240915
Chinese Library Classification (CLC)
R5 [Internal Medicine]
Subject Classification Codes
1002; 100201
Abstract
Introduction: Recent developments in artificial intelligence large language models (LLMs), such as ChatGPT, have allowed for the understanding and generation of human-like text. Studies have found that LLMs perform well in various examinations, including law, business and medicine. This study aims to evaluate the performance of ChatGPT in the United Kingdom Medical Licensing Assessment (UKMLA).

Methods: Two publicly available UKMLA papers consisting of 200 single-best-answer (SBA) questions were screened. Nine SBAs were omitted as they contained images that were not suitable for input. Each question was assigned a specialty based on the UKMLA content map published by the General Medical Council. A total of 191 SBAs were input into ChatGPT-4 across three attempts over the course of 3 weeks (once per week).

Results: ChatGPT scored 74.9% (143/191), 78.0% (149/191) and 75.6% (145/191) on the three attempts, respectively. The average across all three attempts was 76.3% (437/573), with a 95% confidence interval of 74.46% to 78.08%. ChatGPT answered 129 SBAs correctly and 32 SBAs incorrectly on all three attempts. Across the three attempts, ChatGPT performed well in mental health (8/9 SBAs), cancer (11/14 SBAs) and cardiovascular (10/13 SBAs), and did not perform well in clinical haematology (3/7 SBAs), endocrine and metabolic (2/5 SBAs) and gastrointestinal including liver (3/10 SBAs). Regarding response consistency, ChatGPT gave consistently correct answers in 67.5% (129/191) of SBAs, consistently incorrect answers in 12.6% (24/191) and inconsistent responses in 19.9% (38/191).

Discussion and conclusion: This study suggests ChatGPT performs well in the UKMLA. There may be a correlation between performance and specialty. The ability of LLMs to answer SBAs correctly suggests they could be utilised as a supplementary learning tool in medical education, with appropriate supervision from medical educators.
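As a check on the reported arithmetic, the sketch below (illustrative only, not the authors' analysis code) re-derives the pooled accuracy, the 95% confidence interval and the consistency breakdown from the counts quoted in the abstract, assuming the interval was computed as a normal approximation over the three attempt percentages; the paper's abstract does not state the exact method used.

```python
# Illustrative sketch only (not the authors' code). Re-derives the headline
# Results figures from the counts reported in the abstract, assuming the 95% CI
# was a normal approximation over the three per-attempt accuracy scores.
import math

N = 191                                  # SBAs answered per attempt
correct = [143, 149, 145]                # correct answers on attempts 1-3

per_attempt = [c / N for c in correct]   # ~74.9%, 78.0%, 75.9%
pooled = sum(correct) / (3 * N)          # 437/573 ~ 76.3%

mean = sum(per_attempt) / len(per_attempt)
sd = math.sqrt(sum((x - mean) ** 2 for x in per_attempt) / (len(per_attempt) - 1))
half_width = 1.96 * sd / math.sqrt(len(per_attempt))
print(f"pooled accuracy: {pooled:.1%}")
print(f"95% CI: {mean - half_width:.2%} to {mean + half_width:.2%}")  # ~74.46% to 78.08%

# Response-consistency breakdown reported in the abstract
print(f"consistently correct:   {129 / N:.1%}")   # ~67.5%
print(f"consistently incorrect: {24 / N:.1%}")    # ~12.6%
print(f"inconsistent:           {38 / N:.1%}")    # ~19.9%
```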
Pages: 8