Assessment of the capacity of ChatGPT as a self-learning tool in medical pharmacology: a study using MCQs

Cited by: 26
Authors
Choi, Woong [1 ]
Affiliations
[1] Chungbuk Natl Univ, Coll Med, Dept Pharmacol, Cheongju 28644, Chungbuk, South Korea
Keywords
ChatGPT; Large language model; Self-directed learning; Performance; Multiple-choice questions; Rationale; Referencing; PERFORMANCE;
DOI
10.1186/s12909-023-04832-x
Chinese Library Classification
G40 [Education];
Discipline Classification Code
040101; 120403;
Abstract
Background: ChatGPT is a large language model developed by OpenAI that exhibits a remarkable ability to simulate human speech. This investigation evaluates the potential of ChatGPT as a standalone self-learning tool, with specific attention to its efficacy in answering multiple-choice questions (MCQs) and providing credible rationales for its responses.
Methods: The study used 78 test items from the Korean Comprehensive Basic Medical Sciences Examination (K-CBMSE) for the years 2019 to 2021. The 78 items were translated from Korean to English and paired with four lead-in prompts each, yielding a total of 312 MCQs. The MCQs were submitted to ChatGPT, and the responses were analyzed for correctness, consistency, and relevance.
Results: ChatGPT responded with an overall accuracy of 76.0%. Compared to its performance on recall and interpretation questions, the model performed poorly on problem-solving questions. ChatGPT offered correct rationales for 77.8% (182/234) of the responses, with errors primarily arising from faulty information and flawed reasoning. In terms of references, ChatGPT provided incorrect citations for 69.7% (191/274) of the responses. While the veracity of reference paragraphs could not be ascertained, 77.0% (47/61) were deemed pertinent and accurate with respect to the answer key.
Conclusion: The current version of ChatGPT has limitations in accurately answering MCQs and generating correct and relevant rationales, particularly when it comes to referencing. To avoid potential harms such as spreading inaccuracies and weakening critical-thinking skills, ChatGPT should be used with supervision.
Pages: 8