Efficacy of large language models and their potential in Obstetrics and Gynecology education

Cited by: 2
Authors
Eoh, Kyung Jin [1]
Kwon, Gu Yeun [2]
Lee, Eun Jin [2]
Lee, Joonho [2]
Lee, Inha [3]
Kim, Young Tae [2]
Nam, Eun Ji [2]
Affiliations
[1] Yonsei Univ, Yongin Severance Hosp, Coll Med, Dept Obstet & Gynecol, Yongin, South Korea
[2] Yonsei Univ, Severance Hosp, Inst Womens Med Life Sci, Coll Med, Seoul, South Korea
[3] Yonsei Univ, Coll Med, Gangnam Severance Hosp, Seoul, South Korea
Keywords
Artificial intelligence; Obstetrics; Gynecology; Medical education; Performance; ChatGPT; GPT-4
DOI
10.5468/ogs.24211
Chinese Library Classification number
R71 [Obstetrics and Gynecology]
Discipline classification code
100211
Abstract
The performance of large language models (LLMs) and their potential utility in obstetric and gynecological education remain topics of ongoing debate. This study aimed to contribute to this discussion by examining recent advances in LLM technology and their transformative potential in artificial intelligence. It assessed the performance of generative pre-trained transformer (GPT)-3.5 and GPT-4 in understanding clinical information, as well as the potential implications for obstetric and gynecological education. Obstetrics and gynecology residents at three hospitals took an annual promotional examination. Of the 170 questions set over 4 years (2020-2023), 54 containing images were excluded and the remaining 116 were analyzed. The scores achieved by GPT-3.5, GPT-4, and the 100 residents were compared. The average scores across all 4 years for GPT-3.5 and GPT-4 were 38.79 (standard deviation [SD], 5.65) and 79.31 (SD, 3.67), respectively. For first-, second-, and third-year residents, the cumulative annual average scores were 79.12 (SD, 9.00), 80.95 (SD, 5.86), and 83.60 (SD, 6.82), respectively. No statistically significant differences were observed between the scores of GPT-4 and those of the residents. On questions specific to obstetrics, the average scores for GPT-3.5 and GPT-4 were 33.44 (SD, 10.18) and 90.22 (SD, 7.68), respectively. GPT-4 performed exceptionally well in obstetrics, in interpreting different types of data, and in problem solving, which indicates the potential utility of LLMs in these areas. However, the limitations of LLMs must be acknowledged, and their use should augment human expertise and discernment.
Pages: 550-556
Page count: 7