Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study

Cited by: 1

Authors:

Wu, Zelin [1,2]

Gan, Wenyi [3]

Xue, Zhaowen [1,2]

Ni, Zhengxin [4]

Zheng, Xiaofei [1,2]

Zhang, Yiyi [1,2]

Affiliations:

[1] First Affiliated Hosp, Dept Bone & Joint Surg, 613 Huangpu Ave West, Guangzhou 510630, Peoples R China

[2] First Affiliated Hosp, Sports Med Ctr, 613 Huangpu Ave West, Guangzhou 510630, Peoples R China

[3] Zhuhai Peoples Hosp, Dept Joint Surg & Sports Med, Zhuhai, Peoples R China

[4] Yangzhou Univ, Sch Nursing, Yangzhou, Peoples R China

Source:

JMIR MEDICAL EDUCATION | 2024 / Volume 10

Keywords:

artificial intelligence; ChatGPT; nursing licensure examination; nursing; LLMs; large language models; nursing education; AI; nursing student; large language model; licensing; observation; observational study; China; USA; United States of America; auxiliary tool; accuracy rate; theoretical; EDUCATION

DOI:

10.2196/52746

Chinese Library Classification number:

G40 [Education]

Discipline classification codes:

040101; 120403

Abstract:

Background: The creation of large language models (LLMs) such as ChatGPT is an important step in the development of artificial intelligence. LLMs show great potential in medical education because of their powerful language understanding and generative capabilities. This study quantitatively evaluated and analyzed ChatGPT's performance on questions from the nursing licensure examinations of the United States and China: the National Council Licensure Examination for Registered Nurses (NCLEX-RN) and the National Nursing Licensure Examination (NNLE), respectively.

Objective: This study aims to examine how well LLMs respond to NCLEX-RN and NNLE multiple-choice questions (MCQs) in different input languages, to evaluate whether LLMs can serve as multilingual learning aids for nursing, and to assess whether they possess a repository of professional knowledge applicable to clinical nursing practice.

Methods: First, we compiled 150 NCLEX-RN Practical MCQs, 240 NNLE Theoretical MCQs, and 240 NNLE Practical MCQs. Then, the translation function of ChatGPT 3.5 was used to translate the NCLEX-RN questions from English to Chinese and the NNLE questions from Chinese to English. Finally, the original and translated versions of the MCQs were entered into ChatGPT 4.0, ChatGPT 3.5, and Google Bard. The LLMs were compared by accuracy rate, and accuracy was also compared between input languages.

Results: The accuracy rates of ChatGPT 4.0 for NCLEX-RN practical questions and Chinese-translated NCLEX-RN practical questions were 88.7% (133/150) and 79.3% (119/150), respectively. Although the difference was statistically significant (P=.03), accuracy was generally satisfactory in both languages. ChatGPT 4.0 correctly answered 71.9% (169/235) of NNLE Theoretical MCQs and 69.1% (161/233) of NNLE Practical MCQs. Its accuracy on NNLE Theoretical MCQs and NNLE Practical MCQs translated into English was 71.5% (168/235; P=.92) and 67.8% (158/233; P=.77), respectively, with no statistically significant difference between input languages. With English input, ChatGPT 3.5 (NCLEX-RN P=.003, NNLE Theoretical P<.001, NNLE Practical P=.12) and Google Bard (NCLEX-RN P<.001, NNLE Theoretical P<.001, NNLE Practical P<.001) had lower accuracy rates on nursing-related MCQs than ChatGPT 4.0. For ChatGPT 3.5, accuracy was higher with English input than with Chinese input, and the difference was statistically significant (NCLEX-RN P=.02, NNLE Practical P=.02). Whether the NCLEX-RN and NNLE MCQs were submitted in Chinese or English, ChatGPT 4.0 had the highest number of unique correct responses and the lowest number of unique incorrect responses among the 3 LLMs.

Conclusions: This study, focusing on 618 nursing MCQs from the NCLEX-RN and NNLE, found that ChatGPT 4.0 outperformed ChatGPT 3.5 and Google Bard in accuracy. It excelled in processing both English and Chinese inputs, underscoring its potential as a valuable tool in nursing education and clinical decision-making.
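The language comparisons reported for ChatGPT 4.0 can be illustrated with a short calculation. The abstract does not name the statistical test that was used, so the sketch below assumes an uncorrected Pearson chi-square test on a 2x2 table of correct and incorrect answers; the function name `pearson_chi2_2x2` and the labels are hypothetical, and the resulting P values may differ slightly from those reported.

```python
from math import erfc, sqrt


def pearson_chi2_2x2(correct_a, total_a, correct_b, total_b):
    """Uncorrected Pearson chi-square test comparing two accuracy rates.

    Assumption: the abstract does not state which test the authors used;
    this is only one plausible choice. Returns (chi2, p_value) with df = 1.
    """
    # Observed 2x2 table: rows are conditions, columns are correct/incorrect.
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the abstract (ChatGPT 4.0, original vs translated input).
comparisons = {
    "NCLEX-RN Practical (English vs Chinese)": (133, 150, 119, 150),
    "NNLE Theoretical (Chinese vs English)": (169, 235, 168, 235),
    "NNLE Practical (Chinese vs English)": (161, 233, 158, 233),
}

for label, (ca, ta, cb, tb) in comparisons.items():
    chi2, p = pearson_chi2_2x2(ca, ta, cb, tb)
    print(f"{label}: {ca / ta:.1%} vs {cb / tb:.1%}, chi2={chi2:.3f}, P={p:.3f}")
```

Only the Python standard library is used here so that the arithmetic stays visible; a statistics package would normally be preferred for real analyses, and a continuity correction or an exact test would give somewhat different P values.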


Pages: 12

