ChatGPT outscored human candidates in a virtual objective structured clinical examination in obstetrics and gynecology

Cited by: 76
Authors
Li, Sarah W. [1]
Kemp, Matthew W. [2, 3, 4, 5]
Logan, Susan J. S. [1]
Dimri, Pooja Sharma [1]
Singh, Navkaran [1]
Mattar, Citra N. Z. [1, 2]
Dashraath, Pradip [1, 2]
Ramlal, Harshaana [1]
Mahyuddin, Aniza P. [2]
Kanayan, Suren [2]
Carter, Sean W. D. [2]
Thain, Serene P. T. [6]
Fee, Erin L. [4]
Illanes, Sebastian E. [2, 7, 8, 9, 10]
Choolani, Mahesh A. [2]
Affiliations
[1] Natl Univ Singapore Hosp, Dept Obstet & Gynaecol, Singapore, Singapore
[2] Natl Univ Singapore, Yong Loo Lin Sch Med, Dept Obstet & Gynaecol, Singapore, Singapore
[3] Tohoku Univ Hosp, Ctr Perinatal & Neonatal Med, Sendai, Japan
[4] Univ Western Australia, Dept Obstet & Gynaecol, Perth, Australia
[5] King Edward Mem Hosp, Women & Infants Res Fdn, Subiaco, WA, Australia
[6] KK Womens & Childrens Hosp, Dept Maternal Fetal Med, Div Obstet & Gynaecol, Singapore, Singapore
[7] Univ Andes, Dept Obstet & Gynecol, Reprod Biol Lab, Santiago, Chile
[8] Univ Los Andes, Ctr Biomed Res & Innovat, Reprod Biol Program, Santiago, Chile
[9] Ctr Intervent Med Precis & Adv Cellular Therapy I, Santiago, Chile
[10] Univ Los Andes, Fac Med, Dept Obstet & Gynecol, Santiago, Chile
Keywords
artificial intelligence; Chat Generative Pre-trained Transformer; objective structured clinical examination; obstetrics and gynecology; postgraduate specialty training; reasoning
DOI
10.1016/j.ajog.2023.04.020
Chinese Library Classification
R71 [Obstetrics and Gynecology]
Discipline classification code
100211
Abstract
BACKGROUND: Natural language processing is a form of artificial intelligence that allows human users to interface with a machine without using complex codes. The ability of natural language processing systems, such as ChatGPT, to successfully engage with healthcare systems requiring fluid reasoning, specialist data interpretation, and empathetic communication in an unfamiliar and evolving environment is poorly studied. This study investigated whether the ChatGPT interface could engage with and complete a mock objective structured clinical examination simulating assessment for membership of the Royal College of Obstetricians and Gynaecologists.
OBJECTIVE: This study aimed to determine whether ChatGPT, without additional training, would achieve a score at least equivalent to that achieved by human candidates who sat for virtual objective structured clinical examinations in Singapore.
STUDY DESIGN: This study was conducted in 2 phases. In the first phase, a total of 7 structured discussion questions were selected from 2 historical cohorts (cohorts A and B) of objective structured clinical examination questions. ChatGPT was examined using these questions, and its responses were recorded in a script. Of note, 2 human candidates (acting as anonymizers) were examined on the same questions using videoconferencing, and their responses were transcribed verbatim into written scripts. The 3 sets of response scripts were mixed, and each set was allocated to 1 of 3 human actors. In the second phase, actors were used to present these scripts to examiners in response to the same examination questions. These responses were blind scored by 14 qualified examiners. ChatGPT scores were unblinded and compared with historical human candidate performance scores.
RESULTS: The average score given to ChatGPT by 14 examiners was 77.2%. The average historical human score (n=26 candidates) was 73.7%. ChatGPT demonstrated sizable performance improvements over the average human candidate in several subject domains. The median time taken for ChatGPT to complete each station was 2.54 minutes, well before the 10 minutes allowed.
CONCLUSION: ChatGPT generated factually accurate and contextually relevant structured discussion answers to complex and evolving clinical questions based on unfamiliar settings within a very short period. ChatGPT outperformed human candidates in several knowledge areas. Not all examiners were able to discern between human and ChatGPT responses. Our data highlight the emergent ability of natural language processing models to demonstrate fluid reasoning in unfamiliar environments and successfully compete with human candidates who have undergone extensive specialist training.
Pages: 172.e1-172.e12
Page count: 12
Related papers
42 in total
[1]   The AI writing on the wall [J].
Unknown .
NATURE MACHINE INTELLIGENCE, 2023, 5 (1) :1-1
[2]  
[Anonymous], 2023, GPT-4
[3]   The future of medical education and research: Is ChatGPT a blessing or blight in disguise? [J].
Arif, Taha Bin ;
Munaf, Uzair ;
Ul-Haque, Ibtehaj .
MEDICAL EDUCATION ONLINE, 2023, 28 (01)
[4]   ChatGPT in Surgical Practice-a New Kid on the Block [J].
Bhattacharya, Kaushik ;
Bhattacharya, Aditya Shikar ;
Bhattacharya, Neela ;
Yagnik, Vipul D. ;
Garg, Pankaj ;
Kumar, Sandeep .
INDIAN JOURNAL OF SURGERY, 2023, 85 (06) :1346-1349
[5]  
Bommarito I I., 2022, arXiv
[6]  
Brown TJ., 2020, [No title captured]
[7]   This new conversational AI model can be your friend, philosopher, and guide ... and even your worst enemy ... [J].
Chatterjee, Joyjit ;
Dethlefs, Nina .
PATTERNS, 2023, 4 (01)
[8]   Chat Generative Pre-trained Transformer: why we should embrace this technology [J].
Chavez, Martin R. ;
Butler, Thomas S. ;
Rekawek, Patricia ;
Heo, Hye ;
Kinzler, Wendy L. .
AMERICAN JOURNAL OF OBSTETRICS AND GYNECOLOGY, 2023, 228 (06) :706-711
[9]  
de Winter J., 2023, ResearchGate
[10]  
Graham Flora, 2022, Nature, DOI 10.1038/d41586-022-04437-2