Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments

Authors
Dana Brin
Vera Sorin
Akhil Vaid
Ali Soroush
Benjamin S. Glicksberg
Alexander W. Charney
Girish Nadkarni
Eyal Klang
Affiliations
[1] Chaim Sheba Medical Center, Department of Diagnostic Imaging
[2] Tel-Aviv University, Faculty of Medicine
[3] Icahn School of Medicine at Mount Sinai, The Charles Bronfman Institute of Personalized Medicine
[4] Icahn School of Medicine at Mount Sinai, Division of Data-Driven and Digital Medicine (D3M)
[5] Icahn School of Medicine at Mount Sinai, Hasso Plattner Institute for Digital Health
Source
Scientific Reports | Volume 13
Abstract
The United States Medical Licensing Examination (USMLE) has been a subject of performance studies for artificial intelligence (AI) models. However, model performance on questions involving USMLE soft skills remains unexplored. This study aimed to evaluate ChatGPT and GPT-4 on USMLE questions involving communication skills, ethics, empathy, and professionalism. We used 80 USMLE-style questions involving soft skills, taken from the USMLE website and the AMBOSS question bank. A follow-up query was used to assess the models’ consistency. The performance of the AI models was compared to that of previous AMBOSS users. GPT-4 outperformed ChatGPT, correctly answering 90% of questions compared to ChatGPT’s 62.5%. GPT-4 also showed greater confidence, revising none of its responses, whereas ChatGPT modified its original answers 82.5% of the time. GPT-4’s performance exceeded that of AMBOSS’s past users. Both AI models, notably GPT-4, showed a capacity for empathy, indicating AI’s potential to meet the complex interpersonal, ethical, and professional demands intrinsic to the practice of medicine.