Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation

Cited by: 16
Authors
Gobira, Mauro [1 ]
Nakayama, Luis Filipe [2 ,3 ]
Moreira, Rodrigo [1 ]
Andrade, Eric [2 ]
Regatieri, Caio Vinicius Saito [2 ]
Belfort Jr, Rubens [2 ]
Affiliations
[1] Vis Inst, Inst Paulista Estudos & Pesquisas Oftalmol, Sao Paulo, SP, Brazil
[2] Univ Fed Sao Paulo, Dept Ophthalmol, Sao Paulo, SP, Brazil
[3] MIT, Inst Med Engn & Sci, Cambridge, MA 02142 USA
Source
REVISTA DA ASSOCIACAO MEDICA BRASILEIRA | 2023, Vol. 69, No. 10
Keywords
Artificial intelligence; Education; Natural language processing
DOI
10.1590/1806-9282.20230848
Chinese Library Classification
R5 [Internal Medicine]
Discipline Classification Codes
1002; 100201
Abstract
OBJECTIVE: To evaluate the performance of ChatGPT-4.0 in answering the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and to assess its use as a tool for providing feedback on the quality of the examination.

METHODS: Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the official test solutions, they classified the large language model's answers as adequate, inadequate, or indeterminate; disagreements were resolved by consensus. Performance across medical themes and between nullified and non-nullified questions was compared using chi-square analysis.

RESULTS: ChatGPT-4.0 answered 71 Revalida questions (87.7%) correctly and 10 (12.3%) incorrectly. The proportion of correct answers did not differ significantly across medical themes (p=0.4886). Accuracy was lower on nullified questions (71.4%), with no statistically significant difference between the nullified and non-nullified groups (p=0.241).

CONCLUSION: ChatGPT-4.0 showed satisfactory performance on the 2022 Revalida examination, with worse performance on subjective questions and public healthcare themes. These results suggest that the overall quality of the Revalida questions is satisfactory and corroborate the decisions to nullify the annulled items.
Pages: 5
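The abstract reports chi-square comparisons of correct-answer proportions, both across medical themes (p=0.4886) and between nullified and non-nullified questions (p=0.241). The sketch below is a minimal illustration of the latter comparison in Python with SciPy. It is not the authors' code, and the cell counts are inferred from the reported figures (71 of 81 answers correct overall; 71.4% accuracy on nullified items, consistent with 5 of 7), so the resulting statistic need not match the published p-value.

```python
# Minimal sketch of the nullified vs. non-nullified comparison described in
# the abstract. NOTE: the cell counts are inferred from reported percentages,
# not taken from the paper, and the exact test configuration is an assumption.
from scipy.stats import chi2_contingency, fisher_exact

# Rows: question group; columns: ChatGPT-4.0 answers (correct, incorrect).
table = [
    [66, 8],  # non-nullified questions (inferred: 74 items, ~89% correct)
    [5, 2],   # nullified questions (inferred: 7 items, 71.4% correct)
]

# chi2_contingency applies Yates' continuity correction to 2x2 tables by default.
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}, dof = {dof}")
print("expected counts:", expected)

# With expected counts this small, Fisher's exact test is a common alternative.
odds_ratio, p_exact = fisher_exact(table)
print(f"Fisher's exact p = {p_exact:.3f}")
```

With only seven nullified items, some expected cell counts fall below the usual chi-square threshold of 5, which is why the sketch also runs Fisher's exact test. Under these inferred counts neither test is significant, in line with the paper's finding of no statistical difference between the groups.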