Comparing Vision-Capable Models, GPT-4 and Gemini, With GPT-3.5 on Taiwan's Pulmonologist Exam

Cited by: 1
Authors
Chen, Chih-Hsiung [1 ]
Hsieh, Kuang-Yu [1 ]
Huang, Kuo-En [1 ]
Lai, Hsien-Yun [2 ]
Affiliations
[1] Department of Critical Care Medicine, Mennonite Christian Hospital, Hualien, Taiwan
[2] Department of Education and Research, Mennonite Christian Hospital, Hualien, Taiwan
Keywords
vision feature; pulmonologist exam; Gemini; GPT; large language models; artificial intelligence
DOI
10.7759/cureus.67641
Chinese Library Classification
R5 [Internal Medicine]
Discipline Classification Code
1002; 100201
Abstract
Introduction: The latest generation of large language models (LLMs) features multimodal capabilities, allowing them to interpret graphics, images, and videos, which are crucial in medical fields. This study investigates the vision capabilities of the next-generation Generative Pre-trained Transformer 4 (GPT-4) and Google's Gemini.

Methods: To establish a comparative baseline, we used GPT-3.5, a model limited to text processing, and evaluated GPT-4 and Gemini on questions from the Taiwan Specialist Board Exams in Pulmonary and Critical Care Medicine. Our dataset comprised 1,100 questions from 2013 to 2023, 100 per year. Of these, 1,059 were pure text and 41 combined text with images; the majority were in a non-English language, and only six were in pure English.

Results: On each annual 100-question exam from 2013 to 2023, GPT-4 scored 66, 69, 51, 64, 72, 64, 66, 64, 63, 68, and 67, respectively. Gemini scored 45, 48, 45, 45, 46, 59, 54, 41, 53, 45, and 45, while GPT-3.5 scored 39, 33, 35, 36, 32, 33, 43, 28, 32, 33, and 36.

Conclusions: These results demonstrate that the newer LLMs with vision capabilities significantly outperform the text-only model. With the passing score set at 60, GPT-4 passed most exams and approached human performance.
Pages: 9