Usefulness of the large language model ChatGPT (GPT-4) as a diagnostic tool and information source in dermatology

Cited by: 1

Authors:

Nielsen, Jacob P. S. ^{[1,4]}

Gronhoj, Christian ^{[1]}

Skov, Lone ^{[2,3]}

Gyldenlove, Mette ^{[2,3]}

Affiliations:

[1] Copenhagen Univ Hosp, Dept Otorhinolaryngol Head & Neck Surg & Audiol, Copenhagen, Denmark

[2] Copenhagen Univ Hosp Herlev & Gentofte, Dept Dermatol & Allergy, Copenhagen, Denmark

[3] Univ Copenhagen, Fac Hlth & Med Sci, Dept Clin Med, Copenhagen, Denmark

[4] Copenhagen Univ Hosp, Dept Otorhinolaryngol Head & Neck Surg & Audiol, Rigshosp, Blegdamsvej 9, DK-2100 Copenhagen, Denmark

Source:

JEADV CLINICAL PRACTICE | 2024 / Vol. 3 / Issue 05

Keywords:

AI; artificial intelligence; Chatbot; ChatGPT; clinical dermatology; GPT-4; information source; Large Language Model; LLM; skin disease;

DOI:

10.1002/jvc2.459

Chinese Library Classification (CLC) number:

R75 [Dermatology and Venereology];

Discipline classification code:

100206 ;

Abstract:

Background: The field of artificial intelligence is rapidly evolving. As an easily accessible platform with vast user engagement, the Chat Generative Pre-Trained Transformer (ChatGPT) holds great promise in medicine, with the latest version, GPT-4, capable of analyzing clinical images.

Objectives: To evaluate ChatGPT as a diagnostic tool and information source in clinical dermatology.

Methods: A total of 15 clinical images were selected from the Danish web atlas, Danderm, depicting various common and rare skin conditions. The images were uploaded to ChatGPT version GPT-4, which was prompted with 'Please provide a description, a potential diagnosis, and treatment options for the following dermatological condition'. The generated responses were assessed by senior registrars in dermatology and consultant dermatologists in terms of accuracy, relevance, and depth (scale 1-5), and in addition, the image quality was rated (scale 0-10). Demographic and professional information about the respondents was registered.

Results: A total of 23 physicians participated in the study. The majority of the respondents were consultant dermatologists (83%), and 48% had more than 10 years of training. The overall image quality had a median rating of 10 out of 10 [interquartile range (IQR): 9-10]. The overall median rating of the ChatGPT-generated responses was 2 (IQR: 1-4), while overall median ratings in terms of relevance, accuracy, and depth were 2 (IQR: 1-4), 3 (IQR: 2-4) and 2 (IQR: 1-3), respectively.

Conclusions: Despite the advancements in ChatGPT, including newly added image processing capabilities, the chatbot demonstrated significant limitations in providing reliable and clinically useful responses to illustrative images of various dermatological conditions.


Pages: 1570-1575

Number of pages: 6

Related records: 50 in total

[41] Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
Sottana, Andrea
Liang, Bin
Zou, Kai
Yuan, Zheng
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023: 8776-8788
[42] The Implementation of Multimodal Large Language Models for Hydrological Applications: A Comparative Study of GPT-4 Vision, Gemini, LLaVa, and Multimodal-GPT
Kadiyala, Likith Anoop
Mermer, Omer
Samuel, Dinesh Jackson
Sermet, Yusuf
Demir, Ibrahim
HYDROLOGY, 2024, 11 (09)
[43] Large language model (ChatGPT) as a support tool for breast tumor board
Sorin, Vera
Klang, Eyal
Sklair-Levy, Miri
Cohen, Israel
Zippel, Douglas B.
Lahat, Nora Balint
Konen, Eli
Barash, Yiftach
NPJ BREAST CANCER, 2023, 9 (01)
[44] TRANSFORMING SYSTEMATIC LITERATURE REVIEWS: UNLEASHING THE POTENTIAL OF GPT-4: A CUTTING-EDGE LARGE LANGUAGE MODEL, TO ELEVATE RESEARCH SYNTHESIS
Attri, S.
Kaur, R.
Singh, B.
Rai, P.
VALUE IN HEALTH, 2024, 27 (06): S270-S270
[45] Large language model (ChatGPT) as a support tool for breast tumor board
Vera Sorin
Eyal Klang
Miri Sklair-Levy
Israel Cohen
Douglas B. Zippel
Nora Balint Lahat
Eli Konen
Yiftach Barash
npj Breast Cancer, 9
[46] Evaluating the GPT-3.5 and GPT-4 Large Language Models for Zero-Shot Classification of South African Violent Event Data
Kotze, Eduan
Senekal, Burgert A.
2024 7TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, BIG DATA, COMPUTING AND DATA COMMUNICATION SYSTEMS, ICABCD 2024, 2024
[47] Exploring the potential utility of AI large language models for medical ethics: an expert panel evaluation of GPT-4
Balas, Michael
Wadden, Jordan Joseph
Hebert, Philip C.
Mathison, Eric
Warren, Marika D.
Seavilleklein, Victoria
Wyzynski, Daniel
Callahan, Alison
Crawford, Sean A.
Arjmand, Parnian
Ing, Edsel B.
JOURNAL OF MEDICAL ETHICS, 2024, 50 (02): 90-96
[48] Harnessing Large Language Models for Structured Reporting in Breast Ultrasound: A Comparative Study of Open AI (GPT-4.0) and Microsoft Bing (GPT-4)
Liu, ChaoXu
Wei, MinYan
Qin, Yu
Zhang, MeiXiang
Jiang, Huan
Xu, JiaLe
Zhang, YuNing
Hua, Qing
Hou, YiQing
Dong, YiJie
Xia, ShuJun
Li, Ning
Zhou, JianQiao
ULTRASOUND IN MEDICINE AND BIOLOGY, 2024, 50 (11): 1697-1703
[49] Large Language Model GPT-4 Compared to Endocrinologist Responses on Initial Choice of Glucose-Lowering Medication Under Conditions of Clinical Uncertainty
Flory, James H.
Ancker, Jessica S.
Kim, Scott Y. H.
Kuperman, Gilad
Petrov, Aleksandr
Vickers, Andrew
DIABETES CARE, 2025, 48 (02)
[50] Large language models and bariatric surgery patient education: a comparative readability analysis of GPT-3.5, GPT-4, Bard, and online institutional resources
Srinivasan, Nitin
Samaan, Jamil S.
Rajeev, Nithya D.
Kanu, Mmerobasi U.
Yeo, Yee Hui
Samakar, Kamran
SURGICAL ENDOSCOPY AND OTHER INTERVENTIONAL TECHNIQUES, 2024, 38 (05): 2522-2532
