Usefulness of the large language model ChatGPT (GPT-4) as a diagnostic tool and information source in dermatology

Cited by: 1
Authors
Nielsen, Jacob P. S. [1, 4]
Gronhoj, Christian [1]
Skov, Lone [2, 3]
Gyldenlove, Mette [2, 3]
Affiliations
[1] Copenhagen Univ Hosp, Dept Otorhinolaryngol Head & Neck Surg & Audiol, Copenhagen, Denmark
[2] Copenhagen Univ Hosp Herlev & Gentofte, Dept Dermatol & Allergy, Copenhagen, Denmark
[3] Univ Copenhagen, Fac Hlth & Med Sci, Dept Clin Med, Copenhagen, Denmark
[4] Copenhagen Univ Hosp, Dept Otorhinolaryngol Head & Neck Surg & Audiol, Rigshosp, Blegdamsvej 9, DK-2100 Copenhagen, Denmark
Source
JEADV CLINICAL PRACTICE | 2024, Volume 3, Issue 5
Keywords
AI; artificial intelligence; Chatbot; ChatGPT; clinical dermatology; GPT-4; information source; Large Language Model; LLM; skin disease
DOI
10.1002/jvc2.459
Chinese Library Classification number
R75 [Dermatology and Venereology]
Discipline classification code
100206
Abstract
Background: The field of artificial intelligence is rapidly evolving. As an easily accessible platform with vast user engagement, the Chat Generative Pre-Trained Transformer (ChatGPT) holds great promise in medicine, with the latest version, GPT-4, capable of analyzing clinical images.
Objectives: To evaluate ChatGPT as a diagnostic tool and information source in clinical dermatology.
Methods: A total of 15 clinical images were selected from the Danish web atlas, Danderm, depicting various common and rare skin conditions. The images were uploaded to ChatGPT version GPT-4, which was prompted with 'Please provide a description, a potential diagnosis, and treatment options for the following dermatological condition'. The generated responses were assessed by senior registrars in dermatology and consultant dermatologists in terms of accuracy, relevance, and depth (scale 1-5), and in addition, the image quality was rated (scale 0-10). Demographic and professional information about the respondents was registered.
Results: A total of 23 physicians participated in the study. The majority of the respondents were consultant dermatologists (83%), and 48% had more than 10 years of training. The overall image quality had a median rating of 10 out of 10 [interquartile range (IQR): 9-10]. The overall median rating of the ChatGPT-generated responses was 2 (IQR: 1-4), while overall median ratings in terms of relevance, accuracy, and depth were 2 (IQR: 1-4), 3 (IQR: 2-4) and 2 (IQR: 1-3), respectively.
Conclusions: Despite the advancements in ChatGPT, including newly added image processing capabilities, the chatbot demonstrated significant limitations in providing reliable and clinically useful responses to illustrative images of various dermatological conditions.
Pages: 1570-1575
Number of pages: 6