Optimizing Diagnostic Performance of ChatGPT: The Impact of Prompt Engineering on Thoracic Radiology Cases

Cited by: 1
Authors
Cesur, Turay [1]
Gunes, Yasin Celal [2]
Affiliations
[1] Ankara Mamak State Hosp, Radiol, Ankara, Turkiye
[2] Kirikkale Yuksek Ihtisas Hosp, Radiol, Ankara, Turkiye
Keywords
prompt engineering; radiology; large language models; gpt-4; chat generative pre-trained transformer (chatgpt)
DOI
10.7759/cureus.60009
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Background: Recent studies have highlighted the diagnostic performance of ChatGPT 3.5 and GPT-4 in a text-based format, demonstrating their radiological knowledge across different areas. Our objective is to investigate the impact of prompt engineering on the diagnostic performance of ChatGPT 3.5 and GPT-4 in diagnosing thoracic radiology cases, highlighting how the complexity of prompts influences model performance.
Methodology: We conducted a retrospective cross-sectional study using 124 publicly available Case of the Month examples from the Thoracic Society of Radiology website. We initially input the cases into the ChatGPT versions without prompting. Then, we employed five different prompts, ranging from basic task-oriented to complex role-specific formulations, to measure the diagnostic accuracy of the ChatGPT versions. The differential diagnosis lists generated by the models were compared against the radiological diagnoses listed on the Thoracic Society of Radiology website, with a scoring system in place to comprehensively assess the accuracy. Diagnostic accuracy and differential diagnosis scores were analyzed using the McNemar, chi-square, Kruskal-Wallis, and Mann-Whitney U tests.
Results: Without any prompts, ChatGPT 3.5's accuracy was 25% (31/124), which increased to 56.5% (70/124) with the most complex prompt (P < 0.001). GPT-4 showed a high baseline accuracy of 53.2% (66/124) without prompting. This accuracy increased to 59.7% (74/124) with complex prompts (P = 0.09). Notably, there was no statistically significant difference in peak performance between ChatGPT 3.5 (70/124) and GPT-4 (74/124) (P = 0.55).
Conclusions: This study emphasizes the critical influence of prompt engineering on enhancing the diagnostic performance of ChatGPT versions, especially ChatGPT 3.5.
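The accuracy percentages in the Results can be checked directly from the reported counts. The sketch below is illustrative only: the function names `accuracy_percent` and `chi_square_2x2` are hypothetical, and the chi-square comparison treats the two models as independent groups without a continuity correction. That is an assumption made here for illustration; the abstract does not say which of the listed tests produced each P value, and the paired tests would need per-case results that the abstract does not give. The computed P value should therefore not be expected to match the reported ones.

```python
import math

TOTAL_CASES = 124  # number of cases reported in the abstract

# Correct-diagnosis counts as reported in the abstract.
CORRECT = {
    ("ChatGPT 3.5", "no prompt"): 31,
    ("ChatGPT 3.5", "most complex prompt"): 70,
    ("GPT-4", "no prompt"): 66,
    ("GPT-4", "complex prompt"): 74,
}


def accuracy_percent(n_correct, n_total=TOTAL_CASES):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * n_correct / n_total, 1)


def chi_square_2x2(correct_a, correct_b, n_total=TOTAL_CASES):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Assumption for illustration: the two groups are treated as independent
    samples of n_total cases each, which ignores the paired study design.
    """
    observed = [
        [correct_a, n_total - correct_a],
        [correct_b, n_total - correct_b],
    ]
    grand_total = 2 * n_total
    column_totals = [correct_a + correct_b, grand_total - correct_a - correct_b]
    statistic = 0.0
    for row in observed:
        for j, obs in enumerate(row):
            expected = n_total * column_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    for (model, condition), n_correct in CORRECT.items():
        print(f"{model}, {condition}: {accuracy_percent(n_correct)}%")
    stat, p = chi_square_2x2(70, 74)
    print(f"Unpaired chi-square, peak performance: chi2={stat:.3f}, p={p:.3f}")
```

Running the script prints the four accuracy values alongside an unpaired chi-square comparison of the two peak results, which is one rough way to see why a difference of four cases out of 124 is not statistically significant.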
Pages: 12