Evaluation of GPT-4's Chest X-Ray Impression Generation: A Reader Study on Performance and Perception

Cited by: 9
Authors
Ziegelmayer, Sebastian [1, 2]
Marka, Alexander W. [1]
Lenhart, Nicolas [1]
Nehls, Nadja [1]
Reischl, Stefan [1]
Harder, Felix [1]
Sauter, Andreas [1]
Makowski, Marcus [1]
Graf, Markus [1]
Gawlitza, Joshua [1]
Affiliations
[1] Tech Univ Munich, Sch Med, Dept Diagnost & Intervent Radiol, Ismaninger Str 22, D-81675 Munich, Germany
[2] Tech Univ Munich, Klinikum Rechts Isar, Ismaninger Str 22, D-81675 Munich, Germany
Keywords
generative model; GPT; medical imaging; artificial intelligence; imaging; radiology; radiological; radiography; diagnostic; chest; x-ray; x-rays; generative; multimodal; impression; impressions; image; images; AI
DOI
10.2196/50865
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Exploring the generative capabilities of the multimodal GPT-4, our study uncovered significant differences between radiological assessments and automatic evaluation metrics for chest x-ray impression generation and revealed radiological bias.
Pages: 5