Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study

Cited by: 42
Authors
Jabbour, Sarah [1]
Fouhey, David [1,2,3]
Shepard, Stephanie [1]
Valley, Thomas S. [4]
Kazerooni, Ella A. [5]
Banovic, Nikola [1]
Wiens, Jenna [1,7]
Sjoding, Michael W. [4,6]
Affiliations
[1] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI USA
[2] NYU, Comp Sci Courant Inst, New York, NY USA
[3] NYU, Elect & Comp Engn Tandon Sch Engn, New York, NY USA
[4] Univ Michigan, Dept Internal Med, Pulm & Crit Care Med, Med Sch, Ann Arbor, MI USA
[5] Univ Michigan, Med Sch, Dept Radiol, Ann Arbor, MI USA
[6] Univ Michigan, Dept Internal Med, G020W Bldg 16 NCRC, 2800 Plymouth Rd, SPC 2800, Ann Arbor, MI 48109 USA
[7] Univ Michigan, Comp Sci & Engn, 3749 Beyster Bldg, 2260 Hayward St, Ann Arbor, MI 48109 USA
Source
JAMA - JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION | 2023, Vol. 330, No. 23
Keywords
ALGORITHM; BIAS
DOI
10.1001/jama.2023.22295
Chinese Library Classification
R5 [Internal Medicine]
Discipline Classification Codes
1002; 100201
Abstract
Importance: Artificial intelligence (AI) could support clinicians when diagnosing hospitalized patients; however, systematic bias in AI models could worsen clinician diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to mitigate errors made by models, but the effectiveness of this strategy has not been established.

Objectives: To evaluate the impact of systematically biased AI on clinician diagnostic accuracy and to determine whether image-based AI model explanations can mitigate model errors.

Design, Setting, and Participants: Randomized clinical vignette survey study administered between April 2022 and January 2023 across 13 US states, involving hospitalist physicians, nurse practitioners, and physician assistants.

Interventions: Clinicians were shown 9 clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. Clinicians were then asked to determine the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause(s) of each patient's acute respiratory failure. To establish baseline diagnostic accuracy, clinicians were shown 2 vignettes without AI model input. Clinicians were then randomized to see 6 vignettes with AI model input, with or without AI model explanations. Among these 6 vignettes, 3 included standard-model predictions and 3 included systematically biased model predictions.

Main Outcomes and Measures: Clinician diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease.

Results: Median participant age was 34 years (IQR, 31-39) and 241 (57.7%) were female. Four hundred fifty-seven clinicians were randomized and completed at least 1 vignette: 231 were randomized to AI model predictions without explanations and 226 to AI model predictions with explanations. Clinicians' baseline diagnostic accuracy was 73.0% (95% CI, 68.3% to 77.8%) for the 3 diagnoses. When shown a standard AI model without explanations, clinician accuracy increased over baseline by 2.9 percentage points (95% CI, 0.5 to 5.2), and by 4.4 percentage points (95% CI, 2.0 to 6.9) when clinicians were also shown AI model explanations. Systematically biased AI model predictions decreased clinician accuracy by 11.3 percentage points (95% CI, 7.2 to 15.5) compared with baseline; providing biased AI model predictions with explanations decreased clinician accuracy by 9.1 percentage points (95% CI, 4.9 to 13.2) compared with baseline, a nonsignificant improvement of 2.3 percentage points (95% CI, -2.7 to 7.2) over the systematically biased model without explanations.

Conclusions and Relevance: Although standard AI models improved diagnostic accuracy, systematically biased AI models reduced diagnostic accuracy, and commonly used image-based AI model explanations did not mitigate this harmful effect.

Trial Registration: ClinicalTrials.gov Identifier: NCT06098950
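A note on the headline contrast in the Results: the mitigation attributed to explanations is the difference between the two biased-model effects. The minimal Python sketch below reproduces that arithmetic from the point estimates quoted in the abstract; it is a reading aid only, not the study's analysis, and the reported 2.3 figure (versus 2.2 from the rounded values) and its 95% CI come from the study's own unrounded model estimates.

```python
# Effects reported in the abstract, in percentage points (pp) of clinician
# diagnostic accuracy relative to the 73.0% baseline.
effect_biased = -11.3       # biased AI predictions, no explanations
effect_biased_expl = -9.1   # biased AI predictions, with explanations

# How much of the biased model's harm do explanations recover?
mitigation = effect_biased_expl - effect_biased
print(f"Explanations recover {mitigation:+.1f} pp of accuracy")
# Prints +2.2 pp from these rounded values; the paper reports 2.3 pp
# (95% CI, -2.7 to 7.2) from unrounded estimates, a difference whose CI
# spans zero and is therefore not statistically significant.
```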
Pages: 2275-2284
Number of pages: 10