Automation Bias in Mammography: The Impact of Artificial Intelligence BI-RADS Suggestions on Reader Performance

Cited by: 96
Authors
Dratsch, Thomas [1 ]
Chen, Xue [1 ]
Mehrizi, Mohammad Rezazade [2 ]
Kloeckner, Roman [3 ]
Maehringer-Kunz, Aline [4 ,5 ]
Puesken, Michael [1 ]
Baessler, Bettina [6 ]
Sauer, Stephanie [6 ]
Maintz, David [1 ]
Pinto dos Santos, Daniel [1]
Affiliations
[1] Univ Cologne, Univ Hosp Cologne, Inst Diagnost & Intervent Radiol, Fac Med, Kerpener Str 62, D-50937 Cologne, Germany
[2] Vrije Univ Amsterdam, Sch Business & Econ, Knowledge Informat & Innovat, Amsterdam, Netherlands
[3] Univ Clin Schleswig Holstein, Inst Intervent Radiol, Kiel, Germany
[4] Johannes Gutenberg Univ Mainz, Univ Med Ctr, Dept Diagnost & Intervent Radiol, Mainz, Germany
[5] Univ Clin Wurzburg, Inst Diagnost & Intervent Radiol, Wurzburg, Germany
[6] Univ Clin Wurzburg, Inst Diagnost & Intervent Radiol, Wurzburg, Germany
Keywords
SCREENING MAMMOGRAPHY; TRUST; AI
DOI
10.1148/radiol.222176
Chinese Library Classification (CLC)
R8 [special medicine]; R445 [diagnostic imaging]
Subject classification codes
1002; 100207; 1009
Abstract
Background: Automation bias (the propensity for humans to favor suggestions from automated decision-making systems) is a known source of error in human-machine interactions, but its implications regarding artificial intelligence (AI)-aided mammography reading are unknown.

Purpose: To determine how automation bias can affect inexperienced, moderately experienced, and very experienced radiologists when reading mammograms with the aid of an artificial intelligence (AI) system.

Materials and Methods: In this prospective experiment, 27 radiologists read 50 mammograms and provided their Breast Imaging Reporting and Data System (BI-RADS) assessment assisted by a purported AI system. Mammograms were obtained between January 2017 and December 2019 and were presented in two randomized sets. The first was a training set of 10 mammograms, with the correct BI-RADS category suggested by the AI system. The second was a set of 40 mammograms in which an incorrect BI-RADS category was suggested for 12 mammograms. Reader performance, degree of bias in BI-RADS scoring, perceived accuracy of the AI system, and reader confidence in their own BI-RADS ratings were assessed using analysis of variance (ANOVA) and repeated-measures ANOVA followed by post hoc tests and Kruskal-Wallis tests followed by the Dunn post hoc test.

Results: The percentage of correctly rated mammograms by inexperienced (mean, 79.7% ± 11.7 [SD] vs 19.8% ± 14.0; P < .001; r = 0.93), moderately experienced (mean, 81.3% ± 10.1 vs 24.8% ± 11.6; P < .001; r = 0.96), and very experienced (mean, 82.3% ± 4.2 vs 45.5% ± 9.1; P = .003; r = 0.97) radiologists was significantly impacted by the correctness of the AI prediction of BI-RADS category. Inexperienced radiologists were significantly more likely to follow the suggestions of the purported AI when it incorrectly suggested a higher BI-RADS category than the actual ground truth compared with both moderately (mean degree of bias, 4.0 ± 1.8 vs 2.4 ± 1.5; P = .044; r = 0.46) and very (mean degree of bias, 4.0 ± 1.8 vs 1.2 ± 0.8; P = .009; r = 0.65) experienced readers.

Conclusion: The results show that inexperienced, moderately experienced, and very experienced radiologists reading mammograms are prone to automation bias when being supported by an AI-based system. This and other effects of human and machine interaction must be considered to ensure safe deployment and accurate diagnostic performance when combining human readers and AI.
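For readers unfamiliar with the nonparametric workflow named above (a Kruskal-Wallis omnibus test followed by Dunn post hoc pairwise comparisons), the Python sketch below illustrates how such a three-group comparison can be run. It is a minimal illustration only: the simulated accuracy values, the group size of nine readers per experience level, and the use of scipy plus the third-party scikit-posthocs package are assumptions made for demonstration and do not reproduce the study's actual data or analysis code.

import numpy as np
import pandas as pd
from scipy import stats
import scikit_posthocs as sp  # third-party package: scikit-posthocs

# Simulated per-reader accuracy under incorrect AI advice; the means loosely
# follow the group means reported in the abstract, but the data are synthetic.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "accuracy": np.concatenate([
        rng.normal(0.20, 0.14, 9),  # inexperienced readers (hypothetical n = 9)
        rng.normal(0.25, 0.12, 9),  # moderately experienced readers
        rng.normal(0.46, 0.09, 9),  # very experienced readers
    ]),
    "group": ["inexperienced"] * 9 + ["moderate"] * 9 + ["very experienced"] * 9,
})

# Omnibus Kruskal-Wallis test across the three experience groups
h_stat, p_value = stats.kruskal(
    *[g["accuracy"].to_numpy() for _, g in df.groupby("group")]
)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.4f}")

# Dunn post hoc pairwise comparisons with Holm correction for multiplicity
print(sp.posthoc_dunn(df, val_col="accuracy", group_col="group", p_adjust="holm"))

Note that these functions return only test statistics and p values; effect sizes such as the r values quoted in the Results would have to be computed separately from the pairwise test statistics.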
Pages: 9