Decision region analysis for generalizability of artificial intelligence models: estimating model generalizability in the case of cross-reactivity and population shift

Citations: 2
Authors
Burgon, Alexis [1]
Sahiner, Berkman [1]
Petrick, Nicholas [1]
Pennello, Gene [1]
Cha, Kenny H. [1]
Samala, Ravi K. [1]
Affiliations
[1] US FDA, Ctr Devices & Radiol Hlth, Div Imaging Diagnostics & Software Reliabil, Off Sci & Engn Labs, Silver Spring, MD 20993 USA
Funding
National Institutes of Health (NIH)
Keywords
generalizability; decision region; represented and unrepresented subgroups; vicinal distribution; cross-reactivity; population shift;
DOI
10.1117/1.JMI.11.1.014501
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification code
1002; 100207; 1009
Abstract
Purpose: Understanding an artificial intelligence (AI) model's ability to generalize to its target population is critical to ensuring the safe and effective usage of AI in medical devices. A traditional generalizability assessment relies on the availability of large, diverse datasets, which are difficult to obtain in many medical imaging applications. We present an approach for enhanced generalizability assessment by examining the decision space beyond the available testing data distribution.
Approach: Vicinal distributions of virtual samples are generated by interpolating between triplets of test images. The generated virtual samples leverage the characteristics already in the test set, increasing the sample diversity while remaining close to the AI model's data manifold. We demonstrate the generalizability assessment approach on the non-clinical tasks of classifying patient sex, race, COVID status, and age group from chest x-rays.
Results: Decision region composition analysis for generalizability indicated that a disproportionately large portion of the decision space belonged to a single "preferred" class for each task, despite comparable performance on the evaluation dataset. Evaluation using cross-reactivity and population shift strategies indicated a tendency to overpredict samples as belonging to the preferred class (e.g., COVID negative) for patients whose subgroup was not represented in the model development data.
Conclusions: An analysis of an AI model's decision space has the potential to provide insight into model generalizability. Our approach uses the analysis of composition of the decision space to obtain an improved assessment of model generalizability in the case of limited test data.
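The interpolation step described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the regular barycentric weight grid, the function names (`barycentric_grid`, `virtual_samples`, `region_composition`), and the threshold classifier `toy_predict` are all assumptions introduced for the example. The abstract does not state how interpolation weights are sampled or how the composition is computed.

```python
import numpy as np


def barycentric_grid(steps):
    """Return all weight triples (a, b, c) with a + b + c = 1 on a regular grid.

    Assumption: a regular grid; the paper may sample weights differently.
    """
    weights = []
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            weights.append((i / steps, j / steps, k / steps))
    return np.array(weights)


def virtual_samples(img_a, img_b, img_c, steps=10):
    """Linearly interpolate between a triplet of images of identical shape."""
    w = barycentric_grid(steps)                  # shape (n, 3)
    triplet = np.stack([img_a, img_b, img_c])    # shape (3, H, W)
    # Contract the weight axis with the triplet axis -> shape (n, H, W).
    return np.tensordot(w, triplet, axes=1)


def region_composition(predict, samples, n_classes):
    """Fraction of virtual samples assigned to each class by `predict`."""
    labels = predict(samples)
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()


def toy_predict(batch):
    # Hypothetical stand-in for a trained classifier: thresholds mean intensity.
    return (batch.mean(axis=(1, 2)) > 0.5).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b, c = rng.random((3, 8, 8))              # three synthetic "test images"
    samples = virtual_samples(a, b, c, steps=20)
    print(region_composition(toy_predict, samples, n_classes=2))
```

In this sketch, a composition that is heavily skewed toward one class across many triplets would correspond to what the abstract calls a "preferred" class. In practice `toy_predict` would be replaced by the model under evaluation.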
Pages: 18