Role of sureness in evaluating AI/CADx: Lesion-based repeatability of machine learning classification performance on breast MRI

Cited by: 2

Authors:

Whitney, Heather M. ^{[1,3]}

Drukker, Karen ^{[1]}

Vieceli, Michael ^{[2,4,5]}

Van Dusen, Amy ^{[2]}

de Oliveira, Michelle ^{[2]}

Abe, Hiroyuki ^{[1]}

Giger, Maryellen L. ^{[1]}

Affiliations:

[1] Univ Chicago, Dept Radiol, Chicago, IL USA

[2] Wheaton Coll, Dept Phys, Wheaton, IL USA

[3] Univ Chicago, Dept Radiol, 5481 South Maryland Ave, Chicago, IL 60637 USA

[4] Univ Florida, Div Med Phys, Gainesville, FL USA

[5] Univ Texas Hlth Sci Ctr San Antonio, Dept Med Phys, San Antonio, TX USA

Source:

MEDICAL PHYSICS | 2024 / Vol. 51 / Issue 03

Keywords:

AI; breast; computer-aided diagnosis; machine learning; magnetic resonance imaging; repeatability; RADIOMIC FEATURES; TEXTURE ANALYSIS; CLASSIFIERS; DIAGNOSIS; CURVES; CANCER; IMAGES;

DOI:

10.1002/mp.16673

Chinese Library Classification (CLC) number:

R8 [Special Medicine]; R445 [Diagnostic Imaging];

Subject classification codes:

1002 ; 100207 ; 1009 ;

Abstract:

Background: Artificial intelligence/computer-aided diagnosis (AI/CADx) and its use of radiomics have shown potential in the diagnosis and prognosis of breast cancer. Performance metrics such as the area under the receiver operating characteristic (ROC) curve (AUC) are frequently used as figures of merit for the evaluation of CADx. Methods for evaluating lesion-based measures of performance may enhance the assessment of AI/CADx pipelines, particularly when comparing performance across classifiers.

Purpose: The purpose of this study was to investigate the use case of two standard classifiers to (1) compare overall classification performance of the classifiers in the task of distinguishing between benign and malignant breast lesions using radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) images, (2) define a new repeatability metric (termed sureness), and (3) use sureness to examine whether one classifier provides an advantage in lesion-based AI diagnostic performance when using radiomic features.

Methods: Images of 1052 breast lesions (201 benign, 851 cancers) had been retrospectively collected under HIPAA/IRB compliance. The lesions had been segmented automatically using a fuzzy c-means method, and thirty-two radiomic features had been extracted. Classification was investigated for the task of malignant lesions (81% of the dataset) versus benign lesions (19%). Two classifiers (linear discriminant analysis, LDA, and support vector machines, SVM) were trained and tested within 0.632 bootstrap analyses (2000 iterations). Whole-set classification performance was evaluated at two levels: (1) the 0.632+ bias-corrected area under the ROC curve (AUC) and (2) performance metric curves, which give variability in operating sensitivity and specificity at a target operating point (95% target sensitivity). Sureness was defined as 1 minus the 95% confidence interval of the classifier output for each lesion for each classifier. Lesion-based repeatability was evaluated at two levels: (1) repeatability profiles, which represent the distribution of sureness across the decision threshold, and (2) sureness of each lesion. The latter was used to identify lesions with better sureness with one classifier than with the other while maintaining lesion-based performance across the bootstrap iterations.

Results: In the classification performance assessment, the median and 95% CI of the difference in AUC between the two classifiers did not show evidence of a difference (ΔAUC = -0.003 [-0.031, 0.018]). Both classifiers achieved the target sensitivity. Sureness was more consistent across the classifier output range for the SVM classifier than for the LDA classifier. The SVM resulted in a net gain of 33 benign lesions and 307 cancers with higher sureness and maintained lesion-based performance. However, with the LDA there was a notable percentage of benign lesions (42%) with better sureness but lower lesion-based performance.

Conclusions: When there is no evidence of a difference in performance between classifiers using AUC or other performance summary measures, a lesion-based sureness metric may provide additional insight into AI pipeline design. These findings present and emphasize the utility of lesion-based repeatability via sureness in AI/CADx as a complementary enhancement to other evaluation measures.
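The sureness definition in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that classifier outputs are scaled to [0, 1], that the "95% confidence interval" is taken as the width between the 2.5th and 97.5th percentiles of a lesion's outputs across bootstrap iterations, and that iterations in which a lesion was not in the test fold are marked as NaN. The function name `sureness` and the array layout are hypothetical.

```python
import numpy as np

def sureness(outputs):
    """Per-lesion sureness from bootstrap classifier outputs.

    outputs: array of shape (n_iterations, n_lesions) with values in [0, 1];
             NaN marks iterations where the lesion was not tested (assumption).
    Returns an array of shape (n_lesions,) equal to 1 minus the width of the
    2.5th-97.5th percentile interval of each lesion's outputs.
    """
    outputs = np.asarray(outputs, dtype=float)
    # Percentiles are computed per lesion (column), ignoring NaN entries.
    lower, upper = np.nanpercentile(outputs, [2.5, 97.5], axis=0)
    return 1.0 - (upper - lower)

# Toy example: lesion 0 always receives the same output, lesion 1 varies widely.
demo = np.column_stack([
    np.full(101, 0.7),            # perfectly repeatable output
    np.linspace(0.0, 1.0, 101),   # output spread over the whole range
])
print(sureness(demo))
```

Under these assumptions, a lesion whose output barely changes across iterations has sureness close to 1, while a lesion whose output spans most of the range has sureness close to 0.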


Pages: 1812-1821

Number of pages: 10
