Role of sureness in evaluating AI/CADx: Lesion-based repeatability of machine learning classification performance on breast MRI

Cited by: 2
Authors
Whitney, Heather M. [1, 3]
Drukker, Karen [1]
Vieceli, Michael [2, 4, 5]
Van Dusen, Amy [2]
de Oliveira, Michelle [2]
Abe, Hiroyuki [1]
Giger, Maryellen L. [1]
Affiliations
[1] Univ Chicago, Dept Radiol, Chicago, IL USA
[2] Wheaton Coll, Dept Phys, Wheaton, IL USA
[3] Univ Chicago, Dept Radiol, 5481 South Maryland Ave, Chicago, IL 60637 USA
[4] Univ Florida, Div Med Phys, Gainesville, FL USA
[5] Univ Texas Hlth Sci Ctr San Antonio, Dept Med Phys, San Antonio, TX USA
Keywords
AI; breast; computer-aided diagnosis; machine learning; magnetic resonance imaging; repeatability; RADIOMIC FEATURES; TEXTURE ANALYSIS; CLASSIFIERS; DIAGNOSIS; CURVES; CANCER; IMAGES;
DOI
10.1002/mp.16673
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline classification code
1002 ; 100207 ; 1009 ;
Abstract
Background: Artificial intelligence/computer-aided diagnosis (AI/CADx) and its use of radiomics have shown potential in the diagnosis and prognosis of breast cancer. Performance metrics such as the area under the receiver operating characteristic (ROC) curve (AUC) are frequently used as figures of merit for the evaluation of CADx. Methods for evaluating lesion-based measures of performance may enhance the assessment of AI/CADx pipelines, particularly when comparing performance between classifiers.
Purpose: The purpose of this study was to investigate the use case of two standard classifiers to (1) compare overall classification performance of the classifiers in the task of distinguishing between benign and malignant breast lesions using radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) images, (2) define a new repeatability metric (termed sureness), and (3) use sureness to examine whether one classifier provides an advantage in AI diagnostic performance by lesion when using radiomic features.
Methods: Images of 1052 breast lesions (201 benign, 851 cancers) had been retrospectively collected under HIPAA/IRB compliance. The lesions had been segmented automatically using a fuzzy c-means method, and thirty-two radiomic features had been extracted. Classification was investigated for the task of malignant lesions (81% of the dataset) versus benign lesions (19%). Two classifiers (linear discriminant analysis, LDA, and support vector machine, SVM) were trained and tested within 0.632 bootstrap analyses (2000 iterations). Whole-set classification performance was evaluated at two levels: (1) the 0.632+ bias-corrected AUC and (2) performance metric curves, which give the variability in operating sensitivity and specificity at a target operating point (95% target sensitivity). Sureness was defined as 1 minus the 95% confidence interval of the classifier output, computed for each lesion and each classifier. Lesion-based repeatability was evaluated at two levels: (1) repeatability profiles, which represent the distribution of sureness across the decision threshold, and (2) the sureness of each lesion. The latter was used to identify lesions with better sureness with one classifier than with the other while maintaining lesion-based performance across the bootstrap iterations.
Results: In the classification performance assessment, the median and 95% CI of the difference in AUC between the two classifiers did not show evidence of a difference (ΔAUC = -0.003 [-0.031, 0.018]). Both classifiers achieved the target sensitivity. Sureness was more consistent across the classifier output range for the SVM classifier than for the LDA classifier. The SVM resulted in a net gain of 33 benign lesions and 307 cancers with higher sureness and maintained lesion-based performance. However, with the LDA there was a notable percentage of benign lesions (42%) with better sureness but lower lesion-based performance.
Conclusions: When there is no evidence of a difference in performance between classifiers using AUC or other performance summary measures, a lesion-based sureness metric may provide additional insight into AI pipeline design. These findings present and emphasize the utility of lesion-based repeatability via sureness in AI/CADx as a complement to other evaluation measures.
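The per-lesion sureness described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions: the "95% confidence interval" is read as the width of the 2.5th to 97.5th percentile interval of a lesion's out-of-bag classifier outputs, the classifier is a plain two-class LDA written with NumPy, and the data are synthetic. The function names (`lda_posterior`, `bootstrap_outputs`, `sureness`) are hypothetical, and the 0.632+ AUC correction is not reproduced here.

```python
import numpy as np

def lda_posterior(X_train, y_train, X_test):
    """Two-class LDA posterior P(class 1 | x), assuming a shared covariance."""
    X0, X1 = X_train[y_train == 0], X_train[y_train == 1]
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance matrix.
    pooled = ((n0 - 1) * np.cov(X0, rowvar=False)
              + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.pinv(np.atleast_2d(pooled)) @ (mu1 - mu0)
    b = -0.5 * (mu0 + mu1) @ w + np.log(n1 / n0)
    z = np.clip(X_test @ w + b, -50.0, 50.0)  # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

def bootstrap_outputs(X, y, n_iter=200, seed=0):
    """Classifier outputs per iteration for lesions left out of the bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    outputs = np.full((n_iter, n), np.nan)  # NaN marks "lesion was in the training sample"
    for i in range(n_iter):
        train = rng.integers(0, n, size=n)            # sample with replacement
        test = np.setdiff1d(np.arange(n), train)      # out-of-bag lesions
        counts = np.bincount(y[train], minlength=2)
        if counts.min() < 2:                          # need >= 2 cases per class
            continue
        outputs[i, test] = lda_posterior(X[train], y[train], X[test])
    return outputs

def sureness(outputs):
    """Assumed reading: 1 minus the width of the 95% percentile interval per lesion."""
    lower, upper = np.nanpercentile(outputs, [2.5, 97.5], axis=0)
    return 1.0 - (upper - lower)

# Synthetic stand-in data: 200 "lesions", 4 features, roughly 19% benign (label 0).
rng = np.random.default_rng(42)
y = np.array([0] * 38 + [1] * 162)
X = rng.normal(size=(200, 4)) + y[:, None] * 1.0

outputs = bootstrap_outputs(X, y, n_iter=200, seed=0)
lesion_sureness = sureness(outputs)
print("median sureness:", np.median(lesion_sureness))
```

Because the classifier output is bounded to [0, 1], sureness under this reading also lies in [0, 1], with values near 1 indicating that a lesion receives nearly the same output regardless of which bootstrap sample trained the classifier. Running the same procedure with a second classifier and comparing the two sureness vectors lesion by lesion would mirror the comparison the abstract describes.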
Pages: 1812-1821
Page count: 10