Uncertainty-aware segmentation quality prediction via deep learning Bayesian Modeling: Comprehensive evaluation and interpretation on skin cancer and liver segmentation

Cited by: 1
Authors
Sikha, O. K. [1]
Riera-Marin, Meritxell [2]
Galdran, Adrian [1]
Lopez, Javier Garcia [2]
Rodriguez-Comas, Julia [2]
Piella, Gemma [1]
Ballester, Miguel A. Gonzalez [1, 3]
Affiliations
[1] Univ Pompeu Fabra, Dept Engn, BCN MedTech, Barcelona, Spain
[2] Scai Technol SL, Sci & Tech Dept, Barcelona, Spain
[3] ICREA, Barcelona, Spain
Keywords
Image segmentation; Ground-truth free performance evaluation; Uncertainty quantification; Uncertainty aggregate score; Explainable AI; CLASSIFICATION;
DOI
10.1016/j.compmedimag.2025.102547
Chinese Library Classification
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Image segmentation is a critical step in computational biomedical image analysis, typically evaluated using metrics like the Dice coefficient during training and validation. However, in clinical settings without manual annotations, assessing segmentation quality becomes challenging, and models lacking reliability indicators face adoption barriers. To address this gap, we propose a novel framework for predicting segmentation quality without requiring ground truth annotations at test time. Our approach introduces two complementary frameworks: one leveraging predicted segmentation and uncertainty maps, and another integrating the original input image, uncertainty maps, and predicted segmentation maps. We present Bayesian adaptations of two benchmark segmentation models (SwinUNet and Feature Pyramid Network with ResNet50) using Monte Carlo Dropout, Ensemble, and Test Time Augmentation to quantify uncertainty. We evaluate four uncertainty estimates (confidence map, entropy, mutual information, and expected pairwise Kullback-Leibler divergence) on 2D skin lesion and 3D liver segmentation datasets, analyzing their correlation with segmentation quality metrics. Our framework achieves an R² score of 93.25 and a Pearson correlation of 96.58 on the HAM10000 dataset, outperforming previous segmentation quality assessment methods. For 3D liver segmentation, Test Time Augmentation with entropy achieves an R² score of 85.03 and a Pearson correlation of 65.02, demonstrating cross-modality robustness. Additionally, we propose an aggregation strategy that combines multiple uncertainty estimates into a single score per image, offering a more robust and comprehensive assessment of segmentation quality than evaluating each measure independently. The proposed uncertainty-aware segmentation quality prediction network is interpreted using gradient-based methods such as Grad-CAM and feature embedding analysis through UMAP. These techniques provide insights into the model's behavior and reliability, helping to assess the impact of incorporating uncertainty into the segmentation quality prediction pipeline. The code is available at: https://github.com/sikha2552/Uncertainty-Aware-Segmentation-QualityPrediction-Bayesian-Modeling-with-Comprehensive-Evaluation-.
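The four uncertainty estimates named in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository. It assumes a stack of T softmax outputs for one image (for example from Monte Carlo Dropout passes, ensemble members, or test-time augmentations) and uses the standard textbook definitions of each measure. The function name `uncertainty_maps`, the array layout `(T, C, H, W)`, and the stabilizing constant `EPS` are illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # assumed small constant to avoid log(0)


def uncertainty_maps(probs):
    """Per-pixel uncertainty maps from T stochastic softmax outputs.

    probs: array of shape (T, C, H, W) with T >= 2 samples, C classes.
    Returns a dict of (H, W) maps. Hypothetical helper, not the paper's code.
    """
    probs = np.asarray(probs, dtype=np.float64)
    n_samples = probs.shape[0]
    log_p = np.log(probs + EPS)

    # Mean predictive distribution over the T samples: shape (C, H, W).
    mean_p = probs.mean(axis=0)

    # Confidence map: highest mean class probability at each pixel.
    confidence = mean_p.max(axis=0)

    # Predictive entropy of the mean distribution.
    entropy = -(mean_p * np.log(mean_p + EPS)).sum(axis=0)

    # Mutual information: predictive entropy minus mean per-sample entropy.
    expected_entropy = -(probs * log_p).sum(axis=1).mean(axis=0)
    mutual_info = entropy - expected_entropy

    # Expected pairwise KL: average of KL(p_i || p_j) over all pairs i != j.
    # Pairs with i == j contribute zero, so summing over all (i, j) is safe.
    neg_entropy_sum = (probs * log_p).sum(axis=1).sum(axis=0)
    cross_term = (probs.sum(axis=0) * log_p.sum(axis=0)).sum(axis=0)
    epkl = (n_samples * neg_entropy_sum - cross_term) / (
        n_samples * (n_samples - 1)
    )

    return {
        "confidence": confidence,
        "entropy": entropy,
        "mutual_information": mutual_info,
        "epkl": epkl,
    }
```

When all T samples agree, mutual information and expected pairwise KL are close to zero even if the entropy is high, which is why these measures are often read as capturing model disagreement rather than overall ambiguity. How the paper reduces these maps to a single per-image aggregate score is not specified in the abstract, so it is left out of this sketch.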
Pages: 12