Towards a guideline for evaluation metrics in medical image segmentation

Cited by: 172
Authors
Mueller, Dominik [1, 2]
Soto-Rey, Inaki [2]
Kramer, Frank [1]
Affiliations
[1] Univ Augsburg, IT Infrastruct Translat Med Res, Augsburg, Germany
[2] Univ Hosp Augsburg, Inst Digital Med, Med Data Integrat Ctr, Augsburg, Germany
Keywords
Biomedical image segmentation; Semantic segmentation; Medical image analysis; Reproducibility; Evaluation; Guideline; Performance assessment
DOI
10.1186/s13104-022-06096-y
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
In the last decade, research on artificial intelligence has grown rapidly, with deep learning models advancing especially in the field of medical image segmentation. Various studies have demonstrated that these models have powerful prediction capabilities and achieve results similar to those of clinicians. However, recent studies have revealed that evaluation in image segmentation studies lacks reliable model performance assessment and shows statistical bias caused by incorrect metric implementation or usage. This work therefore provides an overview and interpretation guide for the following metrics for medical image segmentation evaluation in binary as well as multi-class problems: Dice similarity coefficient, Jaccard, Sensitivity, Specificity, Rand index, ROC curves, Cohen's Kappa, and Hausdorff distance. Common issues such as class imbalance and statistical as well as interpretation biases in evaluation are also discussed. In summary, we propose a guideline for standardized medical image segmentation evaluation to improve evaluation quality, reproducibility, and comparability in the research field.
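As an illustration of four of the overlap metrics named in the abstract, the following minimal sketch computes the Dice similarity coefficient, Jaccard index, sensitivity, and specificity from confusion-matrix counts for a binary segmentation. It is not taken from the paper: the function names `confusion_counts` and `segmentation_metrics`, the flattened-list mask representation, and the choice to return NaN for undefined ratios are all assumptions made for this example.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not the paper's implementation.

def confusion_counts(truth, pred):
    """Count TP, FP, FN, TN for two flattened binary masks of equal length."""
    if len(truth) != len(pred):
        raise ValueError("masks must contain the same number of pixels")
    tp = fp = fn = tn = 0
    for t, p in zip(truth, pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(truth, pred):
    """Return Dice, Jaccard, sensitivity and specificity for binary masks."""
    tp, fp, fn, tn = confusion_counts(truth, pred)

    def ratio(num, den):
        # Assumption: an undefined ratio (zero denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy 10-pixel example: 4 foreground pixels in the ground truth.
truth = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
scores = segmentation_metrics(truth, pred)

# Class imbalance: an all-background prediction on a mostly-background image
# still scores perfect specificity, while Dice drops to zero.
sparse_truth = [0] * 98 + [1] * 2
empty_pred = [0] * 100
imbalanced = segmentation_metrics(sparse_truth, empty_pred)
```

In the toy example the counts are TP = 3, FP = 1, FN = 1, TN = 5, so Dice and sensitivity are both 0.75 and Jaccard is 0.6. The second case hints at the class-imbalance issue the abstract mentions: metrics that count true negatives can look strong even when no foreground pixel is found.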
Pages: 8