Reproducibility of Training Deep Learning Models for Medical Image Analysis

Authors
Bosma, Joeran Sander [1]
Peeters, Dre [1]
Alves, Natalia [1]
Saha, Anindo [1]
Saghir, Zaigham [2]
Jacobs, Colin [1]
Huisman, Henkjan [1]
Affiliations
[1] Radboud Univ Nijmegen, Ctr Med, Diagnost Image Anal Grp, Dept Med Imaging, NL-6525 GA Nijmegen, Netherlands
[2] Herlev Gentofte Hosp, Sect Pulm Med, Dept Med, Hellerup, Denmark
Source
MEDICAL IMAGING WITH DEEP LEARNING, VOL 227 | 2023 / Volume 227
Keywords
Deep learning; reproducibility; medical image analysis; performance variance
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Performance of deep learning algorithms varies due to their development data and training method, but also due to several stochastic processes during training. Due to these random factors, a single training run may not accurately reflect the performance of a given training method. Statistical comparisons in literature between different deep learning training methods typically ignore this performance variation between training runs and incorrectly claim significance of changes in training method. We hypothesize that the impact of such performance variation is substantial, such that it may invalidate biomedical competition leaderboards and some scientific papers. To test this, we investigate the reproducibility of training deep learning algorithms for medical image analysis. We repeated training runs from prior scientific studies: three diagnostic tasks (pancreatic cancer detection in CT, clinically significant prostate cancer detection in MRI, and lung nodule malignancy risk estimation in low-dose CT) and two organ segmentation tasks (pancreas segmentation in CT and prostate segmentation in MRI). A previously published top-performing algorithm for each task was trained multiple times to determine the variance in model performance. For all three diagnostic algorithms, performance variation from retraining was significant compared to data variance. Statistically comparing independently trained algorithms from the same training method using the same dataset should follow the null hypothesis, but we observed claimed significance with a p-value below 0.05 in 15% of comparisons with conventional testing (paired bootstrapping). We conclude that variance in model performance due to retraining is substantial and should be accounted for.
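The paired bootstrapping mentioned in the abstract can be illustrated with a minimal sketch. This is one common formulation of the test, not necessarily the authors' exact procedure: the function names, the accuracy metric, the 0.5 threshold, and the number of bootstrap replicates are all assumptions made for illustration. Both models are evaluated on the same resampled cases, and the p-value is derived from how often the performance difference falls on either side of zero.

```python
import random


def accuracy(labels, preds):
    """Fraction of cases where the prediction, thresholded at 0.5, matches the label.

    Stand-in metric chosen for illustration; any case-level metric could be passed instead.
    """
    return sum((p >= 0.5) == bool(y) for y, p in zip(labels, preds)) / len(labels)


def paired_bootstrap_p_value(labels, preds_a, preds_b, metric=accuracy,
                             n_boot=2000, seed=0):
    """Two-sided paired bootstrap p-value for metric(B) - metric(A).

    Each replicate resamples cases with replacement and scores both models
    on the same resampled cases, which is what makes the test paired.
    """
    rng = random.Random(seed)
    n = len(labels)
    n_le = n_ge = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        a = [preds_a[i] for i in idx]
        b = [preds_b[i] for i in idx]
        diff = metric(y, b) - metric(y, a)
        n_le += diff <= 0  # replicates where B is not better than A
        n_ge += diff >= 0  # replicates where B is not worse than A
    return min(1.0, 2 * min(n_le, n_ge) / n_boot)
```

Note that this test only resamples the evaluation cases. It treats each set of predictions as fixed, so it captures variance from the test data but not the variance between independent training runs that the abstract identifies as the problem.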
Pages: 1269-1287
Number of pages: 19