Reproducibility of Training Deep Learning Models for Medical Image Analysis

Cited by: 0

Authors:

Bosma, Joeran Sander [1]

Peeters, Dre [1]

Alves, Natalia [1]

Saha, Anindo [1]

Saghir, Zaigham [2]

Jacobs, Colin [1]

Huisman, Henkjan [1]

Affiliations:

[1] Radboud Univ Nijmegen, Ctr Med, Diagnost Image Anal Grp, Dept Med Imaging, NL-6525 GA Nijmegen, Netherlands

[2] Herlev Gentofte Hosp, Sect Pulm Med, Dept Med, Hellerup, Denmark

Source:

MEDICAL IMAGING WITH DEEP LEARNING, VOL 227 | 2023 / Volume 227

Keywords:

Deep learning; reproducibility; medical image analysis; performance variance

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Performance of deep learning algorithms varies due to their development data and training method, but also due to several stochastic processes during training. Due to these random factors, a single training run may not accurately reflect the performance of a given training method. Statistical comparisons in literature between different deep learning training methods typically ignore this performance variation between training runs and incorrectly claim significance of changes in training method. We hypothesize that the impact of such performance variation is substantial, such that it may invalidate biomedical competition leaderboards and some scientific papers. To test this, we investigate the reproducibility of training deep learning algorithms for medical image analysis. We repeated training runs from prior scientific studies: three diagnostic tasks (pancreatic cancer detection in CT, clinically significant prostate cancer detection in MRI, and lung nodule malignancy risk estimation in low-dose CT) and two organ segmentation tasks (pancreas segmentation in CT and prostate segmentation in MRI). A previously published top-performing algorithm for each task was trained multiple times to determine the variance in model performance. For all three diagnostic algorithms, performance variation from retraining was significant compared to data variance. Statistically comparing independently trained algorithms from the same training method using the same dataset should follow the null hypothesis, but we observed claimed significance with a p-value below 0.05 in 15% of comparisons with conventional testing (paired bootstrapping). We conclude that variance in model performance due to retraining is substantial and should be accounted for.
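
The abstract names paired bootstrapping as the conventional test. A minimal sketch of one common formulation follows, in Python with only the standard library. The function names, the accuracy metric, the resample count, and the two-sided p-value rule are all assumptions made for illustration; the abstract does not specify the authors' exact procedure or metrics.

```python
import random


def accuracy(labels, preds):
    """Fraction of cases where the binary prediction matches the label."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def paired_bootstrap_pvalue(labels, preds_a, preds_b, metric=accuracy,
                            n_resamples=2000, seed=0):
    """Two-sided bootstrap p-value for metric(A) - metric(B).

    Cases are resampled with replacement, and the SAME resampled cases
    are used for both models (this is what makes the test paired).
    Hypothetical helper, not the authors' implementation.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_resamples):
        # Draw one bootstrap sample of case indices.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        a = [preds_a[i] for i in idx]
        b = [preds_b[i] for i in idx]
        diffs.append(metric(y, a) - metric(y, b))
    # Share of resamples in which A is not better / not worse than B.
    frac_le = sum(d <= 0 for d in diffs) / n_resamples
    frac_ge = sum(d >= 0 for d in diffs) / n_resamples
    return min(1.0, 2 * min(frac_le, frac_ge))


if __name__ == "__main__":
    # Toy data: two hypothetical training runs of the same method.
    labels = [0, 1] * 50
    run_1 = list(labels)
    run_2 = list(labels)
    run_2[0] = 1 - run_2[0]  # the runs disagree on a single case
    print(paired_bootstrap_pvalue(labels, run_1, run_2))
```

Note that this test resamples cases only, so it reflects data variance and treats each trained model as fixed. It does not capture the run-to-run variation from retraining that the abstract identifies as the source of spurious significance.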

Pages: 1269-1287

Page count: 19
