Hazards of data leakage in machine learning: A study on classification of breast cancer using deep neural networks

Times cited: 28
Authors
Samala, Ravi K. [1, 2]
Chan, Heang-Ping [1]
Hadjiiski, Lubomir [1]
Koneru, Sathvik [1]
Affiliations
[1] Univ Michigan, Dept Radiol, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Radiol, CAD AI Res Div, Ann Arbor, MI 48109 USA
Source
MEDICAL IMAGING 2020: COMPUTER-AIDED DIAGNOSIS | 2020 / Vol. 11314
Funding
National Institutes of Health (NIH)
Keywords
data leakage; feature leakage; deep-learning; convolutional neural network; transfer learning; mammography; sample size; breast cancer; TISSUE; MASS;
DOI
10.1117/12.2549313
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline code
0831
Abstract
With renewed interest in developing machine learning methods for medical imaging using deep-learning approaches, it is essential to reexamine data leakage. In this study, we simulated data leakage in the form of feature leakage, in which a classifier was trained on the training set but the feature selection was influenced by the performance on the validation set. A pre-trained deep convolutional neural network (DCNN), without fine-tuning, was used as a feature extractor for classifying malignant and benign masses in mammography. A feature selection algorithm was trained in wrapper mode, with a cost function tuned to follow the performance metric on the validation set. A linear discriminant analysis (LDA) classifier was trained to classify masses on mammographic patches. The 4,577 unique patches, obtained from the mammograms of 1,882 patient cases, were partitioned by patient into 3,222 for training and 508 for validation, while 847 were sequestered as an unseen independent test set to evaluate the generalization error. The effect of finite sample size on data leakage was studied by varying the training and validation set sizes from 10% to 100% of the available sets. The area under the receiver operating characteristic curve (AUC) was used as the performance metric. The results show that the performance on the validation set could be overestimated, with AUCs ranging from 0.75 to 0.99 for various sample sizes, whereas the AUC on the independent test set realistically reached only 0.72. The analysis indicates that deep learning carries a risk of substantial performance inflation and that proper housekeeping rules should be followed when designing and developing deep-learning methods in medical imaging.
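The feature-leakage mechanism summarized in the abstract can be illustrated with a small toy simulation. This is a sketch under stated assumptions, not the authors' pipeline:

- Synthetic pure-noise features stand in for the DCNN features.
- Greedy forward selection scored by validation AUC stands in for the study's wrapper feature selection algorithm, whose details are not given here.
- Fisher LDA and the Mann-Whitney AUC are implemented with NumPy only.
- The set sizes are arbitrary rather than the study's.
- The helper names `auc`, `fit_lda`, `forward_select`, and `simulate` are hypothetical.

```python
# Hypothetical toy illustration of feature leakage (not the study's code):
# noise features stand in for DCNN features, and greedy forward selection
# stands in for the study's wrapper feature selection algorithm.
import numpy as np


def auc(scores, labels):
    """Empirical AUC via the Mann-Whitney statistic (ties count as 1/2)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # every positive vs. every negative score
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def fit_lda(x, y, ridge=1e-6):
    """Fisher LDA direction w = S_w^{-1} (mu_1 - mu_0), fitted on (x, y) only."""
    x0, x1 = x[y == 0], x[y == 1]
    s_w = np.atleast_2d(np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False))
    s_w = s_w + ridge * np.eye(x.shape[1])  # small ridge keeps S_w invertible
    return np.linalg.solve(s_w, x1.mean(axis=0) - x0.mean(axis=0))


def forward_select(x_tr, y_tr, x_val, y_val, n_features):
    """Greedy wrapper selection whose cost function is the VALIDATION AUC (the leak)."""
    selected = []
    for _ in range(n_features):
        best_j, best_auc = None, -1.0
        for j in range(x_tr.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            w = fit_lda(x_tr[:, cols], y_tr)    # classifier sees training data only...
            a = auc(x_val[:, cols] @ w, y_val)  # ...but selection peeks at validation
            if a > best_auc:
                best_j, best_auc = j, a
        selected.append(best_j)
    return selected


def simulate(n_train=200, n_val=50, n_test=1000, n_feat=200, k=10, seed=0):
    """Return (validation AUC, independent test AUC) after leaky selection."""
    rng = np.random.default_rng(seed)

    def make(n):
        # Pure-noise features: labels carry no information, so the true AUC is 0.5.
        return rng.standard_normal((n, n_feat)), np.arange(n) % 2

    x_tr, y_tr = make(n_train)
    x_val, y_val = make(n_val)
    x_te, y_te = make(n_test)  # sequestered; never used during selection
    cols = forward_select(x_tr, y_tr, x_val, y_val, k)
    w = fit_lda(x_tr[:, cols], y_tr)
    return auc(x_val[:, cols] @ w, y_val), auc(x_te[:, cols] @ w, y_te)


if __name__ == "__main__":
    val_auc, test_auc = simulate()
    print(f"validation AUC = {val_auc:.2f}, independent test AUC = {test_auc:.2f}")
```

The features in this toy setup carry no class information, so any validation AUC well above 0.5 is produced by the selection loop repeatedly consulting the validation labels. The sequestered test set exposes the inflation. One common remedy is to score candidate features using the training partition only, for example by cross-validation within it, and to evaluate the independent test set once at the end.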
Pages: 6
Related papers (9 in total)
[1]   COMPUTER-AIDED CLASSIFICATION OF MAMMOGRAPHIC MASSES AND NORMAL TISSUE - LINEAR DISCRIMINANT-ANALYSIS IN TEXTURE FEATURE SPACE [J].
CHAN, HP ;
WEI, DT ;
HELVIE, MA ;
SAHINER, B ;
ADLER, DD ;
GOODSITT, MM ;
PETRICK, N .
PHYSICS IN MEDICINE AND BIOLOGY, 1995, 40 (05) :857-876
[2]   Leakage in Data Mining: Formulation, Detection, and Avoidance [J].
Kaufman, Shachar ;
Rosset, Saharon ;
Perlich, Claudia ;
Stitelman, Ori .
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2012, 6 (04)
[3]   ImageNet Classification with Deep Convolutional Neural Networks [J].
Krizhevsky, Alex ;
Sutskever, Ilya ;
Hinton, Geoffrey E. .
COMMUNICATIONS OF THE ACM, 2017, 60 (06) :84-90
[4]   Evaluation of computer-aided detection and diagnosis systems [J].
Petrick, Nicholas ;
Sahiner, Berkman ;
Armato, Samuel G., III ;
Bert, Alberto ;
Correale, Loredana ;
Delsanto, Silvia ;
Freedman, Matthew T. ;
Fryd, David ;
Gur, David ;
Hadjiiski, Lubomir ;
Huo, Zhimin ;
Jiang, Yulei ;
Morra, Lia ;
Paquerault, Sophie ;
Raykar, Vikas ;
Samuelson, Frank ;
Summers, Ronald M. ;
Tourassi, Georgia ;
Yoshida, Hiroyuki ;
Zheng, Bin ;
Zhou, Chuan ;
Chan, Heang-Ping .
MEDICAL PHYSICS, 2013, 40 (08)
[5]   Classification of mass and normal breast tissue: A convolution neural network classifier with spatial domain and texture images [J].
Sahiner, B ;
Chan, HP ;
Petrick, N ;
Wei, DT ;
Helvie, MA ;
Adler, DD ;
Goodsitt, MM .
IEEE TRANSACTIONS ON MEDICAL IMAGING, 1996, 15 (05) :598-610
[6]   Samala, R.-K., 2018, RSNA PROGRAM BOOK
[7]   Breast Cancer Diagnosis in Digital Breast Tomosynthesis: Effects of Training Sample Size on Multi-Stage Transfer Learning Using Deep Neural Nets [J].
Samala, Ravi K. ;
Chan, Heang-Ping ;
Hadjiiski, Lubomir ;
Helvie, Mark A. ;
Richter, Caleb D. ;
Cha, Kenny H. .
IEEE TRANSACTIONS ON MEDICAL IMAGING, 2019, 38 (03) :686-696
[8]   Multi-task transfer learning deep convolutional neural network: application to computer-aided diagnosis of breast cancer on mammograms [J].
Samala, Ravi K. ;
Chan, Heang-Ping ;
Hadjiiski, Lubomir M. ;
Helvie, Mark A. ;
Cha, Kenny H. ;
Richter, Caleb D. .
PHYSICS IN MEDICINE AND BIOLOGY, 2017, 62 (23) :8894-8908
[9]   Mass detection in digital breast tomosynthesis: Deep convolutional neural network with transfer learning from mammography [J].
Samala, Ravi K. ;
Chan, Heang-Ping ;
Hadjiiski, Lubomir ;
Helvie, Mark A. ;
Wei, Jun ;
Cha, Kenny .
MEDICAL PHYSICS, 2016, 43 (12) :6654-6666