Lung cancer survival period prediction and understanding: Deep learning approaches

Cited by: 65
Authors
Doppalapudi, Shreyesh [1]
Qiu, Robin G. [1]
Badr, Youakim [1]
Affiliations
[1] Penn State Univ, Div Engn & Informat Sci, Big Data Lab, Malvern, PA 19355 USA
Keywords
Deep learning; Lung cancer; Survival period prediction; SEER cancer registry; Feature importance; CLASSIFICATION; REGRESSION; FEATURES
DOI
10.1016/j.ijmedinf.2020.104371
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Introduction: Survival period prediction through early diagnosis of cancer has many benefits. It allows both patients and caregivers to plan resources, time and intensity of care to provide the best possible treatment path for the patients. In this paper, by focusing on lung cancer patients, we build several survival prediction models using deep learning techniques to tackle both cancer survival classification and regression problems. We also conduct feature importance analysis to understand how lung cancer patients' relevant factors impact their survival periods. We contribute to identifying an approach to estimating survivability that is commonly and practically appropriate for medical use. Methodologies: We compared the performance of three of the most popular deep learning architectures, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), and also compared the performance of the deep learning models against traditional machine learning models. The data was obtained from the lung cancer section of the Surveillance, Epidemiology, and End Results (SEER) cancer registry. Results: The deep learning models outperformed traditional machine learning models across both classification and regression approaches. For the classification approach, with patients' survival periods segmented into the classes '<=6 months', '0.5 - 2 years' and '>2 years', the best deep learning model reached 71.18% accuracy. For the regression approach, the deep learning models reached a Root Mean Squared Error (RMSE) of 13.5% and an R² value of 0.5. The traditional machine learning models saturated at 61.12% classification accuracy and 14.87% RMSE in regression. Conclusions: This approach can serve as a baseline for early prediction, and its predictions can be further improved with more temporal treatment information collected from treated patients.
In addition, we evaluated feature importance to investigate model interpretability, gaining further insight into the survival analysis models and the factors that are important in cancer survival period prediction.
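The class segmentation and the evaluation metrics named in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the handling of the 6-month and 24-month boundaries, and the sample values are all assumptions made for illustration, and the abstract does not say how the reported RMSE percentage was normalised.

```python
import math

def survival_class(months):
    """Map a survival period in months to one of the three classes
    named in the abstract. Boundary handling (6 and 24 months falling
    into the lower class) is an assumption."""
    if months <= 6:
        return "<=6 months"
    if months <= 24:
        return "0.5 - 2 years"
    return ">2 years"

def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the same units as the inputs."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up survival periods in months (hypothetical, not SEER data).
observed = [3, 10, 30, 6, 48]
predicted = [5, 8, 20, 9, 40]

# Classification view: compare class labels derived from the months.
class_accuracy = accuracy(
    [survival_class(m) for m in observed],
    [survival_class(m) for m in predicted],
)

# Regression view: compare the month values directly.
regression_rmse = rmse(observed, predicted)
regression_r2 = r_squared(observed, predicted)
```

The same predictions can look quite different under the two views, since a regression output close to a class boundary may land in the wrong class even when its error in months is small.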
Pages: 12