Predicting lung cancer survival based on clinical data using machine learning: A review

Cited by: 25
Authors
Altuhaifa, Fatimah Abdulazim [1, 2]
Win, Khin Than [1]
Su, Guoxin [1]
Affiliations
[1] Univ Wollongong, Sch Comp & Informat Technol, Wollongong, NSW 2500, Australia
[2] Saudi Arabia Minist Higher Educ, Riyadh, Saudi Arabia
Keywords
Data mining; Machine learning; Artificial intelligence; Lung cancer; Survival prediction; Feature selection; LOGISTIC-REGRESSION; ADENOCARCINOMA; IMPUTATION; ADVANTAGES; PROGNOSIS; MODEL
DOI
10.1016/j.compbiomed.2023.107338
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Machine learning is increasingly used to predict survival time in medicine. This review examines studies that used machine learning and data-mining techniques to predict lung cancer survival from clinical data. A systematic literature search of MEDLINE, Scopus, and Google Scholar was conducted, following reporting guidelines and using the Covidence system. Studies published from 2000 to 2023 that employed machine learning for lung cancer survival prediction were included. Risk of bias was assessed with the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Thirty studies were reviewed, of which 13 (43.3%) used the Surveillance, Epidemiology, and End Results (SEER) database. Missing data were handled in 12 (40%) studies, primarily through data transformation and conversion. Feature selection algorithms were used in 19 (63.3%) studies, with age, sex, and N stage the most frequently selected features. Random forest was the predominant machine learning model, used in 17 (56.7%) studies. Although the number of lung cancer survival prediction studies is limited, the use of machine learning models based on clinical data has grown since 2012. Considering diverse patient cohorts and pre-processing the data are crucial. Notably, most studies did not account for missing data, normalization, scaling, or standardization, potentially introducing bias. A comprehensive study of lung cancer survival prediction using clinical data that addresses these challenges is therefore needed.
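To make the pre-processing concern concrete, the following is a minimal sketch of two of the steps the abstract says most studies omitted: handling missing values and standardizing a feature. It is an illustration under stated assumptions, not a method taken from the review. The function names `impute_median` and `standardize`, the choice of median imputation, and the age values are all hypothetical.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical patient ages in years with two missing entries.
# These values are invented for illustration and come from no reviewed study.
ages = [62, None, 71, 58, None, 66]

ages_imputed = impute_median(ages)
ages_scaled = standardize(ages_imputed)
```

In a real modelling workflow the median, mean, and standard deviation would normally be estimated on the training split only and then reused on validation and test data, so that information from held-out patients does not leak into the model.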
Pages: 16