Development and Validation of Novel Deep-Learning Models Using Multiple Data Types for Lung Cancer Survival

Cited by: 8
Authors
Hsu, Jason C. [1,2,3,4]
Phung-Anh Nguyen [1,2,3]
Phan Thanh Phuc [4]
Tsai-Chih Lo [5]
Min-Huei Hsu [6,7]
Min-Shu Hsieh [8,9]
Nguyen Quoc Khanh Le [10,11]
Chi-Tsun Cheng [3]
Tzu-Hao Chang [2,5]
Cheng-Yu Chen [11,12]
Affiliations
[1] Taipei Med Univ, Clin Data Ctr, Off Data Sci, Taipei 110, Taiwan
[2] Taipei Med Univ, Taipei Med Univ Hosp, Clin Big Data Res Ctr, Taipei 110, Taiwan
[3] Taipei Med Univ, Coll Management, Res Ctr Hlth Care Ind Data Sci, Taipei 110, Taiwan
[4] Taipei Med Univ, Coll Management, Int PhD Program Biotech & Healthcare Management, Taipei 110, Taiwan
[5] Taipei Med Univ, Coll Med Sci & Technol, Grad Inst Biomed Informat, 250 Wu Hsing Str, Taipei 110, Taiwan
[6] Taipei Med Univ, Off Data Sci, Taipei 110, Taiwan
[7] Taipei Med Univ, Coll Management, Grad Inst Data Sci, Taipei 110, Taiwan
[8] Natl Taiwan Univ Hosp, Dept Pathol, Taipei 100, Taiwan
[9] Natl Taiwan Univ, Coll Med, Grad Inst Pathol, Taipei 100, Taiwan
[10] Taipei Med Univ, Coll Med, Profess Master Program Artificial Intelligence Me, Taipei 110, Taiwan
[11] Taipei Med Univ, Res Ctr Artificial Intelligence Med, Taipei 110, Taiwan
[12] Taipei Med Univ, Coll Med, Dept Radiol, 250 Wu Hsing Str, Taipei 110, Taiwan
Keywords
lung cancer; survival; prediction models; real-world data; artificial intelligence; machine learning; BODY-MASS INDEX; NEUTROPHILS; PREDICTION; OUTCOMES
DOI
10.3390/cancers14225562
CLC number
R73 [Oncology]
Discipline code
100214
Abstract
Simple Summary: Previous survival-prediction studies have had several limitations. These include a lack of comprehensive clinical data types, testing on only a limited set of machine-learning algorithms, and the absence of a sufficient external testing set. The lung-cancer-survival-prediction model presented here is built on multiple data types and multiple novel machine-learning algorithms, and it is evaluated with external testing. The resulting artificial neural network (ANN) model showed higher performance than previous similar studies (AUC, 0.89; accuracy, 0.82; precision, 0.91).

The literature lacks a well-established lung-cancer-survival-prediction model that relies on multiple data types, multiple novel machine-learning algorithms, and external testing. This study aims to address that gap and to determine the critical factors for lung cancer survival. We selected non-small-cell lung cancer patients diagnosed between January 2008 and December 2018 from a retrospective dataset of the Taipei Medical University Clinical Research Database and the Taiwan Cancer Registry. All patients were followed from the index date of cancer diagnosis until the event of death. The variables used included demographics, comorbidities, medications, laboratory results, and patient gene tests. Nine machine-learning algorithms were applied in various modes, and their performance was measured by the area under the receiver operating characteristic curve (AUC). In total, 3714 patients were included. The artificial neural network (ANN) model performed best when all variables were integrated, with an AUC, accuracy, precision, recall, and F1-score of 0.89, 0.82, 0.91, 0.75, and 0.65, respectively. The most important features were cancer stage, cancer size, age at diagnosis, smoking and drinking status, the EGFR gene, and body mass index. Overall, integrating different data types improved the predictive performance of the ANN model.
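The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the study. The outcome is binary, coded 1 for the event of interest and 0 otherwise. Model scores lie in [0, 1], and the decision threshold is 0.5. The function names and toy data are hypothetical. AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. This quantity equals the area under the empirical ROC curve.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Threshold the scores (hypothetical 0.5 default) and compute the reported metric types."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall for the positive class.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "auc": roc_auc(y_true, scores),
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels and scores, purely illustrative.
    metrics = classification_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.2, 0.1])
    print({k: round(v, 3) for k, v in metrics.items()})
    # prints {'auc': 0.889, 'accuracy': 0.667, 'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```

The pairwise AUC loop costs O(n_pos × n_neg) comparisons, which is fine for illustration. Library implementations such as scikit-learn's `roc_auc_score` sort the scores instead.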
Pages: 14