Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study

Cited by: 9
Authors
Hu, Danqing [1]
Li, Shaolei [2]
Zhang, Huanyao [1]
Wu, Nan [2]
Lu, Xudong [1]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrumental Sci, 38 Zheda Rd, Hangzhou 310027, Peoples R China
[2] Peking Univ, Dept Thorac Surg 2, Canc Hosp & Inst, Beijing, Peoples R China
Keywords
non-small cell lung cancer; lymph node metastasis prediction; natural language processing; electronic medical records; lung cancer; prediction models; decision making; machine learning; algorithm; forest modeling; CARCINOMA ANTIGEN; RIDGE REGRESSION; INFORMATION; SYSTEM; EXTRACTION; NOMOGRAM; DISEASE; MARKER; PET/CT; MODEL;
DOI
10.2196/35475
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Lymph node metastasis (LNM) status is critical for treatment decision-making in patients with resectable non-small cell lung cancer, but it is difficult to diagnose precisely before surgery. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use.
Objective: This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms.
Methods: We developed a multiturn question answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with the clinician's evaluation and with size criteria based on CT image findings (a lymph node with a maximum short-axis diameter >10 mm was regarded as metastatic). Because the NLP model may extract features incorrectly, we also calculated the concordance correlation between the probabilities predicted by models using NLP-extracted features and by models using gold standard features, to explore the influence of NLP-driven automatic extraction.
Results: The random forest models achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.792 and an average precision (AP) of 0.456 for pN2 LNM prediction, and an AUC of 0.768 and an AP of 0.524 for pN1&N2 LNM prediction. All machine learning models outperformed the size criteria and the clinician's evaluation. The concordance correlation between the random forest models using NLP-extracted features and those using gold standard features was 0.950, and it improved to 0.984 when the 5 most important NLP-extracted features were replaced with gold standard features.
Conclusions: The LNM models developed can achieve competitive performance using only limited EMR data such as CT reports and tumor markers in comparison with the clinician's evaluation. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.
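The abstract reports a concordance correlation between the probabilities predicted from NLP-extracted features and those predicted from gold standard features, but it does not state which estimator was used. A minimal sketch follows, assuming Lin's concordance correlation coefficient with population (divide-by-n) moments; the function name `concordance_correlation` and the sample probabilities are hypothetical and are not taken from the study.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired sequences.

    Assumption: population (divide-by-n) variances and covariance are used.
    The value is 1.0 only when the paired values agree exactly, so a constant
    offset between the two sets of predictions lowers it even if they are
    perfectly correlated.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        # Both sequences are the same constant: treat as perfect agreement.
        return 1.0
    return 2 * cov_xy / denominator


# Hypothetical predicted probabilities for the same five patients.
p_nlp_features = [0.12, 0.40, 0.75, 0.33, 0.90]
p_gold_features = [0.10, 0.42, 0.70, 0.35, 0.88]
ccc = concordance_correlation(p_nlp_features, p_gold_features)
```

Unlike Pearson correlation, this statistic penalizes systematic shifts between the two sets of probabilities, which is why it suits the question of whether extraction errors change the model's output.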
Pages: 153-170
Page count: 18
Related papers
50 in total
  • [41] Machine Learning Study of SNPs in Noncoding Regions to Predict Non-small Cell Lung Cancer Susceptibility
    Huang, Y.
    Bao, T.
    Zhang, T.
    Ji, G.
    Wang, Y.
    Ling, Z.
    Li, W.
    CLINICAL ONCOLOGY, 2023, 35 (11) : 701 - 712
  • [42] Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT images
    Wang, Hongkai
    Zhou, Zongwei
    Li, Yingci
    Chen, Zhonghua
    Lu, Peiou
    Wang, Wenzhi
    Liu, Wanyu
    Yu, Lijuan
    EJNMMI RESEARCH, 2017, 7
  • [43] Development and Validation of Machine Learning Models to Predict Epidermal Growth Factor Receptor Mutation in Non-Small Cell Lung Cancer: A Multi-Center Retrospective Radiomics Study
    Liu, Yafeng
    Zhou, Jiawei
    Wu, Jing
    Wang, Wenyang
    Wang, Xueqin
    Guo, Jianqiang
    Wang, Qingsen
    Zhang, Xin
    Li, Danting
    Xie, Jun
    Ding, Xuansheng
    Xing, Yingru
    Hu, Dong
    CANCER CONTROL, 2022, 29
  • [44] Prediction of Occult Lymph Node Metastasis Using Metabolic PET Parameters in Small Size Peripheral Non-Small Cell Lung Cancer
    Jung, Joonho
    Park, Seong Yong
    Lee, Su Jin
    JOURNAL OF THORACIC ONCOLOGY, 2015, 10 (09) : S683 - S683
  • [45] The Relationship Between Primary Tumor Metabolic Activity and Lymph Node and Distant Organ Metastasis in Non-Small Cell Lung Cancer
    Yildirim, Fatma
    Turk, Murat
    Akdemir, Umit Ozgur
    Yurdakul, Ahmet Selim
    Ozturk, Can
    GAZI MEDICAL JOURNAL, 2018, 29 (03) : 164 - 168
  • [46] Accuracy of helical computed tomography for the identification of lymph node metastasis in resectable non-small cell lung cancer
    Kazuhiro Imai
    Yoshihiro Minamiya
    Hajime Saito
    Taku Nakagawa
    Yukiko Hosono
    Hiroshi Nanjo
    Kasumi Tozawa
    Masaji Hashimoto
    Yoshihiko Kimura
    Jun-Ichi Ogawa
    Surgery Today, 2008, 38 : 1083 - 1090
  • [47] Overexpression of miR-1260b in Non-small Cell Lung Cancer is Associated with Lymph Node Metastasis
    Xu, Limin
    Li, Liqin
    Li, Jing
    Li, Hongwei
    Shen, Qibin
    Ping, Jinliang
    Ma, Zhihong
    Zhong, Jing
    Dai, Licheng
    AGING AND DISEASE, 2015, 6 (06) : 478 - 485
  • [48] Predictive risk factors for lymph node metastasis in patients with resected non-small cell lung cancer: a case control study
    Moulla, Yusef
    Gradistanac, Tanja
    Wittekind, Christian
    Eichfeld, Uwe
    Gockel, Ines
    Dietrich, Arne
    JOURNAL OF CARDIOTHORACIC SURGERY, 2019, 14 (1)
  • [49] Preoperative platelet count in predicting lymph node metastasis and prognosis in patients with non-small cell lung cancer
    Liu, H. B.
    Gu, X. L.
    Ma, X. Q.
    Lv, T. F.
    Wu, Y.
    Xiao, Y. Y.
    Yuan, D. M.
    Li, Y. F.
    Song, Y.
    NEOPLASMA, 2013, 60 (02) : 203 - 208
  • [50] Radiogenomic Models Using Machine Learning Techniques to Predict EGFR Mutations in Non-Small Cell Lung Cancer
    Nair, Jay Kumar Raghavan
    Saeed, Umar Abid
    McDougall, Connor C.
    Sabri, Ali
    Kovacina, Bojan
    Raidu, B. V. S.
    Khokhar, Riaz Ahmed
    Probst, Stephan
    Hirsh, Vera
    Chankowsky, Jeffrey
    Van Kempen, Leon C.
    Taylor, Jana
    CANADIAN ASSOCIATION OF RADIOLOGISTS JOURNAL-JOURNAL DE L ASSOCIATION CANADIENNE DES RADIOLOGISTES, 2021, 72 (01) : 109 - 119