Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study

Cited by: 9
Authors
Hu, Danqing [1]
Li, Shaolei [2]
Zhang, Huanyao [1]
Wu, Nan [2]
Lu, Xudong [1]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrumental Sci, 38 Zheda Rd, Hangzhou 310027, Peoples R China
[2] Peking Univ, Dept Thorac Surg 2, Canc Hosp & Inst, Beijing, Peoples R China
Keywords
non-small cell lung cancer; lymph node metastasis prediction; natural language processing; electronic medical records; lung cancer; prediction models; decision making; machine learning; algorithm; forest modeling; CARCINOMA ANTIGEN; RIDGE REGRESSION; INFORMATION; SYSTEM; EXTRACTION; NOMOGRAM; DISEASE; MARKER; PET/CT; MODEL;
DOI
10.2196/35475
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Lymph node metastasis (LNM) is critical for treatment decision making for patients with resectable non-small cell lung cancer, but it is difficult to diagnose precisely before surgery. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use.
Objective: This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms.
Methods: We developed a multiturn question answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (a lymph node with a maximum short axis diameter >10 mm was regarded as metastatic) and with the clinician's evaluation. Because the NLP model may make extraction errors, we also calculated the concordance correlation between the predicted probabilities of models using NLP-extracted features and those of models using gold standard features to explore the influence of NLP-driven automatic extraction.
Results: Experimental results show that the random forest models achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.792 and an average precision (AP) of 0.456 for pN2 LNM prediction, and an AUC of 0.768 and an AP of 0.524 for pN1&N2 LNM prediction. All machine learning models outperformed the size criteria and the clinician's evaluation. The concordance correlation between the random forest models using NLP-extracted features and those using gold standard features was 0.950, and it improved to 0.984 when the top 5 important NLP-extracted features were replaced with gold standard features.
Conclusions: The LNM models developed can achieve competitive performance using only limited EMR data such as CT reports and tumor markers in comparison with the clinician's evaluation. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.
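The two comparison tools named in the abstract, the CT size criterion and the concordance correlation between predicted probabilities, can be illustrated with a minimal sketch. It assumes that "concordance correlation" refers to Lin's concordance correlation coefficient, which the abstract does not state explicitly. The function names and the sample probabilities are hypothetical and are not taken from the study.

```python
def size_criterion_positive(short_axis_mm, threshold_mm=10.0):
    """Size-criteria baseline from the abstract: a lymph node is regarded as
    metastatic when its maximum short axis diameter is greater than 10 mm."""
    return short_axis_mm > threshold_mm


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two sequences.

    Assumption: this is the statistic meant by "concordance correlation"
    in the abstract. Population (1/n) variances and covariance are used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("undefined when both inputs are the same constant")
    return 2 * cov_xy / denom


# Hypothetical predicted probabilities, NOT data from the study:
# one model fed NLP-extracted features, one fed gold standard features.
p_nlp = [0.12, 0.40, 0.75, 0.08, 0.55]
p_gold = [0.10, 0.45, 0.70, 0.10, 0.60]
print(round(concordance_correlation(p_nlp, p_gold), 3))
```

Unlike the Pearson correlation, this coefficient also penalizes a systematic shift between the two sets of probabilities, so a value near 1 indicates that the two models produce nearly identical predictions rather than merely correlated ones.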
Pages: 153 - 170
Page count: 18
Related papers
50 in total
  • [21] DEVELOPMENT AND VALIDATION OF A CLINICAL PREDICTION MODEL FOR N2 LYMPH NODE METASTASIS IN STAGE I NON-SMALL CELL LUNG CANCER
    Chen, Kezhong
    Jiang, Guanchao
    Li, Jianfeng
    Wang, Jun
    JOURNAL OF THORACIC ONCOLOGY, 2013, 8 : S631 - S631
  • [22] Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation
    Zhao, Yiqing
    Fu, Sunyang
    Bielinski, Suzette J.
    Decker, Paul A.
    Chamberlain, Alanna M.
    Roger, Veronique L.
    Liu, Hongfang
    Larson, Nicholas B.
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2021, 23 (03)
  • [23] Development and validation of a nomogram model for predicting lymph node metastasis in early non-small-cell lung cancer
    Xie, Hao
    Wang, Chao
    Ma, Lin
    Zhang, Qiang
    AMERICAN JOURNAL OF CANCER RESEARCH, 2025, 15 (01) : 299 - 310
  • [24] Apolipoprotein E is a predictive marker for assessing non-small cell lung cancer patients with lymph node metastasis
    An, Hyo Jung
    Koh, Hyun Min
    Song, Dae Hyun
    PATHOLOGY RESEARCH AND PRACTICE, 2019, 215 (10)
  • [25] New P16 Expression Criteria Predict Lymph Node Metastasis in Patients With Non-small Cell Lung Cancer
    An, Hyo Jung
    Koh, Hyun Min
    Song, Dae Hyun
    IN VIVO, 2019, 33 (06) : 1885 - 1892
  • [26] The prognostic impact of lymph node metastasis in patients with non-small cell lung cancer and distant organ metastasis
    Yang, Jie
    Peng, Aimei
    Wang, Bo
    Gusdon, Aaron M.
    Sun, Xiaoting
    Jiang, Gening
    Zhang, Peng
    CLINICAL & EXPERIMENTAL METASTASIS, 2019, 36 (05) : 457 - 466
  • [27] The prognostic impact of lymph node metastasis in patients with non-small cell lung cancer and distant organ metastasis
    Jie Yang
    Aimei Peng
    Bo Wang
    Aaron M. Gusdon
    Xiaoting Sun
    Gening Jiang
    Peng Zhang
    Clinical & Experimental Metastasis, 2019, 36 : 457 - 466
  • [28] Novel nomograms to predict lymph node metastasis and distant metastasis in resected patients with early-stage non-small cell lung cancer
    Tian, Yi
    He, Yu
    Li, Xin
    Liu, Xiaowen
    ANNALS OF PALLIATIVE MEDICINE, 2021, 10 (03) : 2548 - +
  • [29] Dual-Region Computed Tomography Radiomics-Based Machine Learning Predicts Subcarinal Lymph Node Metastasis in Patients with Non-small Cell Lung Cancer
    Yan, Hao-Ji
    Zhao, Jia-Sheng
    Zuo, Hou-Dong
    Zhang, Jun-Jie
    Deng, Zhi-Qiang
    Yang, Chen
    Luo, Xi
    Wan, Jia-Xin
    Zheng, Xiang-Yun
    Chen, Wei-Yang
    Li, Su-Ping
    Tian, Dong
    ANNALS OF SURGICAL ONCOLOGY, 2024, 31 (08) : 5011 - 5020
  • [30] Assessment of non-lobe-specific lymph node metastasis in clinical stage IA non-small cell lung cancer
    Zhang, Zhirong
    Miao, Jinbai
    Chen, Qirui
    Fu, Yili
    Li, Hui
    Hu, Bin
    THORACIC CANCER, 2019, 10 (07) : 1597 - 1604