Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study

Cited by: 9
Authors
Hu, Danqing [1]
Li, Shaolei [2]
Zhang, Huanyao [1]
Wu, Nan [2]
Lu, Xudong [1]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrumental Sci, 38 Zheda Rd, Hangzhou 310027, Peoples R China
[2] Peking Univ, Dept Thorac Surg 2, Canc Hosp & Inst, Beijing, Peoples R China
Keywords
non-small cell lung cancer; lymph node metastasis prediction; natural language processing; electronic medical records; lung cancer; prediction models; decision making; machine learning; algorithm; forest modeling; CARCINOMA ANTIGEN; RIDGE REGRESSION; INFORMATION; SYSTEM; EXTRACTION; NOMOGRAM; DISEASE; MARKER; PET/CT; MODEL;
DOI
10.2196/35475
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Lymph node metastasis (LNM) is critical for treatment decision making in patients with resectable non-small cell lung cancer, but it is difficult to diagnose precisely before surgery. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use.
Objective: This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms.
Methods: We developed a multiturn question answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (a lymph node with a maximum short-axis diameter >10 mm was regarded as metastatic) and with the clinician's evaluation. Because the NLP model may extract features incorrectly, we also calculated the concordance correlation between the probabilities predicted by models using NLP-extracted features and those predicted by models using gold standard features, to explore the influence of NLP-driven automatic extraction.
Results: The random forest models achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.792 and an average precision (AP) of 0.456 for pN2 LNM prediction, and an AUC of 0.768 and an AP of 0.524 for pN1&N2 LNM prediction. All machine learning models outperformed the size criteria and the clinician's evaluation. The concordance correlation between the random forest models using NLP-extracted features and those using gold standard features was 0.950, and it improved to 0.984 when the top 5 most important NLP-extracted features were replaced with gold standard features.
Conclusions: The LNM models developed can achieve competitive performance using only limited EMR data such as CT reports and tumor markers in comparison with the clinician's evaluation. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.
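The two comparison devices mentioned in the Methods can be illustrated with a minimal Python sketch. It assumes that "concordance correlation" refers to Lin's concordance correlation coefficient computed with population (1/n) moments; the abstract does not state which estimator the authors used. The function names `concordance_correlation` and `size_criterion_positive` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def concordance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient between two paired series.

    Assumption: population (1/n) variances and covariance are used.
    x and y could be, for example, probabilities predicted from NLP-extracted
    features and from gold standard features for the same patients.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        # Both series are constant and equal, so they agree perfectly.
        return 1.0
    return 2 * cov_xy / denom


def size_criterion_positive(short_axis_mm: float, threshold_mm: float = 10.0) -> bool:
    """CT size criterion from the abstract: short-axis diameter > 10 mm."""
    return short_axis_mm > threshold_mm


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    p_nlp = [0.12, 0.40, 0.75, 0.33]
    p_gold = [0.10, 0.42, 0.80, 0.30]
    print(concordance_correlation(p_nlp, p_gold))
    print(size_criterion_positive(12.0))
```

Unlike Pearson correlation, this coefficient penalizes systematic offsets: the series [0, 1] and [1, 2] are perfectly linearly related yet score only 1/3 because of the constant shift of 1. That property is what makes it a reasonable check on whether extraction errors move the predicted probabilities themselves, not just their ranking.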
Pages: 153 / 170
Number of pages: 18
Related papers
50 in total
  • [1] A Multi-Modal Heterogeneous Graph Forest to Predict Lymph Node Metastasis of Non-Small Cell Lung Cancer
    Hu, Danqing
    Li, Shaolei
    Wu, Nan
    Lu, Xudong
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2023, 27 (03) : 1216 - 1224
  • [2] Machine learning predictive models and risk factors for lymph node metastasis in non-small cell lung cancer
    Wu, Bo
    Zhu, Yihui
    Hu, Zhuozheng
    Wu, Jiajun
    Zhou, Weijun
    Si, Maoyan
    Cao, Xiying
    Wu, Zhicheng
    Zhang, Wenxiong
    BMC PULMONARY MEDICINE, 2024, 24 (01)
  • [3] Machine learning computational model to predict lung cancer using electronic medical records
    Levi, Matanel
    Lazebnik, Teddy
    Kushnir, Shiri
    Yosef, Noga
    Shlomi, Dekel
    CANCER EPIDEMIOLOGY, 2024, 92
  • [4] Model to Predict Small Lymph Nodes Metastasis in Non-Small Cell Lung Cancer
    Li, J.
    Su, F.
    JOURNAL OF THORACIC ONCOLOGY, 2019, 14 (10) : S499 - S499
  • [5] Development and Validation of a Clinical Prediction Model for N2 Lymph Node Metastasis in Non-Small Cell Lung Cancer
    Chen, Kezhong
    Yang, Fang
    Jiang, Guanchao
    Li, Jianfeng
    Wang, Jun
    ANNALS OF THORACIC SURGERY, 2013, 96 (05) : 1761 - 1768
  • [6] Investigation of Mediastinal Lymph Node Metastasis in Non-Small Cell Lung Cancer
    Wang, G.
    Zhang, C.
    Liu, H.
    Yu, Z.
    Liu, H.
    JOURNAL OF THORACIC ONCOLOGY, 2018, 13 (12) : S1056 - S1056
  • [7] Expression of nestin in lymph node metastasis and lymphangiogenesis in non-small cell lung cancer patients
    Chen, Zhenguang
    Wang, Tao
    Luo, Honghe
    Lai, Yingrong
    Yang, Xuhui
    Li, Fugui
    Lei, Yiyan
    Su, Chunhua
    Zhang, Xiuming
    Lahn, Bruce T.
    Xiang, Andy Peng
    HUMAN PATHOLOGY, 2010, 41 (05) : 737 - 744
  • [8] Preoperative Prediction of Lymph Node Metastasis in Patients With Early-T-Stage Non-small Cell Lung Cancer by Machine Learning Algorithms
    Wu, Yijun
    Liu, Jianghao
    Han, Chang
    Liu, Xinyu
    Chong, Yuming
    Wang, Zhile
    Gong, Liang
    Zhang, Jiaqi
    Gao, Xuehan
    Guo, Chao
    Liang, Naixin
    Li, Shanqing
    FRONTIERS IN ONCOLOGY, 2020, 10
  • [9] Ultrasound-based radiomics machine learning models for diagnosing cervical lymph node metastasis in patients with non-small cell lung cancer: a multicentre study
    Deng, Zhiqiang
    Liu, Xiaoling
    Wu, Renmei
    Yan, Haoji
    Gou, Lingyun
    Hu, Wenlong
    Wan, Jiaxin
    Song, Chenwanqiu
    Chen, Jing
    Ma, Daiyuan
    Zhou, Haining
    Tian, Dong
    BMC CANCER, 2024, 24 (01)
  • [10] Predictors and Patterns of Lymph Node Metastasis in Small Peripheral Non-Small Cell Lung Cancer
    Lin, Jun-Tao
    Yang, Xue-Ning
    Yan, Li-Xu
    Wang, Si-Yun
    Zhong, Wen-Zhao
    Nie, Qiang
    Liao, Ri-Qiang
    Dong, Song
    Jiang, Ben Yuan
    Wu, Yi Long
    JOURNAL OF THORACIC ONCOLOGY, 2017, 12 (01) : S659 - S660