Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study

Cited by: 9
Authors
Hu, Danqing [1]
Li, Shaolei [2]
Zhang, Huanyao [1]
Wu, Nan [2]
Lu, Xudong [1]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrumental Sci, 38 Zheda Rd, Hangzhou 310027, Peoples R China
[2] Peking Univ, Dept Thorac Surg 2, Canc Hosp & Inst, Beijing, Peoples R China
Keywords
non-small cell lung cancer; lymph node metastasis prediction; natural language processing; electronic medical records; lung cancer; prediction models; decision making; machine learning; algorithm; forest modeling; CARCINOMA ANTIGEN; RIDGE REGRESSION; INFORMATION; SYSTEM; EXTRACTION; NOMOGRAM; DISEASE; MARKER; PET/CT; MODEL;
DOI
10.2196/35475
Chinese Library Classification
R-058
Abstract
Background: Lymph node metastasis (LNM) is critical for treatment decision making in patients with resectable non-small cell lung cancer, but it is difficult to diagnose precisely before surgery. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use.
Objective: This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms.
Methods: We developed a multiturn question answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (a lymph node with a maximum short axis diameter >10 mm was regarded as metastatic) and with the clinician's evaluation. Since the NLP model may extract features with mistakes, we also calculated the concordance correlation between the predicted probabilities of models using NLP-extracted features and those of models using gold standard features to explore the influence of NLP-driven automatic extraction.
Results: The random forest models achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.792 and an average precision (AP) of 0.456 for pN2 LNM prediction, and an AUC of 0.768 and an AP of 0.524 for pN1&N2 LNM prediction. All machine learning models outperformed the size criteria and the clinician's evaluation. The concordance correlation between the random forest models using NLP-extracted features and those using gold standard features was 0.950, and it improved to 0.984 when the top 5 important NLP-extracted features were replaced with gold standard features.
Conclusions: The LNM models developed can achieve competitive performance using only limited EMR data such as CT reports and tumor markers in comparison with the clinician's evaluation. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.
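The agreement check described in the abstract (comparing predicted probabilities from a model fed NLP-extracted features with those from a model fed gold standard features) can be illustrated with a small sketch. It assumes that "concordance correlation" refers to Lin's concordance correlation coefficient, which the abstract does not state explicitly. The function name and the probability values below are hypothetical and are not taken from the study.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two equal-length sequences.

    Assumption: this is the agreement measure meant by "concordance
    correlation" in the abstract; the source does not give the formula.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) variances and covariance, as in Lin's definition.
    var_x = fmean((a - mean_x) ** 2 for a in x)
    var_y = fmean((b - mean_y) ** 2 for b in y)
    cov_xy = fmean((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("coefficient is undefined for identical constant inputs")
    return 2 * cov_xy / denom


# Hypothetical predicted probabilities for five patients (illustrative only).
p_nlp_features = [0.10, 0.35, 0.62, 0.80, 0.27]
p_gold_features = [0.12, 0.30, 0.65, 0.78, 0.25]
print(concordance_correlation(p_nlp_features, p_gold_features))
```

Unlike Pearson correlation, this coefficient also penalizes a systematic shift between the two sets of predictions, so a value near 1 indicates that the two models give nearly the same probabilities, not merely the same ranking.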
Pages: 153 / 170
Number of pages: 18