Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study

Cited by: 9

Authors:

Hu, Danqing [1]

Li, Shaolei [2]

Zhang, Huanyao [1]

Wu, Nan [2]

Lu, Xudong [1]

Affiliations:

[1] Zhejiang Univ, Coll Biomed Engn & Instrumental Sci, 38 Zheda Rd, Hangzhou 310027, Peoples R China

[2] Peking Univ, Dept Thorac Surg 2, Canc Hosp & Inst, Beijing, Peoples R China

Source:

JMIR MEDICAL INFORMATICS | 2022 / Vol. 10 / Issue 04

Keywords:

non-small cell lung cancer; lymph node metastasis prediction; natural language processing; electronic medical records; lung cancer; prediction models; decision making; machine learning; algorithm; forest modeling; CARCINOMA ANTIGEN; RIDGE REGRESSION; INFORMATION; SYSTEM; EXTRACTION; NOMOGRAM; DISEASE; MARKER; PET/CT; MODEL;

DOI:

10.2196/35475

Chinese Library Classification (CLC) number:

R-058

Abstract:

Background: Lymph node metastasis (LNM) is critical for treatment decision making in patients with resectable non-small cell lung cancer, but it is difficult to diagnose precisely before surgery. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use.
Objective: This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms.
Methods: We developed a multiturn question answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (a lymph node with a maximum short-axis diameter >10 mm was regarded as metastatic) and with the clinician's evaluation. Because the NLP model may extract features incorrectly, we also calculated the concordance correlation between the predicted probabilities of models using NLP-extracted features and those of models using gold standard features to explore the influence of NLP-driven automatic extraction.
Results: The random forest models achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.792 and an average precision (AP) of 0.456 for pN2 LNM prediction, and an AUC of 0.768 and an AP of 0.524 for pN1&N2 LNM prediction. All machine learning models outperformed the size criteria and the clinician's evaluation. The concordance correlation between the random forest models using NLP-extracted features and those using gold standard features was 0.950, improving to 0.984 when the top 5 important NLP-extracted features were replaced with gold standard features.
Conclusions: The LNM models developed can achieve competitive performance using only limited EMR data such as CT reports and tumor markers in comparison with the clinician's evaluation. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.
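
Illustrative note: the abstract mentions two quantities that are easy to state in code, the CT size criterion (maximum short-axis diameter >10 mm) and the concordance correlation between two sets of predicted probabilities. The sketch below is not the authors' implementation. It assumes the concordance correlation is Lin's concordance correlation coefficient, which the abstract does not specify, and the function names and sample probabilities are hypothetical.

```python
from statistics import fmean


def size_criterion_positive(short_axis_mm, threshold_mm=10.0):
    """CT size criterion from the abstract: a node whose maximum
    short-axis diameter exceeds 10 mm is regarded as metastatic."""
    return short_axis_mm > threshold_mm


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient (assumed variant)
    between two equal-length sequences of predicted probabilities."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) variances and covariance.
    var_x = fmean((a - mean_x) ** 2 for a in x)
    var_y = fmean((b - mean_y) ** 2 for b in y)
    cov_xy = fmean((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("both sequences are constant and identical")
    return 2 * cov_xy / denominator


# Hypothetical probabilities, not data from the study.
probs_nlp_features = [0.12, 0.40, 0.75, 0.31]
probs_gold_features = [0.10, 0.43, 0.78, 0.30]
agreement = concordance_correlation(probs_nlp_features, probs_gold_features)
```

A value near 1 indicates that predictions from NLP-extracted features closely track those from gold standard features. Unlike Pearson correlation, this coefficient also penalizes a systematic shift between the two sets of predictions.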

Pages: 153 / 170

Page count: 18
