Automated extraction of information of lung cancer staging from unstructured reports of PET-CT interpretation: natural language processing with deep-learning

Cited by: 7

Authors:

Park, Hyung Jun [1,7]

Park, Namu [2]

Lee, Jang Ho [1]

Choi, Myeong Geun [3]

Ryu, Jin-Sook [4]

Song, Min [5]

Choi, Chang-Min [1,6]

Affiliations:

[1] Univ Ulsan, Coll Med, Asan Med Ctr, Dept Pulm & Crit Care Med, 88, Olymp Ro 43 Gil, Seoul 05505, South Korea

[2] Univ Washington, Sch Med, Dept Biomed Informat & Med Educ, Seattle, WA USA

[3] Ewha Womans Univ, Mokdong Hosp, Coll Med, Div Pulm & Crit Care Med, Dept Internal Med, Seoul, South Korea

[4] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Nucl Med, Seoul, South Korea

[5] Yonsei Univ, Dept Digital Analyt, 50 Yonsei Ro, Seoul 03722, South Korea

[6] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Oncol, Seoul, South Korea

[7] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Informat Med, Seoul, South Korea

Source:

BMC Medical Informatics and Decision Making | 2022 / Vol. 22 / Issue 01

Keywords:

Natural language processing; Auto-annotation; Deep learning; Lung cancer; Pseudo-labelling

DOI:

10.1186/s12911-022-01975-7

Chinese Library Classification (CLC) number:

R-058

Subject classification code:

Abstract:

Background: Extracting metastatic information from previous radiologic-text reports is important; however, laborious annotation has limited the usability of these texts. We developed a deep-learning model that extracts primary lung cancer sites, metastatic lymph nodes, and distant metastasis information from PET-CT reports to determine lung cancer stages.

Methods: PET-CT reports, written entirely in English, were acquired from two cohorts of patients with lung cancer who were diagnosed at a tertiary hospital between January 2004 and March 2020. One cohort of 20,466 PET-CT reports was used for the training and validation sets, and the other cohort of 4,190 PET-CT reports was used as an additional-test set. A pre-processing model (Lung Cancer Spell Checker) was applied to correct typographical errors, and pseudo-labelling was used to train the model. The deep-learning model was built on a convolutional-recurrent neural network. The performance metrics for the prediction model were accuracy, precision, sensitivity, micro-AUROC, and AUPRC.

Results: For the extraction of primary lung cancer location, the model showed a micro-AUROC of 0.913 in the validation set and 0.946 in the additional-test set. For metastatic lymph nodes, the model showed a sensitivity of 0.827 and a specificity of 0.960. In predicting distant metastasis, the model showed a micro-AUROC of 0.944 in the validation set and 0.950 in the additional-test set.

Conclusion: Our deep-learning method could be used to extract lung cancer stage information from PET-CT reports and may facilitate lung cancer studies by reducing the laborious annotation required of clinicians.
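The abstract reports micro-AUROC for the multi-class extraction tasks. As a reading aid, the sketch below shows one common way to compute a micro-averaged AUROC: pool every (report, class) pair into a single binary problem, then apply the rank-based (Mann-Whitney) formula. This is only an illustration and not the authors' evaluation code; the function names `auroc` and `micro_auroc`, the three hypothetical classes, and the toy scores are all assumptions made for this example.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores that starts at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def micro_auroc(y_true: Sequence[Sequence[int]],
                y_score: Sequence[Sequence[float]]) -> float:
    """Micro-average: flatten the one-hot labels and scores, then score once."""
    flat_labels = [y for row in y_true for y in row]
    flat_scores = [s for row in y_score for s in row]
    return auroc(flat_labels, flat_scores)


# Toy data invented for illustration: 3 reports, 3 hypothetical classes.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_score = [[0.8, 0.1, 0.1], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6]]
print(micro_auroc(y_true, y_score))
```

In the toy data every true class outscores every false one, so the pooled ranking is perfect. Because micro-averaging pools all pairs, frequent classes weigh more heavily than rare ones, which is worth keeping in mind when reading the reported values.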

Pages: 11
