Automated extraction of information of lung cancer staging from unstructured reports of PET-CT interpretation: natural language processing with deep-learning

Cited by: 7
Authors
Park, Hyung Jun [1, 7]
Park, Namu [2]
Lee, Jang Ho [1]
Choi, Myeong Geun [3]
Ryu, Jin-Sook [4]
Song, Min [5]
Choi, Chang-Min [1, 6]
Affiliations
[1] Univ Ulsan, Coll Med, Asan Med Ctr, Dept Pulm & Crit Care Med, 88,Olymp Ro 43 Gil, Seoul 05505, South Korea
[2] Univ Washington, Sch Med, Dept Biomed Informat & Med Educ, Seattle, WA USA
[3] Ewha Womans Univ, Mokdong Hosp, Coll Med, Div Pulm & Crit Care Med,Dept Internal Med, Seoul, South Korea
[4] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Nucl Med, Seoul, South Korea
[5] Yonsei Univ, Dept Digital Analyt, 50 Yonsei Ro, Seoul 03722, South Korea
[6] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Oncol, Seoul, South Korea
[7] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Informat Med, Seoul, South Korea
Keywords
Natural language processing; Auto-annotation; Deep learning; Lung cancer; Pseudo-labelling
DOI
10.1186/s12911-022-01975-7
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Extracting metastatic information from previous radiologic text reports is important; however, the laborious annotation required has limited the usability of these texts. We developed a deep-learning model that extracts information on primary lung cancer sites, metastatic lymph nodes, and distant metastases from PET-CT reports in order to determine lung cancer stages.
Methods: PET-CT reports, written entirely in English, were acquired from two cohorts of patients with lung cancer who were diagnosed at a tertiary hospital between January 2004 and March 2020. One cohort of 20,466 PET-CT reports was used for the training and validation sets, and the other cohort of 4190 PET-CT reports was used as an additional test set. A pre-processing model (Lung Cancer Spell Checker) was applied to correct typographical errors, and pseudo-labelling was used to train the model. The deep-learning model was constructed using a convolutional-recurrent neural network. The performance metrics for the prediction model were accuracy, precision, sensitivity, micro-AUROC, and AUPRC.
Results: For the extraction of primary lung cancer location, the model showed a micro-AUROC of 0.913 in the validation set and 0.946 in the additional test set. For metastatic lymph nodes, the model showed a sensitivity of 0.827 and a specificity of 0.960. In predicting distant metastasis, the model showed a micro-AUROC of 0.944 in the validation set and 0.950 in the additional test set.
Conclusion: Our deep-learning method could be used to extract lung cancer stage information from PET-CT reports and may facilitate lung cancer studies by relieving clinicians of laborious annotation.
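The metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy labels, and the three-class setup are hypothetical and chosen only for illustration. Micro-AUROC is computed here by pooling every (report, class) pair into one binary problem and taking the rank-based (Mann-Whitney) AUROC of the pooled scores.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = positive).

    Assumes at least one positive and one negative in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a
    random negative, with ties counted as one half (Mann-Whitney U)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_auroc(y_true_onehot: Sequence[Sequence[int]],
                score_matrix: Sequence[Sequence[float]]) -> float:
    """Micro-averaged AUROC: flatten all (sample, class) pairs into a
    single binary problem, then compute one AUROC over the pooled pairs."""
    flat_true = [t for row in y_true_onehot for t in row]
    flat_scores = [s for row in score_matrix for s in row]
    return auroc(flat_true, flat_scores)


# Hypothetical toy data: 3 reports, 3 candidate classes.
toy_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
toy_scores = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1], [0.2, 0.5, 0.3]]
toy_micro = micro_auroc(toy_true, toy_scores)

# Hypothetical binary predictions for a single label.
toy_sens, toy_spec = sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

The pairwise loop is quadratic in the number of pooled pairs, which is fine for a toy example; for cohorts of the size described above, a sort-based ranking or an established metrics library would be the more practical choice.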
Pages: 11
Journal
BMC Medical Informatics and Decision Making, 22