Automated extraction of information of lung cancer staging from unstructured reports of PET-CT interpretation: natural language processing with deep-learning

Cited by: 7
Authors
Park, Hyung Jun [1, 7]
Park, Namu [2]
Lee, Jang Ho [1]
Choi, Myeong Geun [3]
Ryu, Jin-Sook [4]
Song, Min [5]
Choi, Chang-Min [1, 6]
Affiliations
[1] Univ Ulsan, Coll Med, Asan Med Ctr, Dept Pulm & Crit Care Med, 88 Olymp Ro 43 Gil, Seoul 05505, South Korea
[2] Univ Washington, Sch Med, Dept Biomed Informat & Med Educ, Seattle, WA USA
[3] Ewha Womans Univ, Mokdong Hosp, Coll Med, Div Pulm & Crit Care Med, Dept Internal Med, Seoul, South Korea
[4] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Nucl Med, Seoul, South Korea
[5] Yonsei Univ, Dept Digital Analyt, 50 Yonsei Ro, Seoul 03722, South Korea
[6] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Oncol, Seoul, South Korea
[7] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Informat Med, Seoul, South Korea
Keywords
Natural language processing; Auto-annotation; Deep learning; Lung cancer; Pseudo-labelling
DOI
10.1186/s12911-022-01975-7
Chinese Library Classification number
R-058
Abstract
Background: Extracting metastatic information from previous radiologic text reports is important; however, the laborious annotation these reports require has limited their usability. We developed a deep-learning model that extracts the primary lung cancer site, metastatic lymph nodes, and distant metastases from PET-CT reports in order to determine lung cancer stage.
Methods: PET-CT reports, written entirely in English, were acquired from two cohorts of patients with lung cancer diagnosed at a tertiary hospital between January 2004 and March 2020. One cohort of 20,466 PET-CT reports was used for the training and validation sets, and the other cohort of 4190 PET-CT reports was used as an additional test set. A pre-processing model (Lung Cancer Spell Checker) was applied to correct typographical errors, and pseudo-labelling was used to train the model. The deep-learning model was built as a Convolutional-Recurrent Neural Network. The performance metrics for the prediction model were accuracy, precision, sensitivity, micro-AUROC, and AUPRC.
Results: For extraction of the primary lung cancer location, the model achieved a micro-AUROC of 0.913 in the validation set and 0.946 in the additional test set. For metastatic lymph nodes, the model showed a sensitivity of 0.827 and a specificity of 0.960. For predicting distant metastasis, the model achieved a micro-AUROC of 0.944 in the validation set and 0.950 in the additional test set.
Conclusion: Our deep-learning method could be used to extract lung cancer stage information from PET-CT reports and may facilitate lung cancer research by relieving clinicians of laborious manual annotation.
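The evaluation metrics named in the abstract can be illustrated with a small sketch. The Python below is not the authors' evaluation code; it is a minimal illustration under stated assumptions. It computes a micro-averaged AUROC by pooling every (label, score) pair across all label columns and applying the Mann-Whitney rank statistic, and it computes sensitivity and specificity at a fixed decision threshold. The function names, the 0.5 threshold, and the toy data (three hypothetical reports with three hypothetical binary labels each) are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def micro_auroc(y_true: Sequence[Sequence[int]],
                y_score: Sequence[Sequence[float]]) -> float:
    """Micro-averaged AUROC (illustrative sketch).

    Pools every (label, score) pair across all label columns, then computes
    one binary AUROC as the fraction of positive/negative pairs in which the
    positive is scored higher (ties count as 0.5), i.e. the Mann-Whitney U
    statistic normalised by (#positives * #negatives).
    """
    pairs: List[Tuple[int, float]] = [
        (t, s)
        for row_t, row_s in zip(y_true, y_score)
        for t, s in zip(row_t, row_s)
    ]
    pos = [s for t, s in pairs if t == 1]
    neg = [s for t, s in pairs if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true: Sequence[Sequence[int]],
                            y_score: Sequence[Sequence[float]],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Pooled sensitivity TP/(TP+FN) and specificity TN/(TN+FP).

    The 0.5 threshold is a hypothetical default, not a value from the paper.
    """
    tp = fn = tn = fp = 0
    for row_t, row_s in zip(y_true, y_score):
        for t, s in zip(row_t, row_s):
            pred = 1 if s >= threshold else 0
            if t == 1 and pred == 1:
                tp += 1
            elif t == 1:
                fn += 1
            elif pred == 1:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 3 reports x 3 binary labels (not from the study).
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
y_score = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.4], [0.6, 0.1, 0.7]]

print(micro_auroc(y_true, y_score))              # 0.95
print(sensitivity_specificity(y_true, y_score))  # (0.75, 0.8)
```

Pooling all label columns before computing a single AUROC is what distinguishes micro-averaging from macro-averaging, which would instead compute one AUROC per label column and average them. The pairwise loop is quadratic and is meant only to make the definition explicit; a rank-based implementation would be preferable for tens of thousands of reports.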
Pages: 11