Automated extraction of information of lung cancer staging from unstructured reports of PET-CT interpretation: natural language processing with deep-learning

Cited by: 7
Authors
Park, Hyung Jun [1, 7]
Park, Namu [2]
Lee, Jang Ho [1]
Choi, Myeong Geun [3]
Ryu, Jin-Sook [4]
Song, Min [5]
Choi, Chang-Min [1, 6]
Affiliations
[1] Univ Ulsan, Coll Med, Asan Med Ctr, Dept Pulm & Crit Care Med, 88, Olymp Ro 43 Gil, Seoul 05505, South Korea
[2] Univ Washington, Sch Med, Dept Biomed Informat & Med Educ, Seattle, WA USA
[3] Ewha Womans Univ, Mokdong Hosp, Coll Med, Div Pulm & Crit Care Med, Dept Internal Med, Seoul, South Korea
[4] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Nucl Med, Seoul, South Korea
[5] Yonsei Univ, Dept Digital Analyt, 50 Yonsei Ro, Seoul 03722, South Korea
[6] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Oncol, Seoul, South Korea
[7] Univ Ulsan, Asan Med Ctr, Coll Med, Dept Informat Med, Seoul, South Korea
Keywords
Natural language processing; Auto-annotation; Deep learning; Lung cancer; Pseudo-labelling
DOI
10.1186/s12911-022-01975-7
CLC number
R-058
Abstract
Background: Extracting metastatic information from previous radiologic text reports is important; however, the laborious annotation required has limited the usability of these texts. We developed a deep-learning model that extracts primary lung cancer sites, metastatic lymph nodes, and distant metastasis information from PET-CT reports in order to determine lung cancer stages.
Methods: PET-CT reports, written entirely in English, were acquired from two cohorts of patients with lung cancer who were diagnosed at a tertiary hospital between January 2004 and March 2020. One cohort of 20,466 PET-CT reports was used for the training and validation sets, and the other cohort of 4,190 PET-CT reports was used as an additional test set. A pre-processing model (Lung Cancer Spell Checker) was applied to correct typographical errors, and pseudo-labelling was used to train the model. The deep-learning model was a convolutional-recurrent neural network. The performance metrics for the prediction model were accuracy, precision, sensitivity, micro-averaged area under the receiver operating characteristic curve (micro-AUROC), and area under the precision-recall curve (AUPRC).
Results: For the extraction of primary lung cancer location, the model showed a micro-AUROC of 0.913 in the validation set and 0.946 in the additional test set. For metastatic lymph nodes, the model showed a sensitivity of 0.827 and a specificity of 0.960. In predicting distant metastasis, the model showed a micro-AUROC of 0.944 in the validation set and 0.950 in the additional test set.
Conclusion: Our deep-learning method could be used to extract lung cancer stage information from PET-CT reports and may facilitate lung cancer studies by relieving clinicians of laborious annotation.
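The micro-AUROC reported in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code, which the record does not show; it is a minimal pure-Python illustration, assuming one-hot labels over a few hypothetical classes. Micro-averaging pools every (report, class) pair into one binary problem, and the AUROC is then the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as one half. The function names `auroc` and `micro_auroc` and the toy data are illustrative assumptions.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_auroc(y_true: Sequence[Sequence[int]],
                y_score: Sequence[Sequence[float]]) -> float:
    """Micro-average: flatten all (sample, class) pairs, then compute one AUROC."""
    flat_labels = [y for row in y_true for y in row]
    flat_scores = [s for row in y_score for s in row]
    return auroc(flat_labels, flat_scores)


# Toy example: three reports, three hypothetical classes, one-hot ground truth.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_score = [[0.8, 0.1, 0.1], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
result = micro_auroc(y_true, y_score)
```

The pairwise loop is quadratic and meant only to make the definition explicit; for cohorts of the size described in the abstract, a rank-based implementation from an established library would be the practical choice.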
Pages: 11