Event-Based Clinical Finding Extraction from Radiology Reports with Pre-trained Language Model

Cited by: 0
Authors
Wilson Lau
Kevin Lybarger
Martin L. Gunn
Meliha Yetisgen
Affiliations
[1] University of Washington, Biomedical & Health Informatics, School of Medicine
[2] University of Washington, Department of Radiology, School of Medicine
Source
Journal of Digital Imaging | 2023 / Volume 36
Keywords
Natural language processing; Information extraction; Event extraction; Deep learning
DOI
Not available
Abstract
Radiology reports contain a diverse and rich set of clinical abnormalities documented by radiologists during their interpretation of the images. Comprehensive semantic representations of radiological findings would enable a wide range of secondary use applications to support diagnosis, triage, outcomes prediction, and clinical research. In this paper, we present a new corpus of radiology reports annotated with clinical findings. Our annotation schema captures detailed representations of pathologic findings that are observable on imaging (“lesions”) and other types of clinical problems (“medical problems”). The schema used an event-based representation to capture fine-grained details, including assertion, anatomy, characteristics, size, and count. Our gold standard corpus contained a total of 500 annotated computed tomography (CT) reports. We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT. We then predicted the linkages between trigger and argument entities (referred to as argument roles) using a BERT-based relation extraction model. We achieved the best extraction performance using a BERT model pre-trained on 3 million radiology reports from our institution: 90.9–93.4% F1 for finding triggers and 72.0–85.6% F1 for argument roles. To assess model generalizability, we used an external validation set randomly sampled from the MIMIC Chest X-ray (MIMIC-CXR) database. The extraction performance on this validation set was 95.6% for finding triggers and 79.1–89.7% for argument roles, demonstrating that the model generalized well to the cross-institutional data with a different imaging modality. We extracted the finding events from all the radiology reports in the MIMIC-CXR database and provided the extractions to the research community.
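The abstract describes an event-based schema in which each clinical finding is a trigger span linked to typed argument spans (assertion, anatomy, characteristics, size, count). As a minimal illustrative sketch only, the structure of such an extracted event might be modeled as follows; all class and field names here are hypothetical and do not come from the authors' released code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of the event-based finding representation described
# in the abstract: a trigger entity plus typed argument entities.
# Names are illustrative assumptions, not the authors' actual schema code.

@dataclass
class Span:
    text: str
    start: int  # character offset of the span in the report text
    end: int    # exclusive end offset

@dataclass
class FindingEvent:
    trigger: Span                      # e.g. "nodule" (lesion) or "pneumonia" (medical problem)
    event_type: str                    # "Lesion" or "Medical-Problem"
    assertion: Optional[Span] = None   # e.g. "no evidence of"
    anatomy: List[Span] = field(default_factory=list)
    characteristics: List[Span] = field(default_factory=list)
    size: Optional[Span] = None
    count: Optional[Span] = None

# Toy report sentence with one lesion finding.
report = "A 5 mm nodule in the right upper lobe."
event = FindingEvent(
    trigger=Span("nodule", 7, 13),
    event_type="Lesion",
    anatomy=[Span("right upper lobe", 21, 37)],
    size=Span("5 mm", 2, 6),
)
print(event.trigger.text, event.size.text, event.anatomy[0].text)
```

In the pipeline the paper describes, a token-classification model (BERT) would propose the trigger and argument spans, and a separate BERT-based relation classifier would decide which argument spans attach to which trigger, filling a structure like the one above.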
Pages: 91 - 104
Number of pages: 13
Related Papers
50 items in total
  • [21] Talent Supply and Demand Matching Based on Prompt Learning and the Pre-Trained Language Model
    Li, Kunping
    Liu, Jianhua
    Zhuang, Cunbo
    APPLIED SCIENCES-BASEL, 2025, 15 (05):
  • [22] An analysis on language transfer of pre-trained language model with cross-lingual post-training
    Son, Suhyune
    Park, Chanjun
    Lee, Jungseob
    Shim, Midan
    Lee, Chanhee
    Jang, Yoonna
    Seo, Jaehyung
    Lim, Jungwoo
    Lim, Heuiseok
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 267
  • [23] Pre-trained Model Based Feature Envy Detection
    Ma, Wenhao
    Yu, Yaoxiang
    Ruan, Xiaoming
    Cai, Bo
    2023 IEEE/ACM 20TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2023, : 430 - 440
  • [24] Automatic Fixation of Decompilation Quirks Using Pre-trained Language Model
    Kaichi, Ryunosuke
    Matsumoto, Shinsuke
    Kusumoto, Shinji
    PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT, PROFES 2023, PT I, 2024, 14483 : 259 - 266
  • [25] Question-answering Forestry Pre-trained Language Model: ForestBERT
    Tan, Jingwei
    Zhang, Huaiqing
    Liu, Yang
    Yang, Jie
    Zheng, Dongping
Linye Kexue/Scientia Silvae Sinicae, 2024, 60 (09): 99 - 110
  • [26] Measuring semantic similarity of clinical trial outcomes using deep pre-trained language representations
    Koroleva A.
    Kamath S.
    Paroubek P.
    Journal of Biomedical Informatics: X, 2019, 4
  • [27] iProL: identifying DNA promoters from sequence information based on Longformer pre-trained model
    Peng, Binchao
    Sun, Guicong
    Fan, Yongxian
    BMC BIOINFORMATICS, 2024, 25 (01):
  • [28] Few-shot medical relation extraction via prompt tuning enhanced pre-trained language model
    He, Guoxiu
    Huang, Chen
    NEUROCOMPUTING, 2025, 633
  • [29] A Transformer Based Approach To Detect Suicidal Ideation Using Pre-Trained Language Models
    Haque, Farsheed
    Nur, Ragib Un
    Al Jahan, Shaeekh
    Mahmud, Zarar
    Shah, Faisal Muhammad
    2020 23RD INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2020), 2020,
  • [30] Predictive Recognition of DNA-binding Proteins Based on Pre-trained Language Model BERT
    Ma, Yue
    Pei, Yongzhen
    Li, Changguo
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2023, 21 (06)