COMCARE: A Collaborative Ensemble Framework for Context-Aware Medical Named Entity Recognition and Relation Extraction

Cited by: 0
Authors
Jin, Myeong [1]
Choi, Sang-Min [2, 3]
Kim, Gun-Woo [2]
Affiliations
[1] Gyeongsang Natl Univ, Dept AI Convergence Engn, Jinju 52828, South Korea
[2] Gyeongsang Natl Univ, Dept Comp Sci & Engn, Jinju 52828, South Korea
[3] Gyeongsang Natl Univ, Res Inst Nat Sci, Jinju 52828, South Korea
Funding
National Research Foundation of Singapore;
Keywords
named entity recognition; relation extraction; medical natural language processing; pre-trained language models; ensemble learning; corpus;
DOI
10.3390/electronics14020328
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The rapid expansion of medical information has made named entity recognition (NER) and relation extraction (RE) essential for clinical decision support systems. Medical texts often contain specialized vocabulary, ambiguous abbreviations, synonyms, polysemous terms, and overlapping entities, all of which complicate extraction. Existing approaches, which typically rely on a single model such as BiLSTM or BERT, often struggle with these complexities. Although large language models (LLMs) have shown promise in various NLP tasks, they still face limitations on the token-level tasks that are critical for medical NER and RE. To address these challenges, we propose COMCARE, a collaborative ensemble framework for context-aware medical NER and RE that integrates multiple pre-trained language models through a collaborative decision strategy. For NER, we combined PubMedBERT and PubMed-T5, leveraging PubMedBERT's contextual understanding and PubMed-T5's generative capabilities to handle diverse forms of medical terminology, from standard domain-specific jargon to nonstandard representations such as uncommon abbreviations and out-of-vocabulary (OOV) terms. For RE, we integrated general-domain BERT with biomedical-specific BERT and PubMed-T5, using token-level information from the NER module to enhance context-aware, entity-based relation extraction. To handle long-range dependencies and maintain consistent performance across diverse texts, we implemented a semantic chunking approach and combined the model outputs through a majority voting mechanism. We evaluated COMCARE on several biomedical datasets: the BioRED, ADE, RDD, and DIANN corpora. On BioRED, COMCARE achieved F1 scores of 93.76% for NER and 68.73% for RE, outperforming BioBERT by 1.25% and 1.74%, respectively. On the RDD corpus, COMCARE achieved F1 scores of 77.86% for NER and 86.79% for RE, and it reached 82.48% for NER on ADE and 99.36% for NER on DIANN. These results demonstrate the effectiveness of our approach in handling complex medical terminology and overlapping entities, highlighting its potential to improve clinical decision support systems.
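The abstract states that model outputs are combined through a majority voting mechanism but gives no further detail. The following is a minimal sketch of token-level majority voting, not the authors' implementation. The function name `majority_vote`, the BIO-style labels, and the tie-breaking rule (prefer the label from the model at index `fallback`) are all assumptions introduced for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: int = 0) -> List[str]:
    """Combine per-token labels from several models by majority vote.

    predictions[m][t] is the label that model m assigns to token t.
    Assumption (not from the paper): when several labels tie for the most
    votes, the label from model `fallback` wins if it is among the tied
    labels; otherwise the tied label that was seen first wins.
    """
    if not predictions:
        return []
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must label the same number of tokens")

    merged = []
    for t in range(n_tokens):
        votes = Counter(p[t] for p in predictions)
        best = max(votes.values())
        # Counter preserves insertion order, so `tied` is in model order.
        tied = [label for label, count in votes.items() if count == best]
        preferred = predictions[fallback][t]
        merged.append(preferred if preferred in tied else tied[0])
    return merged


# Hypothetical label sequences from three models for the same four tokens.
model_a = ["B-Disease", "I-Disease", "O", "B-Chemical"]
model_b = ["B-Disease", "O", "O", "B-Chemical"]
model_c = ["B-Disease", "I-Disease", "O", "O"]

print(majority_vote([model_a, model_b, model_c]))
```

An explicit tie rule matters here because an ensemble with an even number of members, such as the two-model NER pairing described above, can split its votes evenly on a token.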
Pages: 32