Hybrid medical named entity recognition using document structure and surrounding context

Cited by: 4

Authors:

Landolsi, Mohamed Yassine [1]

Romdhane, Lotfi Ben [1]

Hlaoua, Lobna [1]

Affiliations:

[1] Univ Sousse, MARS Res Lab, SDM Res Grp, ISITCom, LR17ES05, Hammam Sousse, Tunisia

Source:

JOURNAL OF SUPERCOMPUTING | 2024 / Vol. 80 / Issue 04

Keywords:

Medical text mining; Named entity recognition; Machine learning; Information extraction; Electronic medical records; Section identification;

DOI:

10.1007/s11227-023-05647-9

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Nowadays, medical specialists create a huge number of electronic medical documents in natural language, containing useful information needed for several medical tasks. However, reading these documents to find specific information is tedious. Thus, automatic information extraction, especially Named Entity Recognition (NER), has become an essential and challenging task. NER is crucial for extracting valuable information used in various medical tasks such as clinical decision support and drug safety surveillance. Capturing sufficient context is important for effective NER. In the literature, some important context information is not well exploited. Usually, a standard sequence segmentation such as sentence segmentation is used, which may not cover sufficient context. In this paper, we propose a supervised NER method, called MedSINE (Medical Section Identification to enhance the Named Entity tagging), which treats NER as a sequence tagging task using a Bidirectional Long Short-Term Memory neural network with a Conditional Random Field (BiLSTM-CRF). To that end, we exploit layout information to segment the text into chunk sequences and to extract the parent sections of each word as features that provide sufficient context. In addition, we use clinical Bidirectional Encoder Representations from Transformers (BERT) word embeddings, Part of Speech (PoS), and entity surrounding sequence features. Experiments were conducted on a manually annotated dataset of real Summary of Product Characteristics (SmPC) medical documents in PDF format and on the Colorado Richly Annotated Full Text (CRAFT) corpus. Our model achieved F1-measures of 89.49% and 73.52% under strict matching evaluation on the SmPC and CRAFT datasets, respectively. The results show that employing the sequence of parent sections improves the F1-measure by 4.71% under strict matching evaluation.
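
The abstract reports scores under strict matching evaluation. As a rough illustration only, and not the authors' evaluation code, the sketch below computes precision, recall, and F1 under the common convention that a predicted entity counts as correct only when its start, end, and label all equal those of a gold entity. The tuple representation, the function name strict_f1, and the labels DRUG, DOSE, and DISEASE are assumptions made for this example.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (start_token, end_token_exclusive, label).
Entity = Tuple[int, int, str]


def strict_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict matching.

    A prediction is a true positive only if the same (start, end, label)
    triple appears in the gold annotations; partial overlaps earn nothing.
    """
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative data (invented for this sketch, not taken from SmPC or CRAFT).
gold = [(0, 2, "DRUG"), (5, 6, "DOSE"), (8, 10, "DISEASE")]
# The second prediction overruns the gold boundary by one token,
# so it is counted as wrong under strict matching.
pred = [(0, 2, "DRUG"), (5, 7, "DOSE"), (8, 10, "DISEASE")]

precision, recall, f1 = strict_f1(gold, pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Under a relaxed (overlap-based) scheme the second prediction would earn credit, which is why strict scores are usually the lower of the two.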

Pages: 5011-5041

Page count: 31
