Hybrid medical named entity recognition using document structure and surrounding context

Cited by: 4
Authors
Landolsi, Mohamed Yassine [1]
Romdhane, Lotfi Ben [1]
Hlaoua, Lobna [1]
Affiliations
[1] Univ Sousse, MARS Res Lab, SDM Res Grp, ISITCom, LR17ES05, Hammam Sousse, Tunisia
Keywords
Medical text mining; Named entity recognition; Machine learning; Information extraction; Electronic medical records; Section identification
DOI
10.1007/s11227-023-05647-9
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Medical specialists now produce a huge volume of electronic medical documents in natural language, and these documents contain information needed for many medical tasks. Reading them to find specific information, however, is tedious. Automatic information extraction has therefore become an essential and challenging task, especially Named Entity Recognition (NER). NER is crucial for extracting valuable information used in medical tasks such as clinical decision support and drug safety surveillance. Capturing sufficient context is important for effective NER, yet some important context information is not well exploited in the literature. A standard sequence segmentation, such as sentence segmentation, is usually used, and it may not cover sufficient context. In this paper, we propose a supervised NER method called MedSINE (Medical Section Identification to enhance the Named Entity tagging), which treats NER as a sequence tagging task and uses a Bidirectional Long Short-Term Memory neural network with a Conditional Random Field (BiLSTM-CRF). We exploit layout information to segment the text into chunk sequences and to extract the parent sections of each word as features that provide sufficient context. In addition, we use clinical Bidirectional Encoder Representations from Transformers (BERT) word embeddings, Part of Speech (PoS) tags, and entity-surrounding sequence features. Experiments were conducted on a manually annotated dataset of real Summary of Product Characteristics (SmPC) medical documents in PDF format and on the Colorado Richly Annotated Full Text (CRAFT) corpus. Our model achieved F1-measures of 89.49% and 73.52% under strict matching evaluation on the SmPC and CRAFT datasets, respectively. The results show that employing the sequence of parent sections improves the F1-measure by 4.71% under strict matching evaluation.
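The abstract reports its scores under strict matching evaluation. As a rough illustration of what that usually means, the sketch below counts a predicted entity as correct only when its start, end, and label all equal those of a gold entity. The `Entity` tuple layout, the `strict_f1` helper, and the example labels (`DRUG`, `DOSE`, `DISEASE`) are assumptions made for this illustration; they are not taken from the paper, whose exact evaluation script is not described here.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (start_token, end_token_exclusive, label).
Entity = Tuple[int, int, str]


def strict_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict (exact span + label) matching."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)

    # A prediction is a true positive only if the whole tuple matches exactly.
    true_pos = len(gold_set & pred_set)

    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative data only: the second prediction has a wrong right boundary,
# and the fourth has no gold counterpart, so both count as errors.
gold = [(0, 2, "DRUG"), (5, 6, "DOSE"), (9, 11, "DISEASE")]
pred = [(0, 2, "DRUG"), (5, 7, "DOSE"), (9, 11, "DISEASE"), (12, 13, "DRUG")]

precision, recall, f1 = strict_f1(gold, pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Under this definition a prediction that overlaps a gold entity but misses one boundary earns no credit, which is why strict scores tend to be lower than relaxed (overlap-based) ones.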
Pages: 5011-5041
Page count: 31