A unified framework of medical information annotation and extraction for Chinese clinical text

Cited by: 2
Authors
Zhu, Enwei [1, 2]
Sheng, Qilin [1]
Yang, Huanwan [1]
Liu, Yiyang [1, 2]
Cai, Ting [1, 2]
Li, Jinpeng [1, 2]
Affiliations
[1] Ningbo 2 Hosp, Ningbo 315010, Zhejiang, Peoples R China
[2] Univ Chinese Acad Sci, Ningbo Inst Life & Hlth Ind, Ningbo 315016, Zhejiang, Peoples R China
Keywords
Information extraction; Annotation scheme; Electronic medical record; Chinese clinical text; NEURAL-NETWORKS; CORPUS
DOI
10.1016/j.artmed.2023.102573
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Medical information extraction consists of a group of natural language processing (NLP) tasks that jointly convert clinical text into pre-defined structured formats. This is a critical step in exploiting electronic medical records (EMRs). Given the recent advances in NLP technologies, model implementation and performance no longer seem to be an obstacle; the bottleneck instead lies in obtaining a high-quality annotated corpus and in the overall engineering workflow. This study presents an engineering framework consisting of three tasks: medical entity recognition, relation extraction, and attribute extraction. Within this framework, the whole workflow is demonstrated, from EMR data collection through model performance evaluation. Our annotation scheme is designed to be comprehensive and compatible across the multiple tasks. With EMRs from a general hospital in Ningbo, China, and manual annotation by experienced physicians, our corpus is of large scale and high quality. Built upon this Chinese clinical corpus, the medical information extraction system shows performance that approaches human annotation. The annotation scheme, (a subset of) the annotated corpus, and the code are all publicly released to facilitate further research.
Pages: 12
Related papers
50 in total
  • [1] Relation Extraction From Biomedical and Clinical Text: Unified Multitask Learning Framework
    Yadav, Shweta
    Ramesh, Srivastsa
    Saha, Sriparna
    Ekbal, Asif
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (02) : 1105 - 1116
  • [2] An Automated Approach for Clinical Quantitative Information Extraction from Chinese Electronic Medical Records
    Liu, Shanshan
    Pan, Xiaoyi
    Chen, Boyu
    Gao, Dongfa
    Hao, Tianyong
    HEALTH INFORMATION SCIENCE (HIS 2018), 2018, 11148 : 98 - 109
  • [3] Information Extraction Models for German Clinical Text
    Roller, Roland
    Seiffe, Laura
    Ayach, Ammer
    Moller, Sebastian
    Marten, Oliver
    Mikhailov, Michael
    Alt, Christoph
    Schmidt, Danilo
    Halleck, Fabian
    Naik, Marcel
    Duettmann, Wiebke
    Budde, Klemens
    2020 8TH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2020), 2020, : 527 - 528
  • [4] WEIGHT ANNOTATION IN INFORMATION EXTRACTION
    Doleschal, Johannes
    Kimelfeld, Benny
    Martens, Wim
    Peterfreund, Liat
    LOGICAL METHODS IN COMPUTER SCIENCE, 2020, 18 (01)
  • [5] MIDAS: An Information-Extraction Approach to Medical Text Classification
    Sotelsek-Margalef, Anastasia
    Villena-Roman, Julio
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (41) : 97 - 104
  • [6] Deep Learning in Chinese Text Information Extraction Model for Coastal Biodiversity
    Wang, Xiujuan
    Li, Xuerong
    INTERNATIONAL JOURNAL ON SEMANTIC WEB AND INFORMATION SYSTEMS, 2023, 19 (01)
  • [7] Information extraction for Chinese free text based on pattern match combine with heuristic information
    Yu, Y
    Wang, XL
    Guan, Y
    2002 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-4, PROCEEDINGS, 2002, : 214 - 218
  • [8] A general framework for subjective information extraction from unstructured English text
    Mangassarian, Hratch
    Artail, Hassan
    DATA & KNOWLEDGE ENGINEERING, 2007, 62 (02) : 352 - 367
  • [9] Text Preprocessing and Annotation Tool for Time Information
    Lim, Chae-Gyun
    Jeong, Young-Seob
    Kim, Woo-Jin
    Kim, Youngjin
    Choi, Ho-Jin
    2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024, : 351 - 352
  • [10] Annotation projection for temporal information extraction
    Giannella, Chris R.
    Winder, Ransom K.
    Jubinski, Joseph P.
    NATURAL LANGUAGE ENGINEERING, 2019, 25 (03) : 385 - 403