A unified framework of medical information annotation and extraction for Chinese clinical text

Cited by: 7
Authors
Zhu, Enwei [1, 2]
Sheng, Qilin [1]
Yang, Huanwan [1]
Liu, Yiyang [1, 2]
Cai, Ting [1, 2]
Li, Jinpeng [1, 2]
Affiliations
[1] Ningbo 2 Hosp, Ningbo 315010, Zhejiang, Peoples R China
[2] Univ Chinese Acad Sci, Ningbo Inst Life & Hlth Ind, Ningbo 315016, Zhejiang, Peoples R China
Keywords
Information extraction; Annotation scheme; Electronic medical record; Chinese clinical text; NEURAL-NETWORKS; CORPUS
DOI
10.1016/j.artmed.2023.102573
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Medical information extraction consists of a group of natural language processing (NLP) tasks that together convert clinical text into pre-defined structured formats. This is a critical step in exploiting electronic medical records (EMRs). Given recent advances in NLP technologies, model implementation and performance no longer seem to be an obstacle; the bottleneck instead lies in obtaining a high-quality annotated corpus and in the overall engineering workflow. This study presents an engineering framework consisting of three tasks, i.e., medical entity recognition, relation extraction and attribute extraction. Within this framework, the whole workflow is demonstrated, from EMR data collection through model performance evaluation. Our annotation scheme is designed to be comprehensive and compatible across the multiple tasks. With EMRs from a general hospital in Ningbo, China, and manual annotation by experienced physicians, our corpus is of large scale and high quality. Built upon this Chinese clinical corpus, the medical information extraction system shows performance that approaches human annotation. The annotation scheme, (a subset of) the annotated corpus, and the code are all publicly released to facilitate further research.
Pages: 12